Against new naming scheme

petopeto

2008-06-20 18:02:28 UTC

It did that, but looking up tag types was too slow to do 16 times per index screen (times 3 for Atom + PicLens). I could probably speed it up, but I didn't want to spend time on that yet.

MDGeist

2008-06-20 18:05:17 UTC

petopeto said:
Yes, the dupe check does take a little intelligence to use.

you didn't understand the post, and no, only syao is trolling.

You would be the third person here to tell me to stfu...

Radioactive

2008-06-20 20:28:36 UTC

MDGeist said:
you didn't understand the post, and no, only syao is trolling.

You would be the third person here to tell me to stfu...

I think you set a fire under his ass.

Radioactive

2008-06-20 20:30:31 UTC

syaoran-kun said:
wth.. just use an md5 hasher if you want hash filenames instead of bitching about the new change (which imho is better, cuz md5 filenames don't tell you shit about the image itself).. what's the problem.. too HARD for you?

No need to get angry about it. Some people have concerns about the new naming scheme.

syaoran-kun

2008-06-20 20:47:30 UTC

FYI, I'm not trolling; I'm just trying to reply instead of facepalming to death.

My only issue is that the nearly 10,000 images I have downloaded from here no longer have the same file names as on the site. Because of this, I am going to end up with a ton of duplicates when I accidentally re-download something I didn't know I already had. Could someone please make a batch file to rename all the files with MD5 file names to their new file names?

While I wouldn't have had much of an issue if the file names had been like this from the start, as it is now this is going to be a huge waste of time and bandwidth. People are going to have to rename everything, re-download everything, and/or waste space on thousands of identical images with different names.

aoie_emesai

2008-06-20 23:28:44 UTC

You gotta be selective, Cyberbeing. ^^

Cyberbeing

2008-06-21 00:36:36 UTC

Well, a good 6000+ of them are from the moe.imouto.org site rip from a while back, but for the rest I was selective and only downloaded what I like. There are about 4,188 images so far that I manually selected and downloaded by hand. Let's just say I have a lot of free time and I enjoy archiving images.

For that reason (because I am selective and don't just download everything in order), this becomes an issue: I now have no way of knowing what I have already downloaded and what I haven't.

aoie_emesai

2008-06-21 01:00:46 UTC

Cyberbeing said:
Well, a good 6000+ of them are from the moe.imouto.org site rip from a while back, but for the rest I was selective and only downloaded what I like. There are about 4,188 images so far that I manually selected and downloaded by hand. Let's just say I have a lot of free time and I enjoy archiving images.

For that reason (because I am selective and don't just download everything in order), this becomes an issue: I now have no way of knowing what I have already downloaded and what I haven't.

I was trying to be humorous, and then I get smacked down. Thanks. I sure understand what these image mongers are like now. Then you should have saved the site rip and the selectively downloaded images into the same folder. Just backtrack to when the site rip was made, and then you have a dateline.

Whatever, I have no reason to argue with someone encased in anger like you.

*Aha, now I finally understand the idiom "people are afraid of change," even when the change is meant to make things easier for them.

ps: Start fresh, have fun now :D

Cyberbeing

2008-06-21 01:41:49 UTC

Knowing that the site rip was done in October 2007 doesn't really help me because I randomly download things sorted by tags as well as just page by page...

I'm not angry that the change was made, because this is more of a personal problem than anything else. I'm mostly just venting frustration, because it's unlikely it will go back to the old way (considering the new way is much better), and because I didn't see this coming, I'm now in a real mess going forward. I'll be able to work around the problem; it's just going to create hours, days, or weeks of unneeded work that I wasn't expecting and am not looking forward to. Not very fun XD

If someone will either make a renaming batch file or torrent an updated site rip at some point I would be very thankful.

Biohazix

2008-06-22 03:12:43 UTC

Cyberbeing said:
If someone will either make a renaming batch file or torrent an updated site rip at some point I would be very thankful.

Is a Python script fine too?
http://orphelin.no-ip.org/md5totags.py

When you run the script it asks for a folder. It searches that folder for images with md5 names, does an md5 search on imouto for those images, finds the new filenames, puts the old and new names in rename.txt and opens it for you so you can check if everything's fine and fix things that aren't before renaming.
It should only mess up if there are Japanese characters in a tag. They look fine in rename.txt but not in the actual filename after renaming (戦女神ＺＥＲＯ becomes æˆ¦å¥³ç¥žï¼ºï¼¥ï¼²ï¼¯); you'll have to fix those manually.
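That particular garbling is what you get when the UTF-8 bytes of a tag are read back as Windows-1252 (cp1252). A minimal Python 3 sketch that reproduces the effect and reverses it, assuming the damage is exactly one UTF-8/cp1252 mix-up; `garble` and `repair` are hypothetical helper names:

```python
def garble(text: str) -> str:
    """Reproduce the bug: encode as UTF-8, then misread the bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo one UTF-8/cp1252 mix-up by reversing both steps."""
    return garbled.encode("cp1252").decode("utf-8")
```

For example, `garble("戦女神ＺＥＲＯ")` returns exactly the string quoted above, and `repair` turns it back. In Python 3 the simpler fix is to keep tags as `str` all the way to `os.rename` so the OS receives Unicode names. Note that `garble` raises `UnicodeDecodeError` for characters whose UTF-8 bytes include 0x81, 0x8D, 0x8F, 0x90 or 0x9D, which cp1252 leaves undefined.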

Let me know if there are any problems.

admin2

2008-06-22 03:30:20 UTC

I should dump the tags + post IDs so you can look them up locally instead of constantly hitting the db.

petopeto

2008-06-22 03:37:06 UTC

Please don't make a separate request for every file; it'll load the server unnecessarily. Batch them into a single request: "md5:a,b,c,d". I don't think there's a hardcoded limit to the number of MD5s (but you may need to specify a higher limit if it's more than 16).
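One way to follow that advice is to split the MD5 list into chunks and send one `md5:a,b,c,d` query per chunk. The sketch below only builds the request URLs. The `/post/index.json` path and the `tags`/`limit` parameter names are assumptions based on Danbooru-style sites, and the chunk size of 100 is an arbitrary choice to keep URLs short:

```python
from urllib.parse import urlencode


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_urls(base_url, md5s, chunk_size=100):
    """Build one search URL per chunk of MD5s instead of one per file.

    ASSUMPTION: the endpoint path and the `tags`/`limit` parameter names
    follow the Danbooru-style API; check them against the site's API docs.
    """
    urls = []
    for chunk in chunked(list(md5s), chunk_size):
        query = urlencode({
            "tags": "md5:" + ",".join(chunk),
            # The default page size is 16, so ask for one result per MD5.
            "limit": len(chunk),
        })
        urls.append(f"{base_url}/post/index.json?{query}")
    return urls
```

With 350 images this makes 4 requests instead of 350.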

You'd be better off using the XML or JSON API than parsing HTML.
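Parsing the JSON response then comes down to indexing posts by MD5. A sketch assuming each post object carries `id`, `md5` and a space-separated `tags` field, as Danbooru-style APIs typically return; `index_by_md5` is a hypothetical name and the sample response is illustrative:

```python
import json


def index_by_md5(response_text):
    """Map md5 -> (post id, tag list) from a JSON list of post objects.

    ASSUMPTION: each post has "id", "md5" and a space-separated "tags" field.
    """
    posts = json.loads(response_text)
    return {post["md5"]: (post["id"], post["tags"].split()) for post in posts}


# Illustrative response body, not real API output.
sample = '[{"id": 4979, "md5": "02bcd4a0fd0ce3c30888a0267e2e838e", "tags": "matou_sakura see_through"}]'
lookup = index_by_md5(sample)
```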

admin2

2008-06-22 03:42:40 UTC

Instead, use this dump, formatted as CSV.
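With a local dump, every lookup becomes a dictionary access and no requests hit the server. A sketch that loads it into an MD5-keyed index. The column order (`id`, `md5`, `tags`) is a hypothetical layout, since the thread doesn't describe the file, and reading it as UTF-8 assumes the export is UTF-8; `load_dump` is a hypothetical name:

```python
import csv


def load_dump(path, id_col=0, md5_col=1, tags_col=2):
    """Build {md5: (post_id, tags)} from the CSV dump in a single pass.

    ASSUMPTION: one post per row with the id, md5 and space-separated tags
    in the given column positions; adjust the indexes to the real layout.
    """
    index = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) <= max(id_col, md5_col, tags_col):
                continue  # skip blank or malformed rows
            index[row[md5_col].lower()] = (row[id_col], row[tags_col].split())
    return index
```

A header row, if present, just adds one harmless entry keyed by the header text.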

http://imouto.org/dump.rar

Cyberbeing

2008-06-22 05:58:47 UTC

Thank you Biohazix, I tested your script on a few images and it works well. This will save me a lot of time.

As it seems the Python script isn't very kind to the server, I'll hold off on using it for a while. Is anybody able to modify it into a kinder script, as petopeto suggested, or make a new script that uses the CSV dump?

That got me thinking. If someone is willing to make a script for it, admin2, could you maybe once a month generate a dump with all the images that had their tags changed which contains both the old tags and updated tags in it (if that is even possible)? An automated script to do renaming for tag changes could be very useful and would negate another problem the new system causes.

Biohazix

2008-06-22 11:53:56 UTC

Cyberbeing said:
Is anybody able to modify it into a kinder script, as petopeto suggested, or make a new script that uses the CSV dump?

I'll modify it to use the CSV. Anything else I should change? Maybe include subdirectories? I don't know if that's useful to you.

Edit: Done. http://orphelin.no-ip.org/md5totags.py
Put it in the same folder as dump.csv.
Thanks for that dump, admin2. It's very fast now: it takes about 9 seconds to rename 350 images for me.
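The folder scan described in this thread can be sketched as follows: find files whose base name is a 32-character hex MD5, look each one up in an MD5 index (for example, one built from dump.csv), and write `old | new` lines to rename.txt for review. The `moe <id> <tags>.<ext>` target format is taken from the rename.txt example quoted later in the thread, the `recursive` flag covers the subdirectory idea, and all function names are hypothetical. Reserved characters in tags are not handled here:

```python
import os
import re

# A 32-digit hex string followed by an extension, e.g. "02bc...838e.jpg".
MD5_NAME = re.compile(r"^([0-9a-fA-F]{32})(\.\w+)$")


def find_md5_files(folder, recursive=False):
    """Yield (directory, filename, md5, extension) for MD5-named files."""
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            match = MD5_NAME.match(name)
            if match:
                yield dirpath, name, match.group(1).lower(), match.group(2)
        if not recursive:
            break  # os.walk yields the top-level folder first


def write_rename_list(folder, index, out_path="rename.txt", recursive=False):
    """Write 'old | new' lines for every file found in `index`.

    `index` maps md5 -> (post_id, tag_list); unknown MD5s are skipped.
    Returns the number of lines written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, name, md5, ext in find_md5_files(folder, recursive):
            if md5 not in index:
                continue
            post_id, tags = index[md5]
            new_name = f"moe {post_id} {' '.join(tags)}{ext}"
            rel = os.path.relpath(os.path.join(dirpath, name), folder)
            out.write(f"{rel} | {new_name}\n")
            count += 1
    return count
```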

Cyberbeing said:
That got me thinking. If someone is willing to make a script for it, admin2, could you maybe once a month generate a dump with all the images that had their tags changed which contains both the old tags and updated tags in it (if that is even possible)? An automated script to do renaming for tag changes could be very useful and would negate another problem the new system causes.

A monthly dump would be nice. I could modify the script to update the filenames. (No need for old tags or only the updated tags; a full dump like this one will do fine.)
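A full dump is enough because already-renamed files carry their post ID. A sketch, assuming names follow the `moe <id> <tags>.<ext>` pattern from this thread and that the newer dump has been loaded into a hypothetical `{post_id: tag_list}` dictionary; `updated_name` is also a hypothetical name:

```python
import re

# "moe 4979 tag_a tag_b.jpg" -> id "4979", tags "tag_a tag_b", ext ".jpg"
RENAMED = re.compile(r"^moe (\d+) (.*)(\.\w+)$")


def updated_name(filename, tags_by_id):
    """Return the new filename if the post's tags changed, else None."""
    match = RENAMED.match(filename)
    if not match:
        return None  # not a file renamed by the script
    post_id, _old_tags, ext = match.groups()
    new_tags = tags_by_id.get(post_id)
    if new_tags is None:
        return None  # post missing from the dump (e.g. deleted)
    new_name = f"moe {post_id} {' '.join(new_tags)}{ext}"
    return new_name if new_name != filename else None
```

Keying on the ID rather than the old tags means the dump never needs to record what changed.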

admin2

2008-06-22 16:22:18 UTC

I can set up a monthly export of the tags; it's not too hard to have it export as CSV.

MDGeist

2008-06-22 19:17:56 UTC

that would be cool

Cyberbeing

2008-06-22 21:59:40 UTC

admin2, would it be possible to export the CSV as UTF-8 so the Japanese and other Unicode characters in the tags are preserved?

Also, while running the new Python script I ran into a couple of problems, but nothing I couldn't work around.

The first is that when the script runs into an error, it silently fails and closes. It would be preferable if it could be modified to skip errors, continue renaming, and afterwards create a log file of the errors.

Things that make it throw an error and close are:
Invalid characters in the tags: \ / " | < > ? *
Tags that exceed the maximum supported length for a file name.
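The behavior asked for above (keep going on errors and log them afterwards) usually comes down to wrapping each rename in its own try/except. A minimal sketch; the `rename_all` name and the log.txt line format are hypothetical:

```python
import os


def rename_all(pairs, log_path="log.txt"):
    """Rename each (old, new) pair; log failures instead of stopping.

    Returns the number of files renamed successfully.
    """
    errors = []
    done = 0
    for old, new in pairs:
        try:
            if os.path.exists(new):
                # POSIX os.rename would silently overwrite; refuse instead.
                raise FileExistsError(f"target already exists: {new}")
            os.rename(old, new)
            done += 1
        except OSError as exc:
            # Invalid characters and over-long names typically raise OSError.
            errors.append(f"{old} | {new} | {exc}")
    if errors:
        with open(log_path, "w", encoding="utf-8") as log:
            log.write("\n".join(errors) + "\n")
    return done
```

The existence check also keeps two identical images from silently overwriting each other when they map to the same new name.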

Biohazix

2008-06-22 22:21:56 UTC

Cyberbeing said:
Things that make it throw an error and close are:
Invalid characters in the tags: \ / " | < > ? *
Tags that exceed the maximum supported length for a file name.

Looks like I should've tested more. Good idea, I'll make it log errors. Any other suggestions?

Cyberbeing

2008-06-23 00:11:36 UTC

Well I'm not sure how the site does it, but if you could just mimic its behavior for replacing invalid characters and shortening overly long tags in file names automatically that would probably be the best solution.

I already finished renaming everything with the CSV-based Python script you posted, so I don't need it anymore (I just had to rerun it each time it hit an error I had forgotten to fix), but it would be helpful for anybody else who needs to run it.

I also noticed another bug whose cause I'm not sure of.

For example, take the following line, which was written to rename.txt:

02bcd4a0fd0ce3c30888a0267e2e838e.jpg | moe 4979 fate/stay_night koyama_hirokazu matou_sakura see_through stick_poster.jpg

The / is invalid, so I changed it to the following and saved rename.txt:
02bcd4a0fd0ce3c30888a0267e2e838e.jpg | moe 4979 fate_stay_night koyama_hirokazu matou_sakura see_through stick_poster.jpg

If I then press Enter, it crashes (I'm not sure whether it even throws an error), even though everything should be fine. If I instead make the change directly in the CSV and then run the Python script, it works correctly. Any idea why that happens?

A separate Python script that just runs off rename.txt without parsing the CSV might also be nice.
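A standalone runner for rename.txt could look like this sketch. It assumes one `old | new` pair per line, as in the example above, and splits on the first ` | ` only, since the old MD5 names never contain it. Reading with `utf-8-sig` (which tolerates a byte-order mark some Windows editors add on save) is a precaution; whether encoding explains the crash above is unknown. `read_rename_file` and `apply_renames` are hypothetical names:

```python
import os


def read_rename_file(path="rename.txt"):
    """Parse 'old | new' lines into (old, new) pairs, skipping blank lines."""
    pairs = []
    # utf-8-sig also accepts a leading byte-order mark if an editor added one.
    with open(path, encoding="utf-8-sig") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue
            old, sep, new = line.partition(" | ")
            if not sep:
                raise ValueError(f"line {line_no}: missing ' | ' separator")
            pairs.append((old.strip(), new.strip()))
    return pairs


def apply_renames(folder, pairs):
    """Rename files inside `folder`; report failures and keep going."""
    for old, new in pairs:
        if old == new:
            continue
        try:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
        except OSError as exc:
            print(f"skipped {old!r}: {exc}")
```

Run from the image folder, `apply_renames(".", read_rename_file("rename.txt"))` would apply whatever the file currently says, including hand edits.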

Biohazix

2008-06-23 00:31:55 UTC

Cyberbeing said:
Well I'm not sure how the site does it, but if you could just mimic its behavior for replacing invalid characters and shortening overly long tags in file names automatically that would probably be the best solution.

The only reserved characters that show up in tags are ", / and :. The script will automatically replace those just like imouto does. (Why is a : replaced with a space instead of an underscore? It makes it look like separate tags.)
If there's still a reserved character, or if the name is too long, it will say so in log.txt.
You'll have to shorten filenames manually if needed; I don't know how imouto does it.
Editing rename.txt no longer has any effect, since that shouldn't be necessary anymore, and this also prevents errors like that bug (no idea why that happened).
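The replacement rules above could be sketched like this. Per this thread, `:` becomes a space. Replacing the other Windows-reserved characters with an underscore, and shortening long names by dropping trailing tags until the name fits a 255-character limit (a common per-name limit), are assumptions, since nobody here knew exactly how imouto does it. `safe_filename` is a hypothetical name:

```python
# ASSUMPTION: ':' -> ' ' matches the thread; '_' for the rest is a guess.
REPLACEMENTS = {":": " ", "/": "_", "\\": "_", '"': "_", "*": "_",
                "?": "_", "<": "_", ">": "_", "|": "_"}
TABLE = str.maketrans(REPLACEMENTS)


def safe_filename(post_id, tags, ext, max_len=255):
    """Build 'moe <id> <tags><ext>' with reserved characters replaced.

    If the name is longer than max_len characters, whole tags are dropped
    from the end until it fits (a hypothetical policy, not imouto's).
    """
    cleaned = [tag.translate(TABLE) for tag in tags]
    while True:
        name = " ".join(["moe", str(post_id)] + cleaned) + ext
        if len(name) <= max_len or not cleaned:
            return name
        cleaned.pop()
```

Dropping whole tags keeps every remaining tag intact instead of cutting one off mid-word.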

Cerb69

2008-09-09 16:40:24 UTC

Sorry to bump the topic, but does anyone still have that rename script?

The original link is dead QQ

MDGeist

2008-09-09 22:15:35 UTC

dump.csv could use an update, too...

Biohazix

2008-09-10 18:10:42 UTC

Cerb69 said:
Sorry to bump the topic, but does anyone still have that rename script?

The original link is dead QQ

http://pastebin.com/f43413970

If there were regular updates of dump.csv, I could modify it to update files with new tags, if anyone wants that.

Cyberbeing

2008-09-10 19:03:00 UTC

I would still like that, but it seems admin2 has forgotten to set up the monthly CSV dump of tags he said he was going to do.

admin2, if you're still able to do monthly dumps of tags, I would be grateful.
