HDD mirror of site
Just trying to judge interest.

If I make available a full image dump and some sort of method to view the site locally would anyone be interested?

Something like the price of a 2TB HDD, shipping, and an optional tip for my time?
And...

  • More donations needed, given HDD prices this year.
  • Which P2P host/method will be used for the local copy?
  • Sounds interesting. (again... ( ̄□ ̄;))
I get criticized for doing anything, huh.

I'll provide an rsync endpoint that'll let you keep your local mirror in sync.
I'd only be seriously interested if you solve some of the issues with maintaining and updating such a large collection of images.

At the bare minimum you'd need to offer an automated way to keep tags (or potentially all actions in History) up to date, as well as a simple method to sync new images uploaded to yande.re onto this 2TB HDD, all while maintaining tag search functionality and navigation of all the images and pools.

Edit: I see you mention rsync, so you're thinking of this being literally a local copy of the site?

If it was only a static dump which couldn't be updated, I'd only be mildly interested.

Something else which comes to mind as an interesting prospect would be combining this idea with a service like Bitcasa. Only $10/month (free in Beta) for unlimited cloud storage and bandwidth on the Amazon AWS Cloud, with the ability to mount and share folders on Windows & MacOS (*nix support planned) and full file-system integration. That could be a thinking-outside-the-box way to cut down on the bandwidth required to keep such a full image dump updated in real time. Though when you consider the cost for each user to subscribe, it's probably not practical unless someone was already using Bitcasa for other purposes.
I was thinking of making https://yande.re/forum/show/13838 work with the offline copy, with some sort of push-button functionality to get the images rsynced.

The downside is that rsync is rather IO-intensive and won't scale at all...
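A rough sketch of what that push-button sync might look like from the mirror owner's side, assuming the server exposes the images as an rsync module (the host, module name, and local path below are placeholders, not a real endpoint):

```python
#!/usr/bin/env python3
"""One-click image sync for a local yande.re mirror (sketch only)."""
import subprocess
import sys

# Hypothetical rsync daemon module and local mirror root.
RSYNC_SOURCE = "rsync://mirror.example.net/yandere-images/"
LOCAL_ROOT = "/mnt/yandere-mirror/images/"

def sync() -> int:
    # --ignore-existing skips files already on disk, so repeat runs mostly
    # pay for the file-list scan (the part that is IO-heavy server side).
    cmd = [
        "rsync", "-rt", "--partial", "--ignore-existing",
        RSYNC_SOURCE, LOCAL_ROOT,
    ]
    return subprocess.call(cmd)

if __name__ == "__main__":
    sys.exit(sync())
```

The "push button" part would just be a small GUI or web handler that shells out to something like this.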

The even bigger long-term goal is some sort of decentralized P2P CDN. There was some brainstorming in the IRC channel today about it, but there's no real software available that can be used and adapted.

Edit: there's something called Hentai@Home that could be used for distribution; however, only the client is provided, with no server code to reference.
If only you offered to do this back in 2011 before the floods... I'm interested, but somewhat hesitant, since doing it now would really cut into money which would otherwise go to you.

It may be worth taking your time sorting out the updating side of things until you can pick up a bunch of reliable low-power HDDs, like the 2TB Hitachi 5K3000, which used to go on sale for around $70 each at Fry's and/or Amazon.com on occasion pre-flood. Today, with 2TB HDDs still asking a post-flood premium of ~$140, we've yet to reach a point where buying storage on a whim is practical again.

At this point I'd almost be more inclined to just ship you an HDD I already own... How big did you say the site was currently?
Hmmm...

Maybe adapting the H@Home code first would kill enough time until we can get lower-priced HDDs, or longer than that if we aren't lucky.

BTW, sooner or later this should get built, and it will affect everyone.

(For Cloud Sage..........)
admin2 said:
I was thinking of making https://yande.re/forum/show/13838 work with the offline copy, with some sort of push-button functionality to get the images rsynced.

The downside is that rsync is rather IO-intensive and won't scale at all...

The even bigger long-term goal is some sort of decentralized P2P CDN. There was some brainstorming in the IRC channel today about it, but there's no real software available that can be used and adapted.

Edit: there's something called Hentai@Home that could be used for distribution; however, only the client is provided, with no server code to reference.
Someone tried to reverse engineer it back in the day. I'll try to poke him about it and see if he managed it.

It's really what you're looking for, but without the server it's quite useless.
Shuugo said:
...but without the server it's quite useless.
The server (or just... one server) is the important piece for the update work (at least as a first step toward the cloud setup).

That's still not counting bandwidth or any other physical data transfer work...
admin2 said:
Something like the price of a 2TB HDD, shipping, and an optional tip for my time?
Are you mad? Imagine what customs will do when it's intercepted.
Radioactive said:
Are you mad? Imagine what customs will do when it's intercepted.
Option A: Require hard drives supporting Bulk Data Encryption for any international shipments? That goes back to how big the site is currently. Before the floods, Hitachi's 750GB 2.5" drive supporting BDE used to be somewhat reasonably priced. All the 3.5" higher-capacity drives supporting BDE from various companies were always nearline/enterprise storage stuff and very pricey.

Option B: California residents only = no customs. Sneakernet works better on a smaller scale anyway.
Sneakernet would work better, and would help rsync as well since we're colo'd there too.

Edit: TrueCrypt the whole drive; I pass the password through the site or something.
If we did ever do a sneakernet thing here in California, we could possibly look into seeing if Misha would be interested in doing something similar with his private multi-TB animu & mango archive co-located up here in Northern Cal. Normally I'd say he wouldn't be interested, but with lolipower.org now officially dead, who knows. Maybe he'd be willing to do something on a small scale at some point in the future, for the sake of data preservation.
Oh, didn't realize that lolipower.org decided to go offline.

I have 3 backups currently, 2 of them local in SoCal. So recovery would be "fast" for the most part, but it doesn't hurt to have a few more, I suppose.
I was actually thinking of the opposite: whether anybody had interest in passing around animu/mango archives at the same time these yande.re mirrors went out.
Ah, so...

  • The backup sites (colo) would be upgraded in the near future, since we have a (nearly forgotten) treasure.
  • (edited) Adding more sites besides lolipower for reliability would be recommended (and more work).
  • The other remaining problem (but easier to deal with) is adopting sneakernet for working with the colo.
Nice...
Yes, I am interested, provided auto-sync can be configured/deployed.

In addition, local mirrors provide redundancy should things happen, e.g. a RAID rebuild or a complete loss. We could let users choose whether the local copy is only a data sink (only receiving data for local viewing), a data reservoir (data may be requested to be uploaded to the master copy for a rebuild), or even a DR master (data may be designated as the master copy while the original master copy is being rebuilt). This is somewhat of an HA solution.

Based on this idea, ideally we'd have two colos: one the master, the other the DR master. Contributing users can deploy data reservoirs, since setting up a web server for HA isn't very practical for a home user but having backup data readily available is useful. Normal browsing users can deploy data sinks that let them browse locally.

So far it seems P2P sync is the best way, especially when fetching data from data reservoirs at home locations to rebuild the master at the colo. If we're talking about transferring only the differentials, then bandwidth isn't really a concern.
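As an illustration of the "only the differentials" idea, the master could publish a manifest of file hashes that sinks and reservoirs diff against their own copy; everything here (paths, manifest format) is made up for the sketch:

```python
import hashlib
import json
from pathlib import Path

def build_manifest(root: Path) -> dict:
    """Map relative file path -> MD5 so two copies can be compared cheaply."""
    manifest = {}
    for f in sorted(root.rglob("*")):
        if f.is_file():
            h = hashlib.md5()
            with f.open("rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            manifest[str(f.relative_to(root))] = h.hexdigest()
    return manifest

def missing_or_stale(master: dict, local: dict) -> list:
    """Files the local copy needs to fetch to catch up with the master."""
    return [path for path, digest in master.items() if local.get(path) != digest]

# The master dumps json.dumps(build_manifest(...)) somewhere public; a data
# sink or reservoir diffs it against its own manifest and requests only the
# listed files, over rsync, HTTP, or whatever P2P transport ends up chosen.
```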

I doubt there's any existing software solution for this, but that's just my $0.02. The original topic isn't about data redundancy, so it's kinda off-topic :p
P2P sync would be nice, without using torrents.

Anyone a programmer who's interested in such a project? :D
I'm interested in being able to search and browse thumbnails and samples locally; that way I can search without waiting for pages to load or messing with browser tabs when going through a large amount of pictures looking for the ones I need. It's essentially your mirror system, just stripped of the full images. If you've got time and this is feasible, please consider it :)
Well, I can always implement something of that sort if I ever get some spare time; the new job plus 4 hours of travel time cuts me pretty damn short on free time.

Second, it would mean I'd code it in C#, meaning no *nix support. It might work under Mono, but I have absolutely no experience with that environment.

If I had more experience with Java I could have coded it in that, but I don't, and I would rather watch paint dry than code it in C/C++.

At least I have some ideas and experience on how to set it up in C#, with a private set of P2P nodes. But yeah, dunno if it's even wanted if it's pretty much Windows-only (probably works fine under Wine, but still).
C# is actually OK, considering you're syncing only the files and DBs. The local web host can be set up on Windows or *nix, as both the file system and MySQL are cross-platform.

If anyone's serious about this, I see no problem having a small win VM inside *nix for syncing files to the file system and DB to the MySQL instance.
I don't really mind the language, but it needs to run on the server as a "master" node of some sort to keep track of new images and tag updates.

Of course there's a chance of failure in that implementation.

Will have to think of a method that can be decentralized while still providing "trusted" updates of images and tags.

Edit: maybe some sort of signing with a private/public keypair.
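A minimal sketch of that keypair idea, using Ed25519 from the Python cryptography package (the manifest payload and the key distribution are assumptions, not a worked-out design):

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

# Master node: sign the update manifest (say, a JSON blob listing new post
# IDs, tag changes, and file hashes) before handing it to untrusted peers.
private_key = Ed25519PrivateKey.generate()
manifest = b'{"posts": [], "tag_changes": []}'  # placeholder payload
signature = private_key.sign(manifest)

# The raw public key gets shipped to every peer once, out of band.
public_raw = private_key.public_key().public_bytes(
    serialization.Encoding.Raw, serialization.PublicFormat.Raw
)

# Peer: accept the update only if the signature verifies, no matter which
# node relayed it. verify() raises InvalidSignature if it was tampered with.
Ed25519PublicKey.from_public_bytes(public_raw).verify(signature, manifest)
```

That way distribution itself can stay fully decentralized; only the signing key has to live on the master.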
I remember uTorrent is cross-platform and also has an API/SDK.

Even better, Vuze is based on Java and thus is cross-platform too. Vuze also has a plug-in system which offers wider functionality (a complete Java API) than uTorrent's (HTML/JavaScript).

The incremental-update .torrent file can be packed into a .zip file, which can then be digitally signed using Java facilities (by treating the .zip as a .jar).
admin2 said:
bump bump

http://gun.io/blog/webp2p-new-peer-to-peer-technology-on-the-web/

seems interesting and worth following
In my opinion, the most difficult part of mirroring a danbooru-alike site is keeping the metadata.

For files, rsync or git would work. Personally I prefer git, considering there are many people who have tried or are trying to make a site rip. With git it's possible to eliminate duplicated downloading with some tricks. Rsync can't provide that, as the rsync protocol doesn't track the content of files, while git does.

However, tracking and using all the metadata may be impossible. File names only contain the tags but no other metadata such as ranking, safety, source, or even dimensions (although the dimensions can be rebuilt from the file).

Even if a stripped database is exported daily or monthly, users interested in it have to build their own LAMP stack and danbooru to view it, and tracking the dump itself is a problem. Although it doesn't change much (compared with the whole size), the dump needs to be downloaded again for each release. A database dump in some format that git can recognise might help, but I'm not sure such a format exists.
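One git-friendly option would be a line-oriented dump: one JSON object per post, sorted by id with sorted keys, so each release only touches the lines for posts whose metadata actually changed. A sketch (the field layout is an example, not the real schema):

```python
import json

def write_git_friendly_dump(posts, path):
    """posts: iterable of dicts, each with at least an 'id' key.

    One post per line, deterministic ordering, so `git diff` between two
    dumps shows only the posts that changed instead of one giant blob.
    """
    with open(path, "w", encoding="utf-8") as out:
        for post in sorted(posts, key=lambda p: p["id"]):
            out.write(json.dumps(post, sort_keys=True, ensure_ascii=False))
            out.write("\n")

# e.g. write_git_friendly_dump(all_posts, "metadata.jsonl"), committed after
# each export; git's delta compression keeps the repository growth small.
```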

admin2 said:
P2P sync would be nice, without using torrents.

Anyone a programmer who's interested in such a project? :D
In fact I'm interested in it and have been working on a more generic project for two years, but it still hasn't hit any milestone (ashamed of my low efficiency :( ).

I can't make any promises because I need to focus on my paper first...
Emmmm...is the DB MySQL?
A database dump in some format that git can recognise might help, but I'm not sure such a format exists.
There are JSON and XML. You can get all the posts/tags data relatively easily, and for updates there is the change:<100 search.
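For example, an update pass could page through the post index JSON with a change: filter like the one mentioned above until nothing new comes back (sketch only; the parameters follow the moebooru-style API, so double-check them against the real endpoints):

```python
import json
import urllib.request

BASE = "https://yande.re/post.json"  # moebooru-style post index

def fetch_changed(last_change, page=1, limit=100):
    """Return posts whose change sequence is newer than `last_change`."""
    url = f"{BASE}?limit={limit}&page={page}&tags=change:>{last_change}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage: loop over pages until an empty list comes back, update the local
# metadata for each returned post, then remember the highest 'change' value
# seen for the next run.
```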
I've written some scripts for making the kona torrent and have a "dump" in JSON format to play with.

imouto.json.gz from 29-Jan-2012, 21M
format sample

users interested in it have to build their own LAMP stack and danbooru to view it
You don't need a full moebooru to search/view posts. A simple Java/.NET app with a built-in web server should be enough. Also, we could try some tricks with web apps and the File System API.
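As a taste of how small that can be, here's a sketch (in Python rather than Java/.NET, just to keep it short) that serves an already-synced thumbnail directory for local browsing; it doesn't do tag search, and the path is a placeholder:

```python
import functools
import http.server

# Local directory holding the synced previews/samples (placeholder path).
THUMB_ROOT = "/mnt/yandere-mirror/preview"

handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory=THUMB_ROOT
)
# Browse http://127.0.0.1:8080/ to page through the mirrored thumbnails.
http.server.HTTPServer(("127.0.0.1", 8080), handler).serve_forever()
```

Tag search on top of that is mostly a matter of pointing the same kind of server at the JSON metadata dump discussed above.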

Personally, I use nginx as a caching proxy for the mirror and a simple Opera user JS to patch the images' location to point at it. This way new posts are stored on first view.
partial nginx conf