I'm currently coding up a new normalized Haar wavelet decomposer/sorter/coefficient generator for use in the image detector, because rq's doesn't really work.

E.g., according to the algorithm in the tree right now, http://moe.imouto.org/post/show/628 and http://moe.imouto.org/post/show/3443 are identical. This needs some work.

The algorithm being tested was the multiresolution image query described in Jacobs et al. (http://grail.cs.washington.edu/projects/query/). The fuzziness of the search didn't even come into play here: the two posts above had the -exact- same wavelet coefficients. All 120 of them.
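For illustration, here is a minimal sketch of the kind of signature the Jacobs et al. approach keeps: only the largest-magnitude coefficients, reduced to position and sign. The function name `signature`, the `keep` parameter, and the choice to skip position 0 (the overall average) are assumptions made for this example, not code from the tree. It is not a diagnosis of the bug above; it only shows that once magnitudes are thrown away, different coefficient vectors can end up with the same signature.

```python
def signature(coeffs, keep):
    """Return {position: sign} for the `keep` largest-magnitude coefficients.

    Assumption: position 0 (the overall average) is handled separately,
    so it is skipped here. Zero coefficients carry no sign and are dropped.
    """
    ranked = sorted(range(1, len(coeffs)),
                    key=lambda i: abs(coeffs[i]),
                    reverse=True)
    return {i: (1 if coeffs[i] > 0 else -1)
            for i in ranked[:keep]
            if coeffs[i] != 0}


# Two clearly different (made-up) coefficient vectors...
a = [12.0, 4.0, 1.4, -1.4, 0.1]
b = [50.0, 9.0, 0.3, -0.2, 0.01]

# ...that keep the same positions with the same signs.
same = signature(a, 3) == signature(b, 3)
```

With these made-up inputs both signatures come out as {1: 1, 2: 1, 3: -1}, so `same` is True even though the underlying values differ a lot.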

rq wasn't kidding when he said the storage requirements were considerable. The coefficient table with 8600 posts will contain over a million rows (120 rows/image).

My query code also needs a lot of work: I'm looking at about 15 seconds to match one image out of eight thousand. Jacobs et al. did it in half a second on a database of 20,000 images. In 1995.
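One way a query like this can avoid touching every row is an inverted index keyed on (position, sign), so a lookup only visits images that share at least one coefficient with the query. The sketch below is a simplified illustration under that assumption: `build_index` and `query` are hypothetical names, the score is a plain match count rather than the weighted metric from the paper, and nothing here reflects how the current query code is actually structured.

```python
from collections import defaultdict


def build_index(signatures):
    """Map (position, sign) -> set of image ids whose signature contains it."""
    index = defaultdict(set)
    for image_id, sig in signatures.items():
        for pos, sign in sig.items():
            index[(pos, sign)].add(image_id)
    return index


def query(index, sig, top=5):
    """Rank images by how many (position, sign) pairs they share with sig."""
    hits = defaultdict(int)
    for pos, sign in sig.items():
        # .get() avoids creating empty buckets for unseen keys
        for image_id in index.get((pos, sign), ()):
            hits[image_id] += 1
    # Highest match count first; ties broken by image id for stable output
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Made-up signatures for three images
sigs = {
    "a": {1: 1, 2: -1, 3: 1},
    "b": {1: 1, 2: 1, 5: -1},
    "c": {7: 1},
}
idx = build_index(sigs)
results = query(idx, {1: 1, 2: -1, 3: 1})
```

Here image "c" is never visited at all, because it shares no (position, sign) pair with the query. The work per query scales with the number of coefficients in the query signature and the size of the matching buckets, not with the total number of rows.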

That being said, no, I really don't want to use an imported algorithm, not least because:

a) if I don't understand it, I won't be able to fix it
b) I'm a kernel-side C programmer, so I should be able to make it fast and small enough to be usable on a near-interactive basis.

viiv, I also took a look at imgseek. Their web-based demo turned up some pretty crappy results (e.g., a similar image search for snowflakes turns up... grass and other random things, but certainly not snowflakes). Plus it's GPL'd, while danbooru is intentionally BSD-licensed. I don't want to get into any tainted code issues.

Things to look forward to (assuming I have enough time to finish):

  • The ability to see possible dupes as you upload
  • The ability to search for portions of images:
--> You can search for a certain artist's eyes
--> Or, perhaps, panty styles etc.

If anybody's got a handy reference for wavelet theory that isn't filled with math junk way over the top of my head (I really don't need a comparison to Fourier transforms, and coverage of the continuous wavelet transform is immaterial) I'd really, really appreciate it.

So far I've been working off of this article:
http://www.spelman.edu/~colm/wav.pdf

Normalized Haar wavelets are the easiest due to the averaging-and-differencing method you can use to implement them.
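The averaging-and-differencing idea can be sketched in a few lines for the 1D case. This is a minimal illustration, not the decomposer itself: `haar_1d` is a name made up for the example, it assumes the input length is a power of two, and it uses the 1/sqrt(2) scaling so the transform preserves the sum of squares. A 2D image version would apply the same step along rows and then along columns.

```python
import math


def haar_1d(values):
    """Normalized 1D Haar decomposition by repeated averaging and differencing.

    Assumption: len(values) is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        step = [0.0] * length
        for i in range(half):
            a, b = out[2 * i], out[2 * i + 1]
            step[i] = (a + b) / math.sqrt(2)         # scaled average
            step[half + i] = (a - b) / math.sqrt(2)  # scaled difference
        out[:length] = step   # detail coefficients beyond `length` stay put
        length = half         # recurse on the averages only
    return out


coeffs = haar_1d([9, 7, 3, 5])
```

For the input [9, 7, 3, 5] this gives 12, 4, sqrt(2), -sqrt(2) (up to floating-point rounding). The sum of squares is 164 both before and after, which is what the normalization buys you: coefficient magnitudes from different levels become directly comparable when sorting.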

If anybody's got tips/experience in this field, I'd love to hear it!