I tried refactoring the threading a bit. On a test image (post #27092, full), set to 60/.7/.3/.6/1.1/1/.8/30/2/1/nearest, it runs in about 13 seconds on my Q9300 (vs. 43.4 seconds with one thread). It still occasionally takes about 20 seconds instead of 13, though.

Memory usage is about 44 bytes per pixel for 8-bit RGBA. A 3000x3000 image will use about 340 megs; a 6000x6000 image will use about 1.3 gigs. About 56 bytes per pixel for 16-bit RGBA.
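
For anyone who wants to estimate other sizes, here is a rough sketch of the arithmetic in Python. It assumes the per-pixel figures above scale linearly with pixel count, which is a simplification, and the function name is just made up for illustration. Note that the plain product comes out somewhat higher than the rounded totals quoted above, so treat all of these as ballpark numbers.

```python
# Ballpark memory estimate from the per-pixel figures quoted above.
# Assumption: usage scales linearly with pixel count and nothing else.

MIB = 1024 * 1024


def estimated_bytes(width, height, bytes_per_pixel):
    """Return width * height * bytes_per_pixel (hypothetical helper)."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    for side in (3000, 6000):
        for label, bpp in (("8-bit RGBA", 44), ("16-bit RGBA", 56)):
            total = estimated_bytes(side, side, bpp)
            print(f"{side}x{side} {label}: {total / MIB:.0f} MiB")
```

The 6000x6000 cases are the ones to watch in a 32-bit process, since they land well above a gigabyte.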

Photoshop likes to take over most of the process address space, so trying to allocate memory directly will probably cause problems (in 32-bit, anyway).