The forked Windows version found at https://github.com/tanakamura/waifu2x-converter-cpp seems to work properly and adds RGB model support (links to pre-built binaries are at the bottom of the page). It also supports CUDA/OpenCL, which makes it very fast.

If for whatever reason you don't have a GPU supporting CUDA/OpenCL, or you have set the --disable-gpu parameter, make sure you also set the --block_size parameter to something other than 0, or performance will be horrible.

On my i5-3570K CPU, the smallest setting, --block_size 128, seems fastest.

On my GTX770 GPU, either the default --block_size 0 or --block_size 1024 is fastest.
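
Since the best block size clearly depends on the hardware, it's probably worth timing a few values on your own machine. Here's a rough Python sketch of how that could be done. Only --block_size and --disable-gpu come from what I described above; the -i/-o input/output flags, the executable path, and the helper names (build_command, time_block_sizes) are assumptions for illustration, so check the converter's --help output for your build before relying on them.

```python
import subprocess
import time


def build_command(exe, src, dst, block_size, disable_gpu=False):
    """Assemble the argument list for one conversion run."""
    # ASSUMPTION: -i and -o select the input and output files.
    # Verify against the --help output of the build you are using.
    cmd = [exe, "-i", src, "-o", dst, "--block_size", str(block_size)]
    if disable_gpu:
        cmd.append("--disable-gpu")
    return cmd


def time_block_sizes(exe, src, dst, block_sizes, disable_gpu=False,
                     runner=subprocess.run):
    """Run the converter once per block size and return {block_size: seconds}."""
    timings = {}
    for size in block_sizes:
        cmd = build_command(exe, src, dst, size, disable_gpu)
        start = time.perf_counter()
        # check=True raises CalledProcessError if the converter exits non-zero.
        runner(cmd, check=True)
        timings[size] = time.perf_counter() - start
    return timings
```

Calling something like time_block_sizes("path/to/converter.exe", "in.png", "out.png", [128, 256, 512, 1024], disable_gpu=True) and sorting the result by time should show which value wins on your hardware. A single run per setting is noisy, so repeat it a few times before drawing conclusions.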

You may want to update the models_rgb folder with the json files found at: https://github.com/nagadomi/waifu2x/tree/v1.0/models/anime_style_art_rgb . The latest version from 2 days ago seems to offer slightly better detail retention. These also appear to be the ones currently in use on the http://waifu2x.udp.jp/ test server.

An alternative fork supporting RGB + CUDA can be found at:
https://github.com/lltcggie/waifu2x-caffe/releases
That one requires cudnn64_65.dll from the NVIDIA Registered Developer Program to use CUDA, though (it crashes for me without it). The license for cuDNN v2 appears to allow redistribution, so I've rehosted it here: https://www.mediafire.com/?2uhcwb9i1lzd18c

The GUI with that one is Japanese only, but you can reference this for a rough translation: http://i.imgbox.com/I1Zsg82c.png . The caffe version is also slightly different in that it seems fastest with a block size of 256 on my GTX770.

Compared to the tanakamura version, which uses bicubic, the caffe version seems to use a lower-quality method (bilinear?) when converting 4:2:0 chroma-subsampled YUV JPEGs to RGB. To get near-identical results between tanakamura/caffe/web, you'll need to save your images as RGB before filtering.