This comparison is a little more bizarre. Again, you have added aliasing and thinned/warped lineart, but you also have banding along the windowsill amplified by what appears to be an attempt to sharpen the image. That makes sense mechanically: sharpening boosts local contrast, so the subtle quantization steps of a gradient get pushed into visible bands. I think it goes without saying that this is not a positive effect.
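If you want to see the mechanism for yourself, here's a toy sketch (my own illustration, not anything from the paper) showing that an unsharp mask increases the step height of a quantized gradient:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# A shallow gradient quantized to integer steps, i.e. banding.
banded = np.round(np.linspace(0.0, 16.0, 1024))

# Unsharp mask: add back the difference between the signal and a blurred copy.
sharpened = banded + 1.0 * (banded - gaussian_filter1d(banded, sigma=3))

print(np.diff(banded).max())     # step height before sharpening: 1.0
print(np.diff(sharpened).max())  # step height after: noticeably larger
```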
I checked some other images with more detail, and the detail loss is pretty substantial. It’s not like most anime is 100% large swaths of completely flat colors, so the comparisons shown are about as favorable as this filter gets. If you like really sharp images when upscaling, just use NGU and call it a day; it looks much better and runs in real-time just fine.
“But wait!” cries the voice in your head. “Isn’t his filter applied on top of Jinc? Why don’t I see Jinc in these comparisons?!”
Great question, voice in your head! Indeed, it is quite strange that the author opted not to include just plain ol’ Jinc here, especially when he did bother to include bilinear, an extremely fast but blurry algorithm, and xBR, a pixel art algorithm. Without a plain Jinc column, there’s no way to tell how much of the improvement is Jinc doing the heavy lifting versus the filter applied on top of it. The more cynical among us might say this omission is an attempt to conflate the results of his filter with those of just Jinc.
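For reference, “plain ol’ Jinc” usually means a jinc-windowed jinc kernel applied radially (this is what mpv’s ewa_lanczos is, for instance). Here’s a minimal sketch of the per-tap weight, with the windowing as I understand it:

```python
import numpy as np
from scipy.special import j1  # Bessel function of the first kind, order 1

JINC_ZERO = 1.2196698912665045  # first zero of jinc(x)

def jinc(x):
    """jinc(x) = 2*J1(pi*x)/(pi*x), with jinc(0) = 1."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    nz = x != 0
    out[nz] = 2.0 * j1(np.pi * x[nz]) / (np.pi * x[nz])
    return out

def ewa_jinc_weight(r, radius=3.2383154841662362):
    """Weight of a source tap at radial distance r.

    The default radius is the third zero of jinc (the ewa_lanczos default);
    the window is a jinc stretched so its first zero lands at the radius.
    """
    r = np.asarray(r, dtype=float)
    w = jinc(r) * jinc(r * JINC_ZERO / radius)
    return np.where(r < radius, w, 0.0)
```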
He claims in the paper that his comparisons were “randomly selected”, something I highly doubt is true. Randomly selecting frames from videos will almost never produce an interesting selection that shows the tradeoffs present in various algorithms. Additionally, it is very, very tempting to choose images that make your algorithm look better than it would in a third-party comparison, even if this isn’t done consciously. A quick glance against some other sample images I had lying around makes me think that occurred here.
He chooses to use this graph as evidence of his algorithm being state of the art. The “perceptual quality” score was apparently just him running a double-blind test with some random people. Most people are grossly unqualified to judge video quality and probably can’t even tell the difference between the three rightmost images in the above comparisons. Most people will also say that sharper images look better, regardless of detail loss and other artifacting.[1] I could produce some incredibly sharp images that look like shit and still get a ton of people to say they look great. That said, subjective quality comparisons have their place, but the graph means very little on its own without detailed info on how the data behind it was collected.
The author dismisses more quantitative metrics (PSNR, SSIM, VMAF, etc.) by saying he is only concerned with upscaling to 4K, and that there are no ground truths at that resolution because no anime is produced at it. That excuse doesn’t hold up: the standard workaround is to downscale a frame from a native-resolution source, run the upscaler on it, and measure how well it reconstructs the original. As mentioned earlier, the decision to restrict comparisons to only 4K is arbitrary and dumb anyway.
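A minimal sketch of that workaround (the file path and the `upscale_2x` hook are placeholders for whatever algorithm you’re testing, not anything from the paper):

```python
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_upscaler(frame_path, upscale_2x):
    """Downscale a native-resolution frame, upscale it back, score the result."""
    gt = np.asarray(Image.open(frame_path).convert("RGB"))   # e.g. native 1080p
    h, w = gt.shape[:2]
    small = Image.fromarray(gt).resize((w // 2, h // 2), Image.LANCZOS)
    restored = upscale_2x(np.asarray(small))                 # algorithm under test
    return (peak_signal_noise_ratio(gt, restored),
            structural_similarity(gt, restored, channel_axis=-1))
```

VMAF can be computed the same way with ffmpeg’s libvmaf filter. These metrics aren’t perfect for judging sharpening-style filters, but “no ground truth exists” simply isn’t true once you drop the 4K-only restriction.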
There are other bizarre factual errors scattered throughout that aren’t terribly important but are nonetheless amusing, my favorite being his claim that no anime is mastered at 1920x1080 except for full-length films. I won’t go through the full list, but their presence is not encouraging.
In short, please try to view self-proclaimed video innovations with a degree of healthy skepticism. More often than not, there’s a lot of marketing and a distinct lack of substance.
Footnote:
1: This is admittedly a hard problem. A reasonable analogy is loudness in audio: people will rate louder samples as sounding better, so if your encoder adds a +0.1 dB boost, that can give you an unfair advantage in comparisons. That’s easy to correct for, as you can normalize the peak volume, but there’s no similar solution in the video space. Even if you could correct for sharpness, it wouldn’t make all that much sense when comparing sharpening filters. ↩
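For completeness, the audio-side fix really is that simple; a sketch assuming float PCM samples (the function name is mine):

```python
import numpy as np

def peak_normalize(samples, target_peak=1.0):
    """Scale a float PCM buffer so its maximum absolute amplitude hits target_peak."""
    peak = np.max(np.abs(samples))
    return samples if peak == 0 else samples * (target_peak / peak)
```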