AVIF 444 @ 20,120 bytes
Unfortunately, this actually sets off a red flag for me. At that file size, the JPEG looks far worse than it should. And sure enough, nowhere in the post does Netflix explain how they produced the JPEGs, so I have no idea what encoder or settings they used. Checking the Docker container reveals the answer: the JPEG-XT reference encoder, using default settings. This produces significantly worse results than even libJPEG, let alone MozJPEG.
If you take the same original image and run it through MozJPEG, you get far better results[1]:
MozJPEG 444 @ 19,858 bytes
The sad part is, AVIF still looks better in most cases, and at higher resolutions the difference is even more pronounced. So why cheat? Why do this? I don’t get it, I really don’t. All the comparisons in their post use the worse encoder, and every single image looks way better as a JPEG with a reasonable choice of encoder. Further down they offer a few metrics and some data (with some of the excluded metrics raising an eyebrow), but all of it is basically junk because the JPEGs are far worse than they should be. Very sad.
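The AVIF and MozJPEG captions above sit within a few hundred bytes of each other, which is what makes that comparison meaningful. As a minimal sketch of how you might hit a byte budget with any encoder, the Python below binary-searches a quality setting. It assumes a hypothetical caller-supplied `encode(quality)` function wrapping whatever encoder you are testing, and it assumes output size grows monotonically with quality, which is typical but not guaranteed for real encoders.

```python
from typing import Callable, Optional


def quality_for_budget(
    encode: Callable[[int], bytes],  # hypothetical: wraps the encoder under test
    target_bytes: int,
    lo: int = 1,
    hi: int = 100,
) -> tuple[int, int]:
    """Return (quality, size) for the highest quality whose output fits target_bytes.

    Assumes output size never shrinks as quality rises; if that fails for a
    given encoder and image, the result is only approximately the best fit.
    """
    best: Optional[tuple[int, int]] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        size = len(encode(mid))
        if size <= target_bytes:
            best = (mid, size)  # fits: remember it and try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1        # too big: search lower qualities
    if best is None:
        raise ValueError("no quality setting fits the byte budget")
    return best
```

Running the same search for both codecs against one target size keeps the comparison about the encoders rather than about how many bytes each was given.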
Next up is Netflix’s August 28th, 2020 post, Optimized shot-based encodes for 4K: Now streaming! This article goes over how they backported the company’s per-title and per-shot encoding optimizations to their ‘premium bitstreams’ for 4K content, and how the results compare under VMAF. I have various complaints about their optimization work and about VMAF in general, but those details are largely irrelevant here, because their sin is conflating what is good for customers with what is good for the company.
If you’re a company serving digital video, you are highly incentivized to cut down on the size of your video streams. The smaller the video, the less storage is required and the less bandwidth used. At Netflix’s scale, being able to lower the bitrate of video translates to huge cost savings, so it’s understandable that they want to do this. Unfortunately, this is not always a win for the customer.
If you’re streaming on a poor connection, it may very well be a win: better quality at a lower bitrate is an improvement if your connection is the limitation. For a lot of their customers, however, the limitation is not the network but the hardware. Netflix chooses not to give customers any control over which stream they receive, instead selecting it dynamically based on what they think the network and hardware can handle. The hardware is the key part here: if your screen is 720p, that is the highest resolution you will get. Similarly, if your desktop, phone, or TV doesn’t have the magical hardware and browser combination required, you won’t be going above 1080p regardless of your screen size or network connection. This means that for a lot of customers, resolution, and thus quality, will be locked to 1080p.
Now, why does this matter? In that article, all of Netflix’s charts use bitrate as the x-axis, and their comparisons are done at roughly similar bitrates rather than at the same resolution. However, if you’re stuck at 1080p, this isn’t a fair comparison! You can’t get the 4K stream, so a more useful comparison is a 1080p stream from the old encoding pipeline versus one from the new pipeline. This is a problem because, well, the new one looks worse when you do this. This is a downgrade for a lot of their customers. Even if you don’t trust my eyes and prefer their ‘objective’ metrics, look at their chart here. According to VMAF, you are absolutely taking a quality hit if you’re limited to 1080p streams. This is not a customer win. It’s a win for Netflix, because the file size is smaller, but that’s it. And maybe that’s fine, but at least frame it that way! Or perform an apples-to-apples comparison, and pit your old pipeline against the new one at the same resolution.
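An apples-to-apples version of that comparison fixes the resolution cap first and only then looks at quality. A rough sketch in Python, assuming a hypothetical `Rung` record per ladder rung and assuming, for simplicity, that a device with enough bandwidth streams the highest-bitrate rung at or below its resolution cap (real adaptive players use more complex heuristics):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    """One rung of a bitrate ladder (hypothetical record for illustration)."""
    height: int   # vertical resolution, e.g. 1080
    kbps: int     # average bitrate of the encode
    score: float  # quality metric for the rung, e.g. a VMAF score


def best_rung_under_cap(ladder: list[Rung], max_height: int) -> Rung:
    """Highest-bitrate rung a device limited to max_height could receive."""
    eligible = [r for r in ladder if r.height <= max_height]
    if not eligible:
        raise ValueError(f"no rung at or below {max_height}p")
    return max(eligible, key=lambda r: r.kbps)


def compare_at_cap(old: list[Rung], new: list[Rung], max_height: int) -> tuple[Rung, Rung]:
    """Pit the old and new pipelines against each other at the same resolution cap."""
    return best_rung_under_cap(old, max_height), best_rung_under_cap(new, max_height)
```

Compared this way, the question stops being which pipeline wins at a given bitrate and becomes what a 1080p-limited viewer actually receives from each one.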
Okay, on to Crunchyroll. On March 16, 2017, they put out the post Improving Video Quality for Crunchyroll and VRV in response to customer backlash. They were clearly in the wrong, and this post was sleazy because it pretended this was a mistake rather than a deliberate attempt to lower file sizes (which they later admitted to), but that is not the core problem here. When you compare frames across different encodes and you choose to crop the images, please use the same frame and the same crop each time. You would think this is a no-brainer, but alas…
Multiple people complained about this at the time, so I figured the message was received. Their blog mostly died after that, so I never saw another example either way.
Until this year!
Specifically on March 28, 2020, when they published Scaling up Anime with Machine Learning and Smart Real Time Algorithms. The post discusses how they tried to improve the image quality available to customers with both server-side and client-side upscaling, using Waifu2X on the server and Anime4K on the client. Once again, I have various quibbles with their technical decisions, particularly on the Anime4K front, but the larger problem is that they somehow still fail to crop comparison images correctly.
Come on, guys. Why do this?
There are a billion ways to invalidate image comparisons, but this is definitely among the dumbest. Even commenters on the orange site manage to get this right most of the time, so there is truly no excuse.
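Getting this right is mechanical: pick one frame index, pick one crop box, and apply both to every encode. A minimal Python sketch, using plain nested lists as a stand-in for decoded frames and a hypothetical `comparison_crops` helper; the box follows the (left, top, right, bottom) convention that Pillow’s `Image.crop` uses. It refuses encodes with mismatched dimensions, since an upscaled encode has to be scaled to a common size before a shared box means the same region in each:

```python
from typing import Sequence

# A decoded frame as rows of pixel values (stand-in for a real image type).
Frame = list[list[int]]
Box = tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def crop(frame: Frame, box: Box) -> Frame:
    """Cut one rectangular region out of a frame."""
    left, top, right, bottom = box
    height, width = len(frame), len(frame[0])
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError(f"box {box} falls outside a {width}x{height} frame")
    return [row[left:right] for row in frame[top:bottom]]


def comparison_crops(
    encodes: dict[str, Sequence[Frame]], frame_index: int, box: Box
) -> dict[str, Frame]:
    """Take the same frame index from every encode and apply the same box to each."""
    picked = {name: frames[frame_index] for name, frames in encodes.items()}
    dims = {(len(f), len(f[0])) for f in picked.values()}
    if len(dims) != 1:
        raise ValueError(f"encodes differ in dimensions {dims}; scale to a common size first")
    return {name: crop(f, box) for name, f in picked.items()}
```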
As a final note, since this post is basically dumping out some random complaints previously relegated to my Twitter account, I want to give a special “shoutout” to Riot’s blog. Every time I’m linked to an article there, it seems to have an insufferable tone that makes me immediately want to close the tab. Some quick examples from searching Discord logs: 1 2. Please don’t do this. The anti-cheat one is particularly bad, since it essentially snarks at people concerned about Riot installing a fucking kernel driver. Their complaints, by the way, turned out to be totally valid when the driver started causing performance issues in unrelated games. If you’re going to push an unpopular technology on people, the least you can do is avoid talking down to them in the process. Then again, this is the company that makes you write essays on how much you love League if you want to work there, so I’m not sure what I was really expecting.
Footnote: