Story Time
Presented by: John Graham-Cumming, Kornel Lesiński
Originally aired on September 16, 2020 @ 6:30 AM - 7:00 AM EDT
Join Cloudflare CTO John Graham-Cumming as he interviews Cloudflare engineers and they discuss a "war story" of a problem that needed to be solved — and how they did it.
This week's guest: Kornel Lesiński
Transcript (Beta)
Okay, welcome to Storytime. I am John Graham-Cumming. I am Cloudflare's CTO and this week we are going to talk about image formats and in particular image formats on the web and I have with me Kornel Lesiński who works on all things images at Cloudflare.
Welcome Kornel. Hi, it's really great to have you on the program and I thought a good way to start would be to go back in history a little bit and talk about some of the older formats that have been on the Internet for a long time and then we'll bring it right up to date by talking about the latest stuff which is AVIF, I believe, right?
Yeah, so the oldest and still the most popular format that we have on the web is JPEG, and Xiph.Org researcher Tim Terriberry calls it an alien technology from the future, because back then JPEG got so many things just right, the balance of effectiveness and simplicity, that it still works pretty well for us, and newer formats have failed to substantially improve on JPEG.
We've had JPEG 2000, we've had JPEG XR, we've had WebP and they're all 15%, 20% better and sometimes brought their own problems and inefficiencies that JPEG doesn't have so we've still remained with mostly JPEG to this day.
And when you talk about efficiency, one thing is that, you know, when we've been dealing with images at Cloudflare there's a few different things, right?
There's the compression, so how small can you get the file; there's also how fast you can compress it and how fast you can decompress it. And so, I mean, I think it's the case that JPEG is good at all of those things, basically, right?
Yeah, JPEG was designed when computers had 25 megahertz CPUs, so back then it was slow, but, you know, machines got so much better that it's super cheap and efficient these days.
But JPEG is also very good in terms of compression for file size to quality ratio that you get.
So for so-called lossy codecs, formats that can choose the quality they produce, the benchmark for how good they are is the ratio between the visual quality you get and the file size.
So you can always tell the codec to make a smaller file, but that will cost you in quality, and the less quality you lose when making the file smaller, the better the codec is.
So when I say another format is 20% better, I mean that for the same visual quality, for about the same-looking image, you can get a 20% smaller file, or if you keep the file size exactly the same, the other will look 20% better.
And when you say something looks good, how do you judge when something looks the same with different quality ratios?
Oh, that's a very difficult problem because it's very subjective.
Some people like images to be smoother, some people are sensitive to sharp edges or pay attention to the texture of some feature in the image.
So the data from people is actually very noisy.
So researchers who need to measure this precisely use just computer approximations of how human vision works just to get a specific precise number to be able to say well this is 5% more accurate according to a certain metric.
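The interview does not name a specific metric. As a minimal sketch, here is PSNR (peak signal-to-noise ratio), one of the simplest numeric image metrics. It is assumed here purely for illustration: it measures pixel error only and is not a model of human vision, which is why perceptually tuned metrics exist. The sample pixel lists are made up.

```python
import math

def psnr(original, compressed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(original) != len(compressed):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error between corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

# Made-up 8-pixel grayscale samples.
original = [52, 55, 61, 66, 70, 61, 64, 73]
slightly_off = [53, 55, 60, 66, 71, 61, 64, 72]
badly_off = [40, 70, 50, 80, 60, 75, 50, 90]

# A higher score means the compressed pixels are numerically closer to the original.
print(round(psnr(original, slightly_off), 1), round(psnr(original, badly_off), 1))
```

A single number like this makes it possible to compare two codecs at the same file size, even though it only approximates what a person would see.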
Right, interesting.
So JPEG is hugely popular but then there are other formats, right?
So GIF, or "JIF" depending on how you say it, became popular, and that was held up by patents, right?
Yes, so it's one of the older formats. In terms of compression it's very inefficient.
You get huge file sizes and it doesn't look very good at any file size.
But it was also one of the first and one of the oldest on the web.
And it's still popular because it works everywhere. There's a whole pile of formats that try to replace GIF, and they support animations in many ways, with better quality, with more features.
But we still use GIF because every bit of software, every website, every service supports GIF.
So it's not just the quality of compression you get, it's also the interoperability you get.
Right, and you pointed to something very important which is GIF format supports animation which JPEG doesn't.
And so all those animated memes and things, those are all GIFs.
I like to separate this into something that has become a medium, a way of communicating as a short silent video clip, which is separate from GIF as the technical way of storing pixels in a file.
Right. So because GIF is so incredibly inefficient, and just wastes bandwidth and would take too long to download, most of the social media networks, for example, convert GIF to H.264 MP4 video, a real proper video codec, which gives you a 10 times smaller file for the same GIF clip.
So the GIF, the idea of a medium has diverged from GIF, the animation format.
Interesting and why is the video format more efficient than the GIF format for animation?
In video formats and in image formats the idea is to represent the input image using some clever approximations.
So the biggest advantage video formats have over the old animation formats is that they can encode the difference between frames as "this object exists in the previous and next frame, but it has moved". That's called a motion vector.
So instead of storing the entire frame all over again from scratch, it can just encode a difference, which for most videos is almost nothing between frames.
Right. That means you need almost no information to send over the wire.
Right, and this is why we get really high quality video on a television over the Internet, you know, 4K video, because we're actually just sending the differences between frames.
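The idea of sending only differences between frames can be sketched with a toy example. This is a deliberately simplified illustration, not how any real video codec is implemented: frames are one-dimensional lists of pixel values, and the helper names (`encode_delta`, `apply_delta`, `best_shift`) are hypothetical.

```python
def encode_delta(previous, current):
    """Return only the pixels that changed, as (index, new_value) pairs."""
    return [(i, cur) for i, (prev, cur) in enumerate(zip(previous, current)) if prev != cur]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

def best_shift(previous, current, max_shift=3):
    """Toy 'motion vector': the shift of `previous` that best matches `current`."""
    def cost(shift):
        # Sum of absolute differences after shifting (with wrap-around for simplicity).
        return sum(abs(previous[(i - shift) % len(previous)] - c) for i, c in enumerate(current))
    return min(range(-max_shift, max_shift + 1), key=cost)

frame1 = [10, 10, 10, 200, 200, 10, 10, 10]
frame2 = [10, 10, 10, 200, 200, 10, 10, 12]   # one pixel changed
moved = [10, 10, 10, 10, 10, 200, 200, 10]    # the bright object moved right by 2

delta = encode_delta(frame1, frame2)
print(delta)                      # [(7, 12)]
print(best_shift(frame1, moved))  # 2
```

In the first case a single changed pixel is stored instead of eight; in the second, one number describes the movement of the whole object.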
Exactly, and newer formats, starting with JPEG and everything after it, use more tricks which are based on, basically, optical illusions.
So for example, our eyes are more sensitive to differences in brightness than they are to differences in color hue.
So newer formats store brightness as a separate channel at full resolution, at higher quality, and color information at lower resolution, with lower fidelity, which needs less data, but we don't notice this very often.
That's just because of how our eyes work.
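A minimal sketch of that brightness and color split follows. It assumes the full-range BT.601 conversion used by JFIF/JPEG files and a simple 2x2 averaging for the color planes; real encoders differ in filtering details, and the 4x4 test image is made up.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion as used in JFIF/JPEG: brightness (Y) plus two color channels."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x2(plane):
    """Average each 2x2 block of a plane (list of rows with even dimensions)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]

# Made-up 4x4 RGB image: red, blue, green and gray quadrants.
image = [
    [(255, 0, 0), (250, 5, 5), (0, 0, 255), (5, 5, 250)],
    [(252, 2, 2), (255, 0, 0), (2, 2, 252), (0, 0, 255)],
    [(0, 255, 0), (5, 250, 5), (128, 128, 128), (130, 130, 130)],
    [(2, 252, 2), (0, 255, 0), (126, 126, 126), (128, 128, 128)],
]

planes = [[rgb_to_ycbcr(*px) for px in row] for row in image]
luma = [[p[0] for p in row] for row in planes]              # kept at full resolution
cb = subsample_2x2([[p[1] for p in row] for row in planes]) # half resolution each way
cr = subsample_2x2([[p[2] for p in row] for row in planes])

full_samples = 3 * sum(len(row) for row in image)
stored_samples = sum(len(r) for r in luma) + sum(len(r) for r in cb) + sum(len(r) for r in cr)
print(full_samples, stored_samples)  # 48 24
```

Half of the samples are gone before any further compression happens, while the brightness channel that the eye relies on for detail is untouched.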
Interesting, it's absolutely fascinating. And when video goes wrong, we sometimes see those, like, random blocks of color, right? It's like most of the image is there, but there's a bit that's gone wrong, so you can sort of peek inside the algorithm there and see, oh, that was the bit it was unable to handle, or that moved or changed in some way.
Yes, if you compress too much, the illusion breaks and you start noticing the tricks they use. But when you don't notice, they are still there, and you still get compression that removes 99% of the information, in terms of bits, from the image, and to us it still looks the same.
Okay, fantastic. So we've done a little bit of JPEG, we've done a little bit of GIF. There's also PNG.
Why was PNG created?
So JPEG uses all those tricks, like making color less precise, and those tricks break in specific situations: on very sharp edges, on very saturated colors, that doesn't look good. And we have certain types of images, not just photographs, which work fine with JPEG, but computer-generated images: icons, charts, screenshots of our computer screens, text. These have sharp edges and saturated colors that JPEG doesn't deal well with. So that's why we've got the PNG format, or even previously the GIF format, which does not use as many tricks. It just stores pixels exactly as it was told, and therefore preserves all the sharpness, all the fidelity of the image.
Of course, not using compression tricks means it uses more data. You get bigger files. That's why we limit use of PNG and GIF to icons and tiny emojis, and not huge photographs.
Right, and if I remember well, the PNG format actually does some interesting compression where it looks line by line, because it's sort of assuming the world might be rectilinear and there might be things you could do line by line, whereas in a photograph you don't get that.
Yes, that's just a very simple technique. It's an old format from, you know, the 90s, so looking at just one line of an image seemed like a good enough, efficient enough way to encode some difference in the image rather than every pixel from scratch. But it's not as good as JPEG's encoding of bigger blocks of pixels.
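The line-by-line technique can be illustrated with PNG's "Sub" filter, one of the five filter types defined by the PNG format, where each byte is stored as the difference from the byte to its left. The sketch below assumes one byte per pixel (grayscale) and uses Python's standard `zlib` module, which implements the same DEFLATE compression PNG applies after filtering; the gradient data is made up.

```python
import zlib

def sub_filter(row, bpp=1):
    """PNG 'Sub' filter: each byte minus the byte `bpp` positions to its left, modulo 256."""
    return [(row[i] - (row[i - bpp] if i >= bpp else 0)) % 256 for i in range(len(row))]

def sub_unfilter(filtered, bpp=1):
    """Reverse the 'Sub' filter to recover the original bytes exactly (lossless)."""
    row = []
    for i, value in enumerate(filtered):
        left = row[i - bpp] if i >= bpp else 0
        row.append((value + left) % 256)
    return row

row = [100, 101, 102, 103, 104, 105, 106, 107]  # smooth grayscale gradient
print(sub_filter(row))  # [100, 1, 1, 1, 1, 1, 1, 1]

# A longer gradient: the filtered row is mostly the same small value, which DEFLATE likes.
gradient = [i % 256 for i in range(1024)]
raw_size = len(zlib.compress(bytes(gradient)))
filtered_size = len(zlib.compress(bytes(sub_filter(gradient))))
print(raw_size, filtered_size)
```

Nothing is thrown away here, which is why PNG stays pixel-exact, but the gain depends entirely on neighboring pixels being similar.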
JPEG also does not actually store pixels. It transforms the image into the frequency domain and removes certain frequencies from the image. That's also dictated by how our eyes work.
We're more sensitive to certain frequencies, a certain sharpness in the image, and less sensitive to others, so JPEG can control exactly which features can be removed and which can be left in, using clever math basically, rather than having to deal with pixel-by-pixel, very local differences.
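A rough sketch of the frequency-domain idea: the discrete cosine transform (DCT) turns a row of pixels into frequency coefficients, and discarding the highest frequencies still reconstructs a close approximation. This is a simplification under stated assumptions: JPEG applies a two-dimensional DCT to 8x8 blocks and uses quantization tables rather than simply zeroing coefficients, and the `keep` parameter here is hypothetical.

```python
import math

N = 8  # one row of 8 samples, matching the width of a JPEG block

def _scale(k):
    """Normalization that makes the transform orthonormal (exactly invertible)."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(samples):
    """1-D DCT-II: pixel values to frequency coefficients."""
    return [
        _scale(k) * sum(s * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n, s in enumerate(samples))
        for k in range(N)
    ]

def idct(coeffs):
    """Inverse transform: frequency coefficients back to pixel values."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for k, c in enumerate(coeffs))
        for n in range(N)
    ]

def drop_high_frequencies(coeffs, keep):
    """Zero every coefficient from index `keep` upward (the finest details)."""
    return [c if k < keep else 0.0 for k, c in enumerate(coeffs)]

row = [100, 102, 104, 106, 108, 110, 112, 114]  # made-up smooth gradient
coeffs = dct(row)
approx = idct(drop_high_frequencies(coeffs, keep=4))
max_error = max(abs(a - b) for a, b in zip(row, approx))

print([round(c, 2) for c in coeffs])
print([round(v, 1) for v in approx], round(max_error, 3))
```

Half of the coefficients are discarded, yet every reconstructed pixel stays within one brightness level of the original for this smooth input. Sharp edges would fare worse, which matches the artifacts described in the conversation.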
If you zoom in on a JPEG you sometimes see that, right? Especially on the edge of something, you can kind of see how it was broken up into little blocks.
Yeah, because these tricks are specifically tuned for your viewing distance. So if you're zooming in, you're changing the pixel size, you're changing the relationship between frequencies in the image and how you view them, and that breaks the trick.
It could be encoded for being zoomed in. In fact, there's the MozJPEG codec, which is Mozilla's improved implementation of JPEG. It's entirely backwards compatible, but a bit modernized in how it stores data in the file and how it decides what to keep and what to throw away.
MozJPEG was tuned for more modern computer screens. The old JPEG was designed back when we had CRT screens at 72 dpi resolution and 500-pixel images were considered large.
Nowadays 500 pixels is just a tiny thumbnail, and we have images thousands of pixels large, on larger screens with sharper pixels. So just tuning the compression parameters for JPEG created a five to ten percent improvement in compression without having to fundamentally change anything in JPEG.
It's interesting, right, that comment you made at the beginning about it being a format from the future: the format has not had to change, and we are able to tweak the parameters enough that it can keep up with our changing computer capabilities.
Yeah, we've extended the life of JPEG a bit. It also helps that it's simple.
JPEG is divided into eight by eight pixel blocks and each block is basically encoded as its own image.
It's separate from all its neighboring blocks, and that makes it technically simpler to work with and to optimize. And JPEG has a feature that hasn't been used in later formats, which is progressive rendering. That means when an image loads, it can start from a very blurry or blocky approximation of the image without any details, right, and then it sends more and more detail until you get a sharp image, back to the original image that you wanted.
Sometimes people will see that on a slow Internet connection: you get an image that loads but it's blurry, and then it gradually gets more and more crisp as things download.
Yeah, and you might not realize it, but it also works when you don't notice it. The most costly details in JPEG, more than half of the file size, are the tiniest, sharpest details that you notice only when you look very closely at your screen and start inspecting the pixels. But when your page loads and you just glance at the screen, you're not inspecting it closely enough to notice the difference.
So a progressive JPEG might be half loaded and you will not even know. You'll think it's all there, because by the time you look closer, it will have finished loading all the details.
Interesting. Okay, all right, so JPEGs, GIFs, PNGs, we lived with those for a long time, and then Google came along and said, hey, there's a thing called WebP.
What's WebP?
So the history of this goes even further back. It's interesting that we know MPEG video formats, for example; they're popular, they're on DVDs, they're in set-top boxes and satellite TV. But there's On2 Technologies' VPx format, which was basically the video revolution on the Internet, and it started with Flash Player, back when YouTube was new and you had these tiny grainy videos played with Flash on YouTube.
It used a video codec called VP6, which was the sixth generation of that On2 company's codec, and when browsers moved away from Flash to native built-in video, they needed a royalty-free video codec that every browser, even free and open source browsers, could use, and the MPEG competitor wasn't it.
MPEG formats are patented, they're commercial. Even if you write your own open source software that uses an MPEG format, it may be illegal to use, even by yourself, because of the patents, because patents cover the mere idea of using an MPEG format rather than what you've written. Therefore the web needed a free and open format that fits the web, and Google bought On2 Technologies, bought their VP codec, and released what was back then the VP8 generation, the eighth generation of the codec, free for everyone to use.
For the web that was a big moment. All the video nerds rejoiced, because we finally had a free codec where you don't have to pay a license fee to any company if you want to publish video on the web, and you don't have to use proprietary Flash.
It was supposed to be great, but the first release was a bit early and the VP8 codec did not catch on. However, it became immortal in the WebP format.
When this optimism was still high, the thought was: hey, we're gonna have this video format that everyone supports, why not have an image format that uses exactly the same compression? So you get two for the price of one. If you're gonna support VP8 video, you can just as well take a single frame of a video and call it an image format, and that's what WebP is.
It's one frame of a VP8 format video. However, VP8 failed in the market. Only the next generation, VP9, actually caught on, and VP9 is now used by YouTube to serve 4K videos, for example.
It's been very successful, but WebP never got updated, so WebP is still stuck with this first generation, the first release of the VPx format.
Okay, so that's great. So now Cloudflare does some transformation of images for our customers, which you've worked on, so can you explain how that works? The customer's images are originally a JPEG, and then what do we do? We can serve it in different formats, right?
Yes, so we have an image resizing product. If you want simplicity in managing your images, you just want to upload your original photos to your CMS, or maybe your users upload their photos. But then you need to generate thumbnails, you need to generate an appropriately sized photo for your mobile website and for your desktop website, so you need to actually generate a new file with the appropriate number of pixels, resized to exactly what you need. Because if you send too many pixels, if you just send the original 20 megapixel photo and serve it as a thumbnail, it's going to waste so much data. We'd be sending 20 million pixels and displaying just 100 of them. That's a complete waste. So we take only exactly the number of pixels you need and serve that, and we can compress it using the appropriate format for whatever browser is asking for it. So it can be a JPEG, it can be a WebP, it can be JPEG 2000, and it's going to be the next-generation image format, AVIF.
Yes, so let's talk about that. AVIF, this is the latest thing in this line of image formats. Why do we need another one?
Right, so in this battle between Flash Player and native browser video, VP8 format versus MPEG format, there was a next-generation competitor called H.265, which substantially improved video compression. However, it came with even more complicated and more expensive licensing costs, and the licensing was meant for producers of hardware, producers of set-top boxes for satellite TV, for cable TV. But the world has moved on. Now we have browsers, we watch video on the Internet, and for browser vendors who give their software away for free, paying a license fee for every copy of their software is not an option.
Also, H.265 had some internal disagreements between patent owners over how to license it and who's going to get money for this, and the licensing split into two different companies that say they own the same format and try to sell you the same thing. It got really complex, messy and expensive. So Netflix, Google, Mozilla, Apple and other tech giants decided enough of this, the MPEG formats, the patented MPEG formats, are not the way forward, and collaborated on developing a next-generation, high-efficiency, modern video format. It's been released recently. It's called AV1, but technically it's a continuation of the VPx lineup of formats. An experimental VP10 version was used as the starting point, and in collaboration, with experimental features from Mozilla and from Cisco, all this was put together and released as the AV1 format.
There's a push behind this AV1 format. For example, it has hardware decoding shipping in all of the next-generation video cards, so NVIDIA, AMD and Intel have all announced that they have AV1 decoding hardware.
Hmm.
That's much better than the situation with VP8 10 years ago, where we had a lot of promises but nothing actually shipped. We're past that hurdle now. AV1 is here, and we're repeating this idea: well, if AV1 is going to be the video format that takes over the web, why not have an image format that is just a single video frame from AV1? And AVIF is that image format, derived from the latest video codec.
Got it. And it's a lot smaller than JPEG.
It's half the size of JPEG and WebP. If you counted the VPx releases, it would be four generations ahead of WebP.
Got it.
And it fixes many of the fatal flaws that WebP had. So for example, I've mentioned this trick with storing color at half the resolution, right? And yeah, it sometimes breaks. If you have a logo in your image, or you have a banner with some colorful text, WebP will just blur it, smash it, pixelate it, and it cannot do better. Even if you say I want quality 99%, you're still going to get this color crushed to half resolution, because it was just deemed that for a video format you see a frame for 1/30th or 1/60th of a second, so who cares about details. But, you know, we have still images on the screen, so we have enough time to notice the distortions, and WebP is just harsh on certain types of images.
AV1 has color at full resolution and goes beyond that. It has so-called wide gamut color, so it can have even more saturated colors if you have a modern monitor. It has high dynamic range, HDR, so again, if you have the latest generation of TVs or monitors, you can get darker dark colors with more fidelity and brighter bright colors with more fidelity, and instead of storing colors at 8-bit precision, just 256 values, you can store them at 12-bit precision, 4,096 different shades of color or brightness.
So AVIF improves quality and also improves compression. It adds many more clever features. For example, JPEG, WebP and older formats store color channels and brightness completely separately, like two different images. But usually in images, the edges in brightness, of the overall shapes in the image, are highly correlated with the edges of colors. These are similar areas of the image, right? So AV1 does a clever trick where it tries to predict what the colors will be based on the brightness channel. It just makes an educated guess at what the color would look like and encodes the difference between the guess and the actual color in the image. So if it guesses correctly, then storing color costs nothing, and you get sharp, high-fidelity color without increasing the file size.
So, um, we're going to come up on time, so I just want to talk a little bit about... so Cloudflare now supports AVIF, is that right? Just within that image resizing product?
So we support AVIF in image resizing.
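The trick described earlier, guessing color from brightness and storing only the error, can be sketched as follows. This is an illustrative simplification and not the actual AV1 algorithm: it fits a single scale factor per block by least squares, and the function name and sample values are hypothetical.

```python
def predict_chroma_from_luma(luma, chroma):
    """Guess chroma as a scaled copy of the luma variation; return (alpha, prediction, residual)."""
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    variance = sum((l - mean_l) ** 2 for l in luma)
    # Least-squares scale factor linking brightness changes to color changes.
    alpha = 0.0 if variance == 0 else sum(
        (l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma)
    ) / variance
    prediction = [mean_c + alpha * (l - mean_l) for l in luma]
    # Only this residual (plus alpha and the mean) would need to be stored.
    residual = [c - p for c, p in zip(chroma, prediction)]
    return alpha, prediction, residual

# Made-up block: a bright object on a dark background, with color edges in the same places.
luma = [50, 50, 50, 200, 200, 200, 200, 50]
chroma = [120, 120, 120, 90, 90, 90, 90, 120]

alpha, prediction, residual = predict_chroma_from_luma(luma, chroma)
print(round(alpha, 3))
print([round(r, 6) for r in residual])
```

When the color edges line up with the brightness edges, as in this block, the residual is essentially zero, so the color channel costs almost nothing to store.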
AVIF has one problem: the AV1 codec is still in its early stages, so the software is a bit immature and it's slow to encode. That's the Achilles heel of AV1 currently. It is fast to decode, because it's a video format, so video devices have to be able to, you know, display at least 30 frames per second, right, in full HD. So it's fast enough to display, but creating a new AV1 image takes a lot of CPU time.
So we do that for you, right? If you use our product, where we do the encoding, we do that.
And, you know, we have machines with 96 CPU cores and we throw them all at the images to generate them as fast as we can. For JPEG it's just milliseconds, but for AV1, even with all the computing power that we have, you might notice some latency increase the first time you request an image. After that it's cached and it's super fast. But this latency problem, we'll be working to hide it, to compress images asynchronously so the browser doesn't have to wait. So we'll be adding AVIF to Polish, the software that updates your cached images and converts them in the background.
Right, so you get the upgraded image whenever it's ready.
Whenever it's ready, yeah.
Brilliant. Okay, so what browsers currently can display an AVIF image?
So the good news is it already works, and it has shipped in Chrome, Chrome for desktop, and with the next release it's going to make its way to Chrome for mobile. Firefox is also working on support for AVIF; currently it's behind a flag. There is no word from Safari. However, Apple is part of the Alliance for Open Media, the group that developed AV1 and AVIF, so presumably Apple is on board as well. They're just quiet about this.
Right, and we'll get AVIF support in all browsers eventually. But if you start using it, if you're a Cloudflare customer, we'll figure this part out for you. So we'll know if the browser is capable, and we'll deliver the right image format to give you the fastest load time.
Yeah, and browsers announce their support for AVIF in HTTP headers, so we're able to auto-detect and automatically serve the right format.
Fantastic. Well, this has been very, very interesting. We're going to run out of time to talk about it, but, you know, a brief history of these things. It's funny you talk about patents, because I'm old enough to remember when it suddenly became legal to make a GIF, because originally there was a patent on the compression in GIF owned by Unisys, and there was a famous day when everyone could look at GIFs but making one was theoretically breaking the patent, and so that was, uh... So it's interesting how much of this history is tied up with patents as well.
And it keeps repeating, with video formats and image formats derived from video formats.
Yeah, very, very interesting. All right, well, Kornel, thank you so much for chatting on Storytime about this, and thanks for working on this stuff. This is one of those areas where I feel completely ignorant, because I have some vague idea about how this stuff works, but, you know, you get into the details of image encoding and it actually is rocket science, pretty much. So thanks for demystifying a lot of it for us, and hopefully you'll come back on the show at some point and we can talk about how AVIF is everywhere and what's next.
Yeah.
All right, very good. Thanks very much for being on the show. We'll wrap it up here, and you have a good day.
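The format auto-detection mentioned in the conversation, where browsers announce AVIF support in HTTP headers, can be sketched like this. Browsers list the image types they accept in the `Accept` request header (for example `image/avif` or `image/webp`). The function below is a hypothetical illustration that ignores `q` weights; it is not Cloudflare's implementation.

```python
def pick_image_format(accept_header, preferred=("image/avif", "image/webp")):
    """Return the first preferred MIME type listed in the Accept header, else JPEG."""
    # Strip parameters such as ";q=0.8" and normalize case and whitespace.
    offered = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    for mime in preferred:
        if mime in offered:
            return mime
    return "image/jpeg"  # safe fallback that every browser can display

print(pick_image_format("image/avif,image/webp,image/apng,image/*,*/*;q=0.8"))  # image/avif
print(pick_image_format("image/webp,*/*"))                                      # image/webp
print(pick_image_format("*/*"))                                                 # image/jpeg
```

Because the decision depends on a request header, a cache sitting in front of such logic would also need to keep separate copies per format.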