Why Is My Kid Getting HD On Their Phone, While I'm Getting a Blocky Mess On Our 60" TV?
Presented by: Marwan Fayed
Originally aired on April 18, 2021 @ 10:00 PM - 11:00 PM EDT
This question is a common complaint among ISPs and streaming services. This segment will explain the origins of 'adaptive' streaming, some of the challenges, as well as the reasons that no single service can solve all the problems on its own.
Transcript (Beta)
Hello, morning, afternoon, evening, wherever you are. Welcome to my session about streaming video.
My name is Marwan Fayed. I'm one of the leads on Cloudflare Research.
One of the things you're going to notice about this talk on streaming video is that the title that you see here doesn't quite match the title that was given on the website, and I promise we're going to get there.
This is a bit of a story, primarily about streaming video and lots of things that escape the broader set of knowledge.
I call these feedback, because this actually matters; fairness, which people love to debate; and maybe some frustration, because of the title that's on the website.
I'm going to preface this first of all and say this is largely pre-Cloudflare work for me.
I joined Cloudflare in January. I was previously a professor in my former life, did quite a bit of teaching and research and all the normal things you would expect an academic to do, so there's some names at the bottom here.
A lot of this work was built up in collaboration with these people and their current locations, and so I'm ever grateful to them.
Collaboration is ridiculously rewarding.
I encourage everyone to pursue collaborations wherever they can on everything they can.
So let's proceed. On a personal level, this whole talk is kind of a story to me, and it's this notion of why streaming video suddenly became cool to me.
So for years as a network person, my interests tend to be in the network level.
I really had no affection for streaming video at all. I thought streaming video is an application, we design the network properly, it gets bits from point A to point B, the application doesn't matter.
Except at one point, a student came to me and posed streaming video as sort of a network resource problem, and it just took on a whole new life.
Why is this important? One of the things that Cloudflare values is curiosity.
So this is one example on a personal level about how just being open to different things and being a little bit curious can take one in directions that they never anticipated.
So let's start with the end, something I often like to do, and give you the takeaways.
So high level, there's a few big takeaways here.
The first is feedback control loops. We'll talk about what these are if you're unfamiliar.
When they compete against each other, they cause problems, and it's a really bad idea to put them into practice.
Given the problem, I'm going to show you a couple of naive-like solutions, maybe naive is discrediting them, first-pass solutions, and try and convince you that any solution that exists in future has to break the isolation between what is traditionally viewed as the application and everything that's below user space and the network.
So there needs to be some interaction there. And crucially, I'm going to throw out this personal perspective, which is no single provider can solve the problem alone.
And for those of you who just have this playing in the background, as I know many of you will, there's going to be varying degrees of knowledge here that are presented.
So the rough description is I'm just going to start with a primer about DASH, and then I'm going to go through a couple of iterations of describing a problem, explaining the cause of the problem, and then looking at a solution, come back to this again, and then we can just make a few comments on the future.
And crucially, in order to ensure that as many of us as possible are on the same page, a few definitions.
So this will be familiar to anybody who's sort of third-year computer science and engineering and above, but it might not be to anyone below that.
So the first is, you'll hear me say the word bottleneck, or bottleneck rate, or bottleneck link.
And the bottleneck basically is the weakest link in a path on the network, and it constrains the speed at which we can transmit.
Okay, so the bottleneck link basically, that is the slowest link on the path, and being the slowest means that's the fastest that you could possibly transmit something along that path.
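As a minimal sketch of that definition: if we assume a path is just a list of link rates, the bottleneck rate is the smallest of them. The path below and its numbers are made up for illustration.

```python
def bottleneck_rate(link_rates_mbps):
    """Return the rate of the slowest link on a path, in Mbps."""
    if not link_rates_mbps:
        raise ValueError("a path needs at least one link")
    # The slowest link caps the fastest rate anything can be sent end to end.
    return min(link_rates_mbps)


# Hypothetical path: home Wi-Fi, broadband access link, ISP core, server side.
path_mbps = [50.0, 6.0, 1000.0, 10000.0]
print(bottleneck_rate(path_mbps))
```

Here the access link is the bottleneck, which matches the point made later that the bottleneck is typically closer to home.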
Fairness is this 30-40 year old debate, but it really comes down to, given a bottleneck, it becomes a shared resource.
And so how do you share that in a meaningful manner? And then the last one is bit rate.
So this, I don't even want to count the number of times this word will appear.
Bit rate here has two meanings, and what I learned in years of doing this is that unless we make this clear from the outset, people in the network, when they talk to people in the application, so people who are transmitting bits and people who are encoding video with bits, have to be very, very clear about which bit rate they're talking about.
So when we think about a Netflix video stream or a YouTube stream and you can select your quality, that is the video bit rate.
On average, the number of bits per unit of time, typically per second, that are required to encode that video data, that is entirely independent from the transmission rate, which is the number of bits that I can send through the path and what is effectively the bottleneck.
So it's one word to describe two different types of things, but I'm going to do my best to separate video bit rate from network or throughput rate.
Oh, and by all means, by the way, there are questions. We'd love to take them, email address at the bottom.
So let's proceed. Here it is. This is the way it was read.
Why does my kid get HD on their phone while I'm stuck with a blocky mess on a 60-inch TV?
It turns out, anecdotally, this is one of the most common complaints among streaming video providers and ISPs, your British Telecom, your Verizon, these types of things.
So what's happening is people are at home, they have a broadband link, probably a lot more in a COVID age, and somebody's sitting in the bedroom and they're on a tablet or a little phone and they're getting a really, really high quality feed and somebody's watching the big TV and they get a big chunky or blocky mess.
Now, I will admit wholeheartedly, if you're one of the fortunate people who live in a place where you get 50 and 100 megabit connections at home, most of what I talk about today, you're never going to see it.
If you're like me, however, this week, I happened to be stuck on a 3G modem, which actually constrains me from being able to set this up and show you live what it looks like.
This definitely matters. And it's important to acknowledge that for most people in the world, this type of idea is probably going to matter.
The bottleneck is typically closer to home.
So just to give you some perspective of the clash here, this is a general representation of if you were to take the different types of resolution, video bit rate, video resolution, and plot them against each other, this is proportionally how they would look.
So what we're talking about is somebody on their phone is getting 4K or maybe 1080, and then somebody else is stuck getting something way down here and it just doesn't make any sense.
Interestingly, the network was very good at preventing these types of things until the advent of streaming video.
And this is what we're going to talk about today, largely.
So let's start with, oh, sorry, mistake. We can always claim that life is unfair, unfortunately, but it turns out the Internet is genuinely unfair as well.
So from a routing perspective, layer three, the Internet, amazingly, if you stop and think about it, only does two things.
It does routing and forwarding. So it decides, given something, a destination, if there's a path to the destination, in which direction to send it, that sort of decision making.
And then the forwarding is just given a little packet that comes in with some data on it and a destination.
I'm going to decide where to send it and not even think about it.
Everything else that provides the services we've grown accustomed to are things that we've added on top.
From a fairness or a sharing perspective, this starts with congestion control, and this is the dominant model.
So every machine, every computer, every phone, every operating system has implemented within it congestion control mechanisms so that devices can share the available resources.
But in the network, we've added other things over time, things like rate limiting, traffic engineering, traffic shaping, these types of things.
But crucially, the Internet is genuinely unfair by nature.
So now I'm going to pause and ask this question, which is the way that we traditionally think of fairness is some notion of equal bit rate, because it's really hard to do better.
And this is what congestion control is supposed to do, give us, and it actually does a really good job of it in fairness, no pun intended.
So the question is, does equal bit rate actually represent some notion of fairness that's useful in this context?
And I'm going to make a claim here about throughput, so the speed of transmission; delay or latency, so how long it takes for information to get from point A to B; and loss, which is the chance that something is dropped from the network for some reason.
In the network, these are really the only three things that we see. They're the only three things that we can measure.
Sometimes they form the basis of other more complex things.
So you might have heard of something called jitter.
But jitter is really just a measure of delay. Okay. And all three of these things fail to capture any notion of what's happening at the user or the application level.
So throughput, delay, and loss from a network level, while insightful, leave a big gap between what they tell us and what people are actually experiencing.
And fundamentally, that's what this talk is about. So now let's look at streaming video.
The older generation among us, and when I say older on the Internet, I mean really not more than 10 years ago, will probably recognize or understand what this little image is about. If you were around 10 years ago or more, you will have seen this.
So there was a time when somebody would send you a nice little video, YouTube, for example, and you would click to open it.
And then as it started loading, you would immediately click that pause button, this thing right here, and you would wait and wait and wait for that red bar to grow.
And at least in my experience, I almost never hit play before that red bar was halfway and usually round about two thirds.
It turns out I wasn't the only one to do this. There were a few measurement studies a number of years ago, and they demonstrated something really, really interesting, which was that everybody did this, first of all.
And this was a problem for the content providers who were streaming video.
Because of this, and because people found what they were watching uninteresting, or got sidetracked, or any number of other reasons, on average only about 30% of the bits that were transmitted by the provider were actually consumed by the user.
Okay, so that means 70% of the content that a YouTube or the like was paying to send on the Internet was not being consumed.
Okay, so that was one issue. The second issue, of course, is one of the reasons that you pressed play, or sorry, pause, is because network conditions were variable, and what you didn't want to see was the waiting, the buffering symbol.
Okay, so this forced people to go back and redesign the way streaming video worked.
Rather than be one big long video file that you just download over time, we decided to be smarter about it.
And MPEG, the Moving Picture Experts Group, actually put together a standard called DASH.
This is Dynamic Adaptive Streaming over HTTP. And really, there are three main components to this.
Okay, the first is media is encoded at multiple bit rates.
So I take one video file, and then I do an HD version, and an SD version, and an ultra HD version, and so on.
And at the same time, when I'm encoding them, I split these up into segments or chunks.
You'll probably hear me refer to them as chunks more often than anything.
The chunks, crucially, between the encodings, and this is very important, are going to be equal duration.
Okay, now there's been some really novel work to change the duration from chunk to chunk, but the standard itself largely says just equal duration.
And crucially, if I take a chunk for any of the encodings, so whether it's HD or ultra HD or SD, the duration of information inside is the same.
You're going to see why this is in just a moment. The next part of this is that when I go to request a video, typically this starts with what's called a manifest file.
And the manifest file lists a whole bunch of information that's required to play the video, not least of which is where the videos are located and the available encoding bit rates.
So what this means in practice is something like this, and I can take no credit for this graphic, but I think it's really, really insightful.
So here on the server side, you've taken your video and you can see that you've encoded it at multiple qualities.
So here's your low quality, your medium, and your high.
And crucially, they all consume the same amount of time.
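To make the three components concrete, here is a rough sketch of the information a manifest carries. Real DASH manifests are XML documents with many more fields; the class names, bit rates, and `example.com` URLs below are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    name: str           # label for the encoding, e.g. "low"
    bitrate_kbps: int   # average video bit rate of this encoding
    url_template: str   # hypothetical URL pattern for its chunks


@dataclass(frozen=True)
class Manifest:
    chunk_seconds: int      # every chunk covers the same duration
    total_seconds: int      # length of the whole video
    representations: tuple  # the available encodings

    def chunk_count(self):
        # Round up so a shorter final chunk is still counted.
        return -(-self.total_seconds // self.chunk_seconds)

    def chunk_url(self, name, index):
        for rep in self.representations:
            if rep.name == name:
                return rep.url_template.format(index=index)
        raise KeyError(name)


manifest = Manifest(
    chunk_seconds=4,
    total_seconds=600,
    representations=(
        Representation("low", 500, "https://example.com/video/low/chunk-{index}.m4s"),
        Representation("medium", 1500, "https://example.com/video/medium/chunk-{index}.m4s"),
        Representation("high", 3000, "https://example.com/video/high/chunk-{index}.m4s"),
    ),
)

# Chunk 3 covers the same four seconds of video in every encoding,
# so a client can switch quality at any chunk boundary.
print(manifest.chunk_count())
print(manifest.chunk_url("low", 3))
print(manifest.chunk_url("high", 3))
```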
And they're transmitted over the network. Let me go to the client side first.
The client might be conservative, and it says, you know, I'm going to start by downloading a couple of low quality chunks of data, see what the network is doing.
And oh, look, I can see that there's some spare capacity. So I'm going to increase the quality of the thing that I'm looking for, and I can do this yet again.
All right. The network, you see that here. So here's your small chunk, your low quality chunk.
And then as you increase the quality, it's still transmitted in some amount of time, less than the duration, but it consumes more of the resources.
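The client behaviour just described can be sketched in a few lines. This is a deliberately naive illustration, not the logic of any real player: it assumes the client measures throughput from its last chunk download and then requests the highest encoding that fits. The ladder and throughput values are invented.

```python
def pick_bitrate(ladder_kbps, estimated_throughput_kbps):
    """Highest encoding that fits within the estimate, else the lowest one."""
    fitting = [b for b in ladder_kbps if b <= estimated_throughput_kbps]
    return max(fitting) if fitting else min(ladder_kbps)


def play(ladder_kbps, chunk_seconds, throughput_per_chunk_kbps):
    """Simulate a single client; return the video bit rate chosen per chunk."""
    choices = []
    estimate = 0.0  # no measurement yet, so the first chunk is the lowest quality
    for throughput in throughput_per_chunk_kbps:
        bitrate = pick_bitrate(ladder_kbps, estimate)
        choices.append(bitrate)
        chunk_kbits = bitrate * chunk_seconds        # size of the chunk on the wire
        download_seconds = chunk_kbits / throughput  # time spent fetching it
        estimate = chunk_kbits / download_seconds    # what the client measured
    return choices


# Hypothetical ladder and network: throughput halves after two chunks.
print(play([500, 1500, 3000], 4, [4000, 4000, 2000, 2000]))
```

Note that the video bit rate the client picks and the throughput the network delivers are separate quantities here, which is the distinction drawn earlier between the two meanings of bit rate.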
Okay. So you see, problem solved.
You're done. This is so bright and insightful, doing this, that I never have to worry about anything again.
Bottleneck capacity, I'm going to detect it, and I'm always going to get the best possible quality for my connection.
Okay. And in an old classical model of everybody sitting behind a single TV, this works.
But then you have to ask, what if there's a whole bunch of devices now doing this?
And it turns out, this is where problems start to emerge. So this is the idea of competing video streams.
Okay. So these are, in a traditional network, just with the web and everything else, you would think of competing flows.
Now we've got competing video streams. Okay. Because that's what the flows represent.
Okay.
So one second here. So here's, let me show you this little measurement here.
This is four clients, an iPhone, three laptops, over a six megabit link.
Okay. Four YouTube clients on each of these.
And over time, what you can see here is something that looks a little bit peculiar.
So what this says is for each of these clients, by and large, they're increasing the quality of what they're requesting, and they're decreasing the quality and increasing it and decreasing.
Okay. As it turns out, bit rate stability, as this is called, suffers.
So one of the biggest reasons people tune out of videos is the buffering, which we're all familiar with.
But it turns out, if you noticeably on the screen see improved quality, decreased quality, improved quality, that's the second biggest reason that people tune out.
Now, I'm going to stop here and emphasize just again, the separation of bit rate.
This is the video bit rate. Okay. This is not the transmission rate. So these lines here are what are appearing on the screen, not what's actually traveling through the network.
The reason, of course, you would go down is because the network can't sustain this speed.
So then we say, all right, well, let's think about this differently.
What about fairness? So here's a really, this is one of my favorite.
This is a measurement from a number of years back, and things have improved since this time, but it's not hard to set this up again.
So for fun, this was a Netflix on a laptop and YouTube on an iPhone.
Okay. And this is all in public domain knowledge.
These measurements appear in prior work. And here they start at roughly the same time.
And you can see the Netflix client is doing much better.
So it gets about two megabits per second and the iPhone is getting roughly one megabit in terms of video quality.
And the Netflix user decides to pause for some reason, go get a cup of coffee, who knows.
And so the YouTube client takes what's remaining for the network capacity at the bottleneck as it should.
But then Netflix comes back, the Netflix user, and can never, ever quite get back up to where you would want them to be.
Just to convince you further that this is a problem, here are a tablet from ages past, a laptop, and two iPhones.
And what's amazing here, this is where it really starts to show, is that the two largest screen devices, despite best effort, cannot compete with the two smaller screen devices.
So for some reason, these two small devices, the iPhones at the time, were just doing really, really well getting high bit rate across this bottleneck, and the larger screens weren't.
Well, what's causing this? At a high level, there's a few things going on.
The first is the video players rely on what's happening in what I'll call the network, it's actually transport, but TCP fairness.
I'm going to talk about this in a little bit.
The second thing is, there's a bunch of heterogeneous devices, in many cases with different implementations.
So with a couple of the providers, there were actually different teams implementing clients for different platforms.
Necessarily so, and it's only in the last few years that they've started to homogenize, which is a good thing.
And there's also the standard itself has no standard client implementation.
So it's not the case that everybody just agrees what the implementation should look like, and then they take it off the shelf.
Everybody starts from scratch, adhering to a few high level principles on paper.
Fundamentally, however, there's this idea of feedback control, and I'm going to get there in just a minute.
But first, come back to this question. If we agree that fairness is something that's important, and in a future Cloudflare TV segment I may actually invite some guests to debate this,
then we ask: is equal bit rate the right measure? When we talk about standard viewing of the web and so on, we might agree that that's the case.
But now, if I'm watching video content on different types of devices, it's no longer clear.
It could be that I have a bigger screen, and so I need more bits.
It could be that I'm on an expensive mobile data plan, and I want to, if I'm paying per bit, that I want to scale back the quality so I don't pay as much.
So there are all kinds of measures of fairness.
But for today, we're just going to focus on the screen size.
And there's this great quote from a piece of work, gosh, 13 years ago now, by a gentleman named Bob Briscoe.
And he writes, equal bit rate or any flow rate definition of fairness is ultimately unfair.
And this was very controversial when it was written.
Because we're so used to thinking of equal bit rate on the network as being the right way forward, except that in life, it's rarely the case.
And I'll give you a telltale example, which is just driving on the motorway or on the interstate or on the highway.
Roughly speaking, all of the cars get whatever they can. But when an ambulance or a police vehicle comes screaming from behind with its lights on, everybody agrees to sort of move aside and make way for them.
So even though generally, I think most of us agree that we do the same thing, we abide by the same rules.
We also all agree that there's this special case that deserves more attention. Okay, well, let's return to the network and talk about what's happening now.
There are two views of what's happening.
There's the traditional TCP flow. Okay, and there's the streaming video flow over here.
And they're built on two fundamental and non-overlapping assumptions.
So the first is that TCP, when it's adjusting its behavior, assumes that the flow, whether it's short or long, is continuous.
So it assumes the flow is repeatedly, unendingly getting estimates of what the network looks like.
But on the streaming video side, things are a little bit different.
There's let's say a four or five second chunk of information. But if the network can transfer that thing in two seconds, then the estimates that it's getting not only are on a broad two second average, but there's a gap in between in which things can change.
I'll show you what this looks like in a moment. So this comes down to loss, latency, and throughput.
They're not useful in either of these cases, or sorry, they're not useful in bridging the gap between these two cases.
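The chunk arithmetic from a moment ago can be worked through directly. This sketch assumes a steady state in which the player fetches one chunk per chunk duration; the bit rate and throughput figures are invented to reproduce the "four-second chunk in two seconds" case.

```python
def chunk_download_pattern(video_kbps, chunk_seconds, throughput_kbps):
    """Return (download_seconds, idle_seconds) for one chunk in steady state."""
    chunk_kbits = video_kbps * chunk_seconds
    download_seconds = chunk_kbits / throughput_kbps
    # Once the chunk has arrived, the flow goes quiet until the next request.
    idle_seconds = max(0.0, chunk_seconds - download_seconds)
    return download_seconds, idle_seconds


# A 4-second chunk encoded at 2 Mbps, fetched over a 4 Mbps path.
download, idle = chunk_download_pattern(2000, 4, 4000)
print(download, idle)
```

The flow is only observing the network for half of each four-second period. Whatever happens during the idle half is invisible to it, which is the gap that TCP's continuous-flow assumption does not account for.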
So I'm going to go out on a limb, add to what Bob said in the past, and say there's no metric at the network level, no network metric that we can see, that reflects the user experience of streaming video at the application level.
It's a little personal comment here.
So what's fair? I would like to think that equal bit rate in this case is probably going to be less than ideal.
Okay, in which case you want to do fewer bits for the smaller screen than you would for the larger screen.
So let's come back and talk about quality of experience and what actually that is.
There is a whole history of how to encode video, ridiculously capable and bright people and companies doing this.
And it's amazing what they've managed to accomplish. And there are different ways to describe this from a numbers perspective from a quantitative level.
Okay, so one of the holy grails is how do you get a quantitative measure that is representative of the quality that's being experienced.
One of the more accurate ones is called structural similarity index. And really what it is, it's just a measure of given two images or given two frames, what is the difference between them.
Okay, so if I take the original and the encoding, and then I do this structural similarity comparison, the higher the relationship between them, then the more likely it is that the compressed version is close to the uncompressed version.
These are application layer metrics, to be clear, they're not visible to the network.
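For a feel of what a structural similarity comparison computes, here is a simplified sketch. It applies the standard SSIM formula once over a whole frame, treated as a flat list of pixel intensities, whereas practical implementations average the score over many small windows. The pixel values are invented.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over two equal-length lists of pixel intensities."""
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small constants that stabilise the division, as in the usual definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


original = [10, 50, 90, 130, 170, 210]
compressed = [40, 40, 100, 100, 180, 180]  # hypothetical lossy version

print(global_ssim(original, original))    # identical frames score 1
print(global_ssim(original, compressed))  # a degraded frame scores lower
```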
Another thing in this particular example is that, historically, none of them took into account things like the size of the screen or the viewing distance.
Viewing distance is really hard, I'm going to ignore it for the time being.
Now the thing about the screen resolution you want to keep in mind is that the optimal case is when the video resolution is exactly the same as the screen resolution; then you get one pixel in the video encoding mapped onto one pixel on the screen.
Otherwise, if it's greater or lower, the image is going to degrade in quality somehow.
And what you certainly don't want to do is send down more bits than there are pixels on the screen, that's just a waste.
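One way to act on that observation, sketched under the assumption that each encoding is described by its video height and bit rate, is to drop any encoding that exceeds the screen. The ladder values here are invented.

```python
def usable_encodings(encodings, screen_height_px):
    """Keep encodings whose video height does not exceed the screen height.

    `encodings` is a list of (height_px, bitrate_kbps) pairs.
    """
    usable = [e for e in encodings if e[0] <= screen_height_px]
    # If even the smallest encoding is too tall, fall back to it anyway.
    return usable or [min(encodings)]


# Hypothetical ladder of (height in pixels, video bit rate in kbps).
ladder = [(240, 300), (480, 1000), (720, 2500), (1080, 5000), (2160, 15000)]

print(usable_encodings(ladder, 720))   # a phone-sized screen
print(usable_encodings(ladder, 2160))  # a 4K television
```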
So one early effort was that we actually sat down to figure out how to do this measuring the screen quality, because in an HTTP request, you know something about the device that's being used.
And so you can kind of make guesses of how big the screen is. Now I'm going to switch gears a little bit, and this is a little history lesson.
This is a really old idea, and keeping in mind when we talk about networks, old here is 25 years.
And this is called bandwidth utility. Let me try and explain this, it's a bit tricky.
What happens now is equal bandwidth. So if there are two competing flows in the network, setting streaming video aside, just normal transport, TCP, congestion control stuff, will do its best to make sure that both of these flows will get the same bit rate.
And what you'll notice here is that if you give them both the same bit rate, and you look at the value of those bits, so the utility, to the customer, to the client, the eyeballs, it's starkly different in this case.
So even though you're fair in terms of the number of bits, it turns out the value of those bits is very different.
And instead what you might want is some notion of what we call bandwidth utility.
So this says instead of starting with the same bandwidth, let's start with the same utility.
And this is that horizontal line. And what you see is that in order for both flows to have the same utility, you can actually afford to give, in this case, the red one less bandwidth than you would the black one.
There's another idea, this is a little younger, but not by much, 1999, 2000, called utility max-min fairness.
So max-min fairness is almost as old as computer science itself.
We're going to apply this principle to utility. And what it says is, when I have extra to give, I want to distribute it in such a way that the least beneficial recipient is the least distance from the most beneficial recipient.
You want to close that gap. Well, in bit rate max-min fairness, which is what conventionally exists, for some capacity, you have two flows, and then the fair share is just half.
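The conventional equal-share idea (usually written max-min fairness) can be sketched as a "water-filling" allocation: flows that want less than an equal share get what they ask for, and the leftover is split evenly among the rest. The demands below are invented.

```python
def max_min_share(capacity, demands):
    """Water-filling allocation; returns one rate per flow, in input order."""
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Serve the smallest demands first.
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])
    while unsatisfied:
        share = remaining / len(unsatisfied)
        smallest = unsatisfied[0]
        if demands[smallest] <= share:
            # This flow wants less than an equal share, so it is fully served.
            allocation[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            unsatisfied.pop(0)
        else:
            # Everyone left wants more than an equal share: split evenly.
            for i in unsatisfied:
                allocation[i] = share
            break
    return allocation


# Two greedy flows over a 6 Mbps bottleneck: the fair share is just half each.
print(max_min_share(6, [10, 10]))
# A small flow takes 1 Mbps and the other two split what is left.
print(max_min_share(6, [1, 10, 10]))
```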
If I take the utility max-min fairness, as I showed you previously, that's a little bit different.
So this recognizes that the black flow here needs a few more bits, or the red one needs fewer bits, in order to get similar utility.
And so I'm going to actually allocate throughput accordingly.
And that's what gets me to this figure here. So if I'm looking at the bottleneck, I'm looking at the pipe, I would say, oh, this is unfair, because one is getting more than the other.
But actually, from the end user's perspective, they're both getting the same quality.
And we can do this with video. The trick, though, with video is, it can't be a straight line, because we have discrete encodings.
So let me show you what this actually looks like in practice. This is with an older version.
This is, I think, about five years old, with the bit rate ladder that Netflix used to use.
And if we apply this metric, what you can see here is, if I just take roughly 1,000 kilobits, one megabit of bandwidth, I can see instantly that the iPhone here has a higher notion of utility than any of the big screen TVs.
So instead, what I'd like to do is say, wait a second, what if, oh, and crucially here, if I add these two up, I have roughly two megabits worth of throughput that I can allocate.
So knowing that I have two megabits, instead, what I'd like to do is allocate them so they're as close as possible in utility.
And you see here, if I give the iPhone just under 500 kilobits, and the big screen TV just over 1,500 kilobits, I get the same utility between them.
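A small sketch of that allocation with discrete encodings follows. The utility numbers are entirely made up (they are not the curves from the slides or any real bit rate ladder); the only property they are meant to capture is that a small screen reaches high utility at a lower bit rate than a large one. The search simply tries every combination that fits and keeps the one where the worst-off device does best.

```python
from itertools import product


def utility_max_min(capacity_kbps, ladders):
    """Pick one bit rate per device so the lowest utility is as high as possible.

    `ladders` maps device name -> {bitrate_kbps: utility}.
    Ties are broken by the smallest utility gap, then the highest total utility.
    """
    devices = list(ladders)
    best_key, best_choice = None, None
    for rates in product(*(ladders[d] for d in devices)):
        if sum(rates) > capacity_kbps:
            continue  # this combination does not fit through the bottleneck
        utilities = [ladders[d][r] for d, r in zip(devices, rates)]
        gap = max(utilities) - min(utilities)
        key = (min(utilities), -gap, sum(utilities))
        if best_key is None or key > best_key:
            best_key, best_choice = key, dict(zip(devices, rates))
    return best_choice


# Hypothetical utility tables, for illustration only.
ladders = {
    "phone": {250: 0.6, 500: 0.8, 1000: 0.95, 1500: 1.0},
    "tv": {250: 0.2, 500: 0.4, 1000: 0.6, 1500: 0.8},
}

print(utility_max_min(2000, ladders))
```

With these invented numbers, an equal split of 1,000 kilobits each leaves the television well behind the phone in utility, while the unequal split brings the two level.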
Well, can you actually build this in practice?
It turns out you can. We tried it.
We just took a home router. We called it VHS. Those of the older ones among us might appreciate the name.
And really, what it does is it just uses standard Linux tools, so in this case, tc, some code, some matching on regular expressions.
In this case, it was just Netflix and YouTube at the time, to see what we could do.
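To give a flavour of what driving `tc` from code might look like, here is a sketch that only builds the command lines and does not run them. It is not the VHS implementation: it assumes an HTB queueing discipline on the router's LAN-facing interface and classifies by client IP address rather than by matching requests, and the interface name, addresses, and rates are hypothetical.

```python
def htb_commands(device, link_kbit, per_client_kbit):
    """Return tc command lines that cap each client IP at its allocated rate."""
    commands = [
        # Root HTB qdisc and a parent class representing the bottleneck link.
        f"tc qdisc add dev {device} root handle 1: htb",
        f"tc class add dev {device} parent 1: classid 1:1 htb rate {link_kbit}kbit",
    ]
    for n, (ip, kbit) in enumerate(sorted(per_client_kbit.items()), start=10):
        # One child class per client, limited to the rate chosen for it.
        commands.append(
            f"tc class add dev {device} parent 1:1 classid 1:{n} "
            f"htb rate {kbit}kbit ceil {kbit}kbit"
        )
        # Steer packets destined for that client into its class.
        commands.append(
            f"tc filter add dev {device} protocol ip parent 1: prio 1 "
            f"u32 match ip dst {ip}/32 flowid 1:{n}"
        )
    return commands


# Hypothetical allocation: a phone and a television behind a 2 Mbps link.
allocation = {"192.168.1.20": 500, "192.168.1.30": 1500}
for line in htb_commands("eth0", 2000, allocation):
    print(line)
```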
And it turns out, actually, you can do really, really well. So this is one of the older examples.
So this is before you do any of this fancy stuff in the network about trying to allocate bits according to utility.
And if I apply this, here are the straight lines that I get out.
So this is now you're sharing a six megabit link between four Netflix clients at the time on different kinds of devices.
And you can see here, it's a free-for-all. But by doing this nice little trick, if you can just see the flows, you can see the requests coming through, and then you can allocate capacity accordingly.
And what you see here is the iPhone gets slightly less bandwidth, gets a slightly lower bit rate than the medium screen devices and the bigger screen devices.
But crucially, they all get the same utility.
This is a different example, just for argument's sake.
Let's say not everybody's watching video and people are downloading files and so on.
Well, here's the even bigger free-for-all originally, and you can just allocate some resources to do what's called the bulk transfer.
And you still get that nice property that bigger screen devices get more bits than smaller screen devices, but crucially have the same utility.
So you say, hey, the problem is solved.
With throughput, latency, and loss, and some visibility into the flow, you're done.
But wait a second. It turns out, actually, that's not the case. So this is an old piece of measurement work.
When I say old, I mean just a few years. It's a very simple idea.
It describes many of the things that we already know, but it's a lovely read in practice.
And this just says, look, we are moving from an HTTP world to an HTTPS world of security.
On a personal level, I would claim there's plenty of evidence to suggest that Cloudflare is actually one of the pioneers in this, one of the drivers of pushing TLS 1.3 and greater levels of encryption across the board.
And it talks about what the costs of this are. It means that there are a number of services we're used to having that we actually have to think about redeploying somehow.
So now let's look at the real problem. It turns out the application and the network have traditionally benefited from isolation, the layered view.
Maybe now they're starting to suffer a little bit. So this is a very, very personal comment.
Being an educator, former educator, there's something that we see in education which I wish was not the case.
So if I think of the application layers being up here and the networks and systems as being down here, there's something that happens from an educational perspective that I think actually contributes to this.
And the first is students tend to stream. I mean, it's natural for us to do so based on interest.
And what happens is people who stream towards the application layer end up getting a lower level of understanding about what's happening in the network and the systems layer.
And in this case, TCP and the way congestion control works.
But similarly, the people who stream towards the network and systems were conventionally taught that if we do things right down here, then we shouldn't need to worry too much about what the application wants to do or how it wants to do it because we've done everything right.
So there ends up being a separation.
Now, amazingly, the Association for Computing Machinery, the ACM, a fabulous professional organization (if you're working in the domain, I highly recommend you become a member), has a four-year high-level syllabus for computer science education.
And it breaks my heart to see that within that four-year span, there's only a minimum of 10 hours recommended, just 10 hours of network education, education about how the network works.
It breaks my heart.
I wish there could be more, and I hope people are listening. Of course, the separation is now further exacerbated because now there's this new DASH implementation.
It's using the network in the way that the network was never designed to be used, and now it's starting to surface issues.
So these two things are unfortunate, and now this thing comes along, and it's just going to basically exacerbate these differences.
Let me show you why that is. So let's focus just on the application.
So this is taking an open-source JavaScript client, sorry, a DASH client written in JavaScript, and it's just being fed some video.
Now, the red line here is we are artificially enforcing a bottleneck rate, and it means we can vary it.
So step by step, every few seconds, we increase the bottleneck rate, and then we decrease it on the other side.
And the interval by which we're increasing it is to match the encodings that are available for this particular video.
So let's just walk through what happens here. As we step through, so I'm going to increase the bottleneck capacity.
The green line is the application estimating what that capacity is.
So if this is the actual bottleneck rate, the green line is the JavaScript application, the browser effectively, doing its estimates, and it's actually not bad.
It's fairly close. It hiccups every now and then, but by and large, it's pretty close.
Interestingly, this purple line is the actual video bit rate that is consumed.
So the first is the estimates aren't too bad here on appearances, and the second is there's sometimes a big gap between what the application is estimating to be available and what the application actually requests.
This shows that the application is being conservative, and it's being conservative because of the fundamental underlying ideas behind this talk.
If you're not convinced alone that it's being conservative here, I want you to look here between this green line and this purple line.
So this green line says there's this much capacity in the network.
This purple line says I only need this much capacity in the network, and it's lower than the green line, and yet the application still doesn't request that bit rate.
So the application at least is being smart and saying I can see what I think is there, but I'm not 100% sure, so I'm going to scale back one step.
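To make the "scale back one step" idea concrete, here is a minimal sketch in Python. It is not the logic of the client shown in the talk: the function name, the ladder values, and the one-step back-off parameter are all assumptions chosen for illustration.

```python
from typing import Sequence

def pick_bitrate(ladder_kbps: Sequence[int], estimate_kbps: float,
                 back_off_steps: int = 1) -> int:
    """Return the rung to request: find the highest rung at or below
    the throughput estimate, then step down by back_off_steps."""
    rungs = sorted(ladder_kbps)
    # Index of the highest rung that fits under the estimate (-1 if none fits).
    fit = -1
    for i, rate in enumerate(rungs):
        if rate <= estimate_kbps:
            fit = i
    # Be conservative, but never go below the lowest rung.
    return rungs[max(fit - back_off_steps, 0)]

# Hypothetical bit rate ladder in kbit/s (not from the talk).
LADDER = [235, 750, 1750, 3000, 5800]

# The estimate says 3400 kbit/s, so 3000 would fit, yet the client asks for 1750.
print(pick_bitrate(LADDER, 3400))
```

The gap between the estimate and the request in this sketch corresponds to the gap between the green line and the purple line.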
Now let's switch gears. This is from a fabulous video online about TCP's sawtooth behavior, and for those of you in the know, I just want to remind you that some version of this in congestion control is what happens.
So on the flow level, the network, what I'll call the network, conservatively increases in time.
When it senses congestion, it backs off, and it increases, and it backs off.
This [TCP's congestion control] is happening on the order of milliseconds. This [the application's adaptation] is happening on the order of seconds.
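For anyone who has not seen the sawtooth before, a rough sketch of additive increase, multiplicative decrease looks something like the following. It is a toy model, not a real TCP implementation: the capacity, the step sizes, and the rule "back off whenever the rate exceeds capacity" are simplifying assumptions.

```python
from typing import List

def aimd_trace(capacity: float, rounds: int, start: float = 1.0,
               increase: float = 1.0, decrease: float = 0.5) -> List[float]:
    """Sending rate per round trip under additive increase,
    multiplicative decrease against a fixed bottleneck capacity."""
    rate = start
    trace = []
    for _ in range(rounds):
        trace.append(rate)
        if rate > capacity:
            rate *= decrease   # congestion sensed: back off
        else:
            rate += increase   # no congestion sensed: probe for more
    return trace

# The rate climbs past the capacity, is cut in half, and climbs again.
print(aimd_trace(capacity=10.0, rounds=20))
```

Each entry in the list is one round trip, which is why this loop runs on a much shorter timescale than the application's chunk-by-chunk decisions.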
So let's actually look at what happens when you combine the two together.
This is time on the bottom.
Somebody's trying to watch a video. This dashed line is important here.
We're going to call this the bottleneck rate, and this solid line here is what the application is requesting for the video bit rate.
Okay, it could get more potentially, but again, it's being conservative, and so this is actually what it's asking for.
And when we first start, oh, and the blue here is the time that it takes to transfer a chunk of video.
So in this case, what I'm trying to show you is, let's say that the chunking, basically, is five seconds worth.
So a chunk of video will have video content from zero to five seconds, a second chunk from five to 10, a third chunk from 10 to 15, and so on.
The way the network works, or TCP in this case, is it doesn't know what you're trying to do.
It's just going to get you bits as fast as it can. So it's entirely possible, and in fact likely, that the five seconds worth of content encoded in the video chunk will be retrieved in less than five seconds.
So in this first chunk, your bottleneck rate is up here, and even though it's five seconds worth of video, it gets reduced down to, it gets retrieved in about three seconds in this case.
And all the while, TCP is doing this thing where it increases its bit rate, it overshoots a little bit, so it backs off, and then it starts a little bit below the bottleneck bit rate, it overshoots a little bit, it backs off.
There are new congestion control mechanisms that are trying to do this a little bit better, but for now, this is what the network largely lives by.
So five seconds of video comes down in three seconds.
Crucially, it takes less than the five seconds to download, and that's the first observation.
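The arithmetic behind that first observation can be sketched as follows. The specific rates are assumptions picked so that the numbers match the five-second chunk and the three-second transfer in the picture; they are not measurements from the talk.

```python
def download_seconds(chunk_seconds: float, video_kbps: float,
                     throughput_kbps: float) -> float:
    """Time to fetch one chunk: the bits in the chunk divided by the
    rate at which the network delivers them."""
    return chunk_seconds * video_kbps / throughput_kbps

CHUNK_SECONDS = 5        # each chunk holds five seconds of video
VIDEO_KBPS = 3000        # assumed encoding requested by the application
THROUGHPUT_KBPS = 5000   # assumed rate TCP achieves at the bottleneck

transfer = download_seconds(CHUNK_SECONDS, VIDEO_KBPS, THROUGHPUT_KBPS)
idle = CHUNK_SECONDS - transfer

print(transfer)  # 3.0 seconds on the wire
print(idle)      # 2.0 seconds with nothing in flight
```

Whenever the requested bit rate is below what the network can deliver, the transfer finishes early and the connection sits idle for the remainder of the chunk.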
The second observation is that in that time, network conditions change.
So the application, its view of the network, by the time it goes to request the next thing, its view of the network is two seconds in the past in this rough representation, and anything can happen in here.
One of them is that TCP maintains its state for some amount of time. So when it decides to make the next move, it'll make the next move as if there's been no gap in here, and if the network somewhere else has changed, this could go horribly wrong.
The other thing that it can do is return to slow start. So basically, and that's not represented here, but I want to make it clear, is one of the things that can happen is some amount of time passes and the network state changes, so TCP makes a bad move.
One of the other things that can happen is enough time passes, and by default, it's 200 milliseconds, I believe, in most operating systems.
The TCP will just decide it's been too long since the last time I had a network estimate, so I'm just going to start over.
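A rough sketch of that second case, in the spirit of the restart-after-idle rule in standard TCP congestion control. The 0.2 second threshold is the figure recalled above, standing in for the retransmission timeout, and the restart window of 10 segments is an assumption for illustration; real stacks differ and some let you switch this behavior off.

```python
def cwnd_after_idle(cwnd_segments: int, idle_seconds: float,
                    idle_threshold_seconds: float = 0.2,
                    restart_window_segments: int = 10) -> int:
    """Congestion window to use for the next send after a quiet period.

    If the connection has been idle longer than the threshold, the old
    estimate is treated as stale and the window collapses to the restart
    window. Otherwise the old window is reused as if nothing had changed.
    """
    if idle_seconds > idle_threshold_seconds:
        return min(cwnd_segments, restart_window_segments)
    return cwnd_segments

# Two seconds of silence between chunks: start over from a small window.
print(cwnd_after_idle(80, idle_seconds=2.0))   # 10
# A 50 ms pause: carry on with the old, possibly outdated, window.
print(cwnd_after_idle(80, idle_seconds=0.05))  # 80
```

Both branches are risky for video: one throws away what was learned, the other trusts an estimate that may no longer be true.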
Either way, I'm hoping you can see at this stage that there's a big gap between what the network is trying to do and what the application is trying to do, and you get into these gaps here.
So the application is doing its estimates, TCP is doing its little estimates, and as a result, we get this free flow.
What do we really want here, if this is the case? What we probably want, I'm going to call these things session rate equality and some notion of experience equality, and I'm going to show you what these mean, but let's just define them here.
The session rate equality is some notion of TCP fairness per client, bit rate equality per client, so what I want is even if a video client decides to cheat and use two TCP connections with the hope of getting data twice as fast, I want to treat that whole session, so both of those connections, as one.
Then at least I can get into some notion of equal bit rate for fairness, but what I'd really love to do is this QoE fairness, which is I'm going to allocate different bits to different sessions so they both experience the same, even though they're getting different bits, and I claim that in today's world, this is impossible unless there's some sort of client-facing API or other interaction between what's happening at the application and what's happening in the network.
And this graphic is meant to reproduce that idea, okay, so this is a little bit of stepping back for philosophy, sort of broad thinking.
Here's the operating space, so flow rate equality is what we're traditionally used to.
A bunch of TCP flows, they each get equal bit rate.
I'm just leaving this as number one so you can see it.
This idea of session rate equality says the video client, no matter how many connections it uses, it's never going to take more than its fair share, okay, so if there's one bulk web transfer and one video client, but the client uses five connections, rather than get five out of six, it should get half.
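The difference between the two notions is easy to compute. This sketch assumes a single competing transfer alongside the video client, which is what the five-out-of-six and one-half figures imply; the session names are made up for illustration.

```python
from fractions import Fraction
from typing import Dict

def flow_fair(connections: Dict[str, int]) -> Dict[str, Fraction]:
    """Flow rate equality: every connection gets an equal share, so a
    session's share grows with the number of connections it opens."""
    total = sum(connections.values())
    return {name: Fraction(n, total) for name, n in connections.items()}

def session_fair(connections: Dict[str, int]) -> Dict[str, Fraction]:
    """Session rate equality: every session gets an equal share, no
    matter how many connections it opens."""
    return {name: Fraction(1, len(connections)) for name in connections}

# Number of TCP connections opened by each session behind the bottleneck.
sessions = {"bulk transfer": 1, "video client": 5}

print(flow_fair(sessions)["video client"])     # 5/6
print(session_fair(sessions)["video client"])  # 1/2
```

Under flow rate equality, opening more connections is rewarded; under session rate equality, it makes no difference.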
This here is the notion of equal utility, okay, so different bit rates but the same experience, and this here is a little something extra on the side, some candy, which says maybe we could actually use the network better if we could communicate information back to the client, but I'm going to ignore this for the time being and just focus on these.
What you'll notice here is there's the network view of the world and there are the client views of the world, and now in an HTTPS universe, the clients see everything they need to, but the network is completely encrypted, okay, so I can't go looking for HTTP requests in the network and know what a client is trying to do.
How do I deal with this? Or here's what's worse, what's the problem with this?
The problem is I'm actually in this no person's land here, okay, the network is designed for flow rate equality, but we can't see anything about what's using the network, about the things using the network, so I can't even assure flow rate equality, it's not possible, and this is the reason we need these interactions, okay, so if you say, well, wait a second, flow rate is enforceable, it must be.
I'll come back to this idea of one client using more connections than the other, okay, so here's a version of Netflix a few years ago, three years ago, versus DASH, the DASH client 1.6.
1.6 is important for a reason, okay, this was just after DASH decided to reduce itself down to one connection, so before this, DASH would use multiple connections, at 1.6 it settled on one.
Netflix at the time was using more than one.
Again, not necessarily a bad thing, it's, there's a perfectly good sound reason that Netflix would want to do it, because clearly its customers are going to do better.
What you can see here is DASH suffers as a result, and you say, okay, well, maybe I can do something about that, it turns out, actually, no, here are two different versions of the DASH client, before and after, so the black line is when it used two connections, and the red line is it using one connection, and you can see here, even then, not much you can do if you rely solely on the endpoints, okay, and this is why we need a client-facing API.
Let me show you a slightly different version of this, so you can see, so this is three different flows using BBC Test Card, bless the BBC, actually, for making so much of this open and available to the public to test against.
This is the current landscape that we see now, so that the, all these dotted lines are the estimates of these three clients, trying to figure out what the network is doing, and the solid lines are what they're actually getting in terms of video bitrate, and this is less than satisfactory from a, both a customer and a provider's perspective, so really, what we need to be able to do is to enable communication, or at least some kind of information flow about what's happening on the client side versus what's happening on the network side, okay, well, interestingly, we can preserve fairness if we do this, okay, so here is the current landscape that I showed you just a moment ago.
If we can do this by way of flow or session rate fairness, then we get back to a world that we're in right now, and this is the ideal.
I'm pausing here for a moment because I really kind of wonder, maybe, maybe there's some portion of the population watching this, actually, that thinks, okay, that's lovely, but maybe we do want to do better, okay, and so if we can throw in this notion of utility, then we can get to this place that I showed you before.
Now, I want to be clear about what's happening here.
This is no visibility into the data stream, okay, so the packets are completely encrypted, but what's being provided here is an open API to the client, in this case just using WebSockets, and so the client, if it chose to do so, could say, you know, I've got a screen that's this big, and here's the types of video, and the bit rate ladder that's available to me, and so on, please give me what you think is best, in order for me to get the same level of quality that other people would be getting behind the same bottleneck, and what you end up with is something rather lovely, which is the smaller screen device gets less than the medium screen device, which gets less than the bigger screen device in terms of bit rate, but crucially, they all get the same experience.
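One way to picture the decision on the network side of such an API is sketched below. It is an illustration under stated assumptions, not the system from the talk: the quality scores per rung, the ladders, and the rule "find the highest quality level every client can reach within the bottleneck" are all hypothetical, and the transport (WebSockets or otherwise) is left out entirely.

```python
from typing import Dict, List, Optional, Tuple

# Each client reports its ladder as (bitrate_kbps, quality_score) pairs.
# The quality score is a hypothetical 0-100 rating of how that rung looks
# on that client's screen, so the same bit rate scores lower on a big TV.
Ladder = List[Tuple[int, int]]

def equal_quality_allocation(ladders: Dict[str, Ladder],
                             capacity_kbps: int) -> Optional[Dict[str, int]]:
    """Give every client the cheapest rung that reaches the highest
    quality level all clients can share within the bottleneck capacity.
    Returns None if even the lowest shared level does not fit."""
    levels = sorted({q for ladder in ladders.values() for _, q in ladder},
                    reverse=True)
    for target in levels:
        choice: Dict[str, int] = {}
        for client, ladder in ladders.items():
            rates = [rate for rate, q in ladder if q >= target]
            if not rates:
                break  # this client cannot reach the target at all
            choice[client] = min(rates)
        else:
            # Every client can reach the target; check that it fits.
            if sum(choice.values()) <= capacity_kbps:
                return choice
    return None

ladders = {
    "phone":  [(300, 60), (750, 80), (1500, 90)],
    "tablet": [(750, 60), (1750, 80), (3000, 90)],
    "tv":     [(1750, 60), (3000, 80), (5800, 90)],
}

# Different bit rates, same quality score for everyone.
print(equal_quality_allocation(ladders, capacity_kbps=6000))
```

With these made-up numbers the phone gets 750, the tablet 1750, and the TV 3000 kbit/s, and all three land on the same quality score of 80.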
There are some takeaways here.
Okay, one is this notion of competing feedback loops cause problems, and I showed this to you in a sort of application network level, where they're both trying to estimate the available throughput, okay, they both want to estimate the available capacity, but we've seen this in other places in the past.
One is if we talk about long-haul wireless links, so people setting up, as I've done in the past, rural or remote broadband networks, for example, might take two directional antennas, standard Wi-Fi, and point them at each other, and expect to get some throughput rate, but actually see a lot less.
If anyone in the audience is trying to do that today, it might very well be because one of the defaults, typically, on Wi-Fi is that at the wireless Ethernet level, retransmissions are on by default, and what that means is on that single link, if one of the antennas detects there's an error in a transmission, then it will feed back to the sending antenna, and the sending antenna will retransmit that data, okay, and on the surface, looking at that link, anyone looking at it would say this is an absolutely fabulous thing to do, and it is from the perspective of the link, except that from the perspective of the end-to-end connection, TCP is being used to send data, and one of the reasons that TCP exists, or one of the features that TCP provides is reliability, which is when it senses that there's a loss in the transmission, it will retransmit that data, okay.
Well, on the surface, that sounds okay, but actually what's happening is, remember, bottleneck capacity, there's a limited amount of resource that has to be shared or used, and so the link, in trying to be smart, decides to consume some of that resource to do the retransmission locally, while the end-to-end connection, which is much further apart, may also detect that loss and retransmit accordingly, so along that bottleneck link, that single link between those two antennas, you would see multiple retransmissions for the same thing.
It's almost as if the link and the path are competing with each other in order to be reliable, and what ends up happening is they just reduce their throughput rate overall.
Now, why don't many people see this?
Why don't you see it when you're at the Wi-Fi cafe, for example? It's because the distance between the client and the antenna, with the Wi-Fi access point, is really short, and so that happens far more quickly than the end-to-end connection could ever see it.
It's not that the problem doesn't exist. It's just that it's resolved internally before it ever is witnessed or detected externally.
The people who are trying to do those long-haul links, they might see it because the distance between those two antennas is way longer.
In fairness, in our particular case, we only detected it across about 10 kilometers.
That's when we started to see it, and it was particularly bad.
Here's the takeaway when it comes to feedback control.
I struggled in preparing this talk because I can't find any other place where we would see this type of competing feedback control by any stretch of the imagination, so let me give you some cases in point.
Imagine you're driving a car. A classic feedback control loop is you look to see how far ahead the car is in front of you, and you push your foot onto the gas or off of the gas accordingly in order to roughly keep pace, okay?
Now, there are some systems out there, cruise control-type systems, that will try and do this for you.
They'll look ahead, and they'll say, based on the distance and the speed and so on, I can accelerate or I can decelerate and so on.
I hope, I hope that no one in their wisdom would say, while the car is trying to decide this and do the feedback control on, you know, distances and input and gas in order to accelerate or decelerate, I hope that no one will then turn around and say, oh, but you know, I can make the computer smarter.
I can make the car smarter by doing this in opposition or in addition to.
Similarly, a temperature control, so your thermostat at home is a feedback control loop, okay?
You would never see two different thermostats in the same room trying to control the same temperature.
So, engineers know this well. One feedback control loop, no competition, no nesting.
But unfortunately, we do see it in the network, and it's a consequence of the layering that we see, okay?
The layering that has benefited us for decades in this particular context actually causes us to suffer a little bit, because the people who live at different layers and the work that's being done at different layers, by and large, doesn't need, historically, doesn't need to think about or look at what's happening in other layers.
And there's probably a need to change that in the future.
What do we do to solve it in this case?
Look, HTTP video is here to stay. There's no rolling that back, and there are good reasons we would want to avoid rolling it back.
Remember, paying for more bits on the wire than are consumed, for example. So, we need to break that layering somehow.
And in the last few years, there's been some great work doing this.
I'll talk a little bit about what that looks like.
Here's a big one, and I'm going to take a risk here. So, I'm going to claim that no single provider can solve this problem alone.
And it's a risk in that no one really wants to believe this.
Many people want to avoid this thought. So, any single provider, if they come up with a solution, that solution will be fabulous for them.
And it might work in many, many cases. The problem is, as soon as any provider fixes on a single solution to try and solve this problem, that means anyone else can come along, easily detect what that provider is doing, and then game whatever they do in order to do better.
Okay? And there's only one of two outcomes here.
Either somebody decides to play nice, and somebody else decides to come along and not play nice, or everybody decides to compete in a sort of a Wild West-type manner, and then the network starts to collapse in on itself.
Neither one of those is satisfactory.
But here's a thought.
Even though no single provider can solve the problem, and I really want to avoid thinking about this, but in the interest of transparency, this has come up in conversation with people, one or two people at Cloudflare, but also, more importantly, people who are in other providers as well.
People do talk to each other.
And it's this idea of, if we acknowledge here that there's an issue, and we acknowledge that not one of us alone can solve the problem, what if the top four or the top five providers just agreed on a solution?
Does that solve the problem?
So imagine you take Netflix, YouTube, Baidu, Amazon, Disney, there's an increasing number in the market, of course, all kinds of people, and they all get together, and they say, of all the video that's being consumed in the world, our services take up 90 or 95%.
If we agree among us, does that solve the problem?
Honestly, I have no good answer to that question. It's an interesting one, I think, to think about, because it could be that the smaller player finds a way to do better and gains competitive advantage, or it could be that the smaller players are completely locked out.
And so instead, what I would argue is, while it might be the case that this problem can be solved by a small number of people working together, the better solution is to develop and adhere to some kind of standard or best practice or similar.
Something that I would like to think that Cloudflare so far has been very good at doing, and I'm confident will continue to do in the future.
So future and some open questions.
So where does this go next? One of the emerging spaces in the last few years has been some form of machine learning or AI.
This has turned out to be particularly promising.
The details are the subject of many more Cloudflare TV sessions, potentially.
Suffice it to say, there's a lot of data out there, there's a lot of compute power out there, and people are doing some amazingly bright things.
Okay. I will, however, come back to this idea of be careful about doing this in isolation.
Okay. So most of this, almost all of this happens in an open research context, which is incredibly important so that we can debate as a community and figure out what works and enable other people to do the same.
One of the interesting questions that comes up every now and then is does fairness even matter?
And if there's positive feedback for this idea, I would love to invite a couple of people I know who argue against fairness in the network, and say that we probably shouldn't care about fairness.
We should care about protecting the network resources so that the network never collapses in on itself, but any notion of fairness, there's just no point, no one will ever agree, and so we shouldn't worry about it at all.
If nothing else, I think it's a subject of a very, very lively debate.
And the last one here builds on all of this and asks, does there need to be some agreement or a standard?
And what would that look like? To be honest, it's far from clear.
And how it would be done is even less clear, but I hope the community on the whole has shown itself to be extraordinarily capable in this regard, and I would like to believe that this is going to be true in the future.
So let me just check and see, there are sometimes questions that come in, I'm not sure if any have, it should be fine.
I am entirely reachable. Right, if you have any feedback, please, by all means, send it down the pipe.
Happy to expand on this further, even pursue the fairness discussion.
Coming up next is John Graham-Cumming.
He's going to be talking about the make utility. This is part four, I believe, of a long running series, and it's fabulously entertaining.
I highly recommend it.
Thank you very, very much for attending, and I hope to see everybody again soon.
Transcribed by https://otter.ai