Cloudflare TV

Leveling up Web Performance with HTTP/3

Presented by Lucas Pardue

Join Lucas Pardue, QUIC Working Group Co-Chair and Cloudflare engineer, for a session on how HTTP/3 is supercharging web performance.

Episode 2


Transcript (Beta)

Okay, I hope you've enjoyed that. Let me do my spiel again very quickly. I'm Lucas Pardue.

I'm an engineer at Cloudflare. I work on protocols and stuff like that. So effectively, the team I work on are responsible for components that terminate HTTP connections as they come into the Cloudflare edge.

I work in London, but I work with a diverse range of colleagues all across the world, both inside Cloudflare and across industry on things like standardization.

So, before I recap last week: the title of this whole segment is Leveling up Web Performance with HTTP/3.

So that's my Twitter handle at the bottom of the screen right now, if you want to get in touch.

But we took a kind of base level explanation of HTTP/3 last week, at least base level to me.

I'm so close to it now that it's hard to tell what counts as tacit knowledge for people and what doesn't.

So I try and explain some of the basics.

But if people have questions, please get in touch and email us at livestudio@cloudflare.tv.

We want to make these sessions interactive. So I can answer them either like right now, or maybe in a few minutes as I move on, or whatever.

But anyway, we're talking about HTTP/3, which is an application mapping built on top of this new transport protocol from the IETF.

And this transport protocol is secure and it's reliable, and it mitigates head of line blocking.

That's all like clever big speak.

But basically, it takes something like TCP, with security similar to TLS on top, runs it over UDP, and it's kind of the combination of all the nice things we'd like to have, that we've progressively been building up over the years of iterating on HTTP/2.

So I mentioned HTTP/2 and TLS. This is kind of the diagram I presented last week on using streams and how those things stack up.

On the left -hand side, just to recap here, we've got at the lowest level and working from bottom upwards to the top, you have TCP.

And we like secure protocols these days.

So we've got a TLS layer on top of that, and HTTP/2 on top of that.

And we have streams in there. And then what QUIC did was borrow some of the elements from H2, kind of play some Jenga here, take a piece out and put it back in, give us streams at the QUIC layer, and put HTTP/3 on top of that as an application.
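As a rough sketch of the two stacks just described (not the exact slide):

```
HTTP/2 stack                 HTTP/3 stack

+------------------+         +------------------------+
| HTTP/2 (streams) |         | HTTP/3                 |
+------------------+         +------------------------+
| TLS              |         | QUIC (streams,         |
+------------------+         |  TLS 1.3 handshake,    |
| TCP              |         |  reliability)          |
+------------------+         +------------------------+
                             | UDP                    |
                             +------------------------+
```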

So, you know, these technologies or these protocols just don't emerge from anywhere.

And I kind of mentioned or glossed over this history of QUIC last week, but I want to dig into a bit about it now just so you can understand the importance of the news that we have this week.

The really exciting stuff.

But, yeah, if we can dig into this picture, I appreciate some of the streaming quality.

Might not lend itself to small text, but the gist of this is that, you know, trying to standardize new protocols is a tricky thing.

Everyone has a great idea.

They're good in their context or their use case. And there's a desire to standardize things maybe because there's a perceived benefit by creating a common, say, platform or approach to speaking.

Client server model is a great example where you want web browsers to be able to speak to a whole range of web servers and CDNs and reverse proxies and these kinds of things.

And so, as I kind of talked about last week, Google had worked on this QUIC protocol in tandem with SPDY, which was a precursor protocol and the basis of HTTP/2.

In parallel, QUIC was designed as a solution to some of the head-of-line blocking issues that we talked about last week.

So while that was going on, Google was doing this work, deploying this Google QUIC protocol, and writing up public specs, which were really good to see and to read, over a whole range of years.

You can't see the dates here. They're too small.

But, you know, back in 2012, we had a version of the spec and live actual deployments using UDP, which some people questioned, to develop a secure protocol on top that could mitigate some of the problems.

And through that process, they would gather data and apply it to use cases like search page loading or YouTube video quality of experience.

You know, if you watch a video, you can improve the load time of the video or, sorry, the chance of encountering rebuffering.

But anyway, if we dig deeper into that, you can see that they kind of branched off from this red GQUIC line in the upper half of that diagram into a phase of work which I've labeled here as the QUIC IETF proposed adoption.

And from that, what they did was take the Google document versions and split them out into a couple of protocol documents, written by some of the engineers at Google at the time.

And they presented something to the transport area working group: a QUIC protocol document.

And they threw it out there, to see who in the IETF community might be interested in this thing, and to solicit some input.

And it went through this whole range of stages, effectively, which I might explain in a short second if the technology lets me.

But yeah, what we did as a community was look at the documents, understand the experimental work that went behind them, and decide that while Google QUIC had some good evidence that it could improve things, it was quite a baked-together protocol.

The stacks that I previously showed on the last slide were very much kind of melted together.

If you imagine a cheese toastie in the UK: it goes into a machine that compresses things and heats them to plasma levels of heat, and the layers become inseparable.

It tastes great, but if you fancy some kind of different filling in the middle, very difficult.

So the IETF wanted to take the good parts of a transport protocol, make them reusable.

But what we ended up with is this kind of splitting of all these documents into different things.

And yeah, we've been progressing through that. We're up to draft 29 of the whole family of these documents right now, and, you know, I wouldn't read too much into the numbers, but effectively the important thing that happened this week arrived as some tiny text on this slide.

So let me read it in a narrator voice. After more than three and a half years and substantial discussion, all 845 of the design issues raised against the QUIC protocol drafts have gained consensus or have a proposed resolution.

In that time, the protocol has been considerably transformed.

It has become more secure, much more widely implemented, and it's been shown to be interoperable.

Both the chairs and the editors feel that it is ready to proceed to standardization.

Therefore, this email announces a working group last call for the following QUIC documents.

QUIC transport, QUIC loss detection and congestion control, using TLS to secure QUIC, version-independent properties of QUIC, HTTP/3, and finally, QPACK header compression for HTTP/3.

The working group last call will run for four weeks, ending on the 8th of July 2020.

As a reminder, we've been operating under the late-stage process, and the email includes a link to it.

In theory, this means that the contents of the draft already have consensus.

However, the chairs would like to actively reaffirm the consensus and start the process of wider review through a formal working group last call.

So this text is taken from an email that the QUIC working group chairs sent to the QUIC working group, basically to say to people who are members, but maybe don't follow along on a day-to-day basis, that the documents are in a good place.

We have issues. We work on GitHub.

We kind of develop the specifications as if they're code. They're written in Markdown, and they get translated and transmogrified into the various lovely textual formats of RFC file that we all like to read.

Or maybe not all of us, but some of us.

But we've been tracking issues following kind of a fairly robust process compared to some other groups.

Not everyone does it this way. Typically, you might just email some patches to the working group and say, look, I think it would be better if you said it this way rather than that.

Or, I don't understand this.

This doesn't work. I've tried it out. This is the whole process of standardization that we do.

So I'm going to try something a bit different here.

If you just hang on a moment, I'd like to try some live whiteboarding with a pen that I have.

It's going to work. I think it is.

So that's a horrible color. I apologize for the red. Let's go for blue.

So we talk about QUIC, and I think people like to think that we have that layer diagram, but ultimately, it's QUIC as a big box of stuff, which is kind of fine if you want to treat things as a black box.

But actually, I like to think of things a bit more like the game Sim City, if anyone's ever played that.

So instead, what we have is, say, a box for QUIC.

And this is the little QUIC box right here.

What we have is zoning effectively going on. So we carve up this space that we mentioned.

The transport document; we've got a lot of engineer's handwriting here.

We could color that in a different color here.

Pink. Lovely.

And then we could break into some different stuff. So we have congestion control.

That's supposed to be a C. Another different color. And effectively, what we can do with all of these is build out building blocks or foundations for what elements of a transport protocol are needed.

But what we don't have in there is an application mapping.

And an H for handshake.

So this is effectively, you know, QUIC is a secure protocol.

It needs packet protection. But how you establish the keys that help protect those packets is effectively modular.

And these elements in here are modular.

So we can change stuff up. I don't want to scribble that out. But interestingly, in all of this, what we don't have inside the QUIC box is HTTP/3.

Which is an application mapping which works on the top of these things.

It's related to and it's dependent on certain properties of QUIC.

But, you know, it doesn't necessarily need to know all of the inner workings.

And that's where this document called the invariants comes into play.

Which is actually a bit of a lie, because the application mapping doesn't worry too much about that either.

Oh, I think the pen just ran out.

Brilliant. So, yes. The invariants document is effectively a description of the bits on the wire of QUIC that don't change between versions.
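To make that concrete, here is a minimal Rust sketch of the version-independent long header fields as the invariants document describes them: a first bit distinguishing long from short headers, a 32-bit version, then two length-prefixed connection IDs. Everything after these fields is free to change between versions; the parsing code itself is mine, purely illustrative.

```rust
/// The version-independent ("invariant") fields of a QUIC long header.
struct LongHeaderInvariants {
    version: u32,  // 32-bit version field; 0 means version negotiation
    dcid: Vec<u8>, // destination connection ID (length-prefixed on the wire)
    scid: Vec<u8>, // source connection ID (length-prefixed on the wire)
}

/// Parse only the invariant fields; a sketch, with no claims about
/// anything beyond these bytes.
fn parse_invariants(buf: &[u8]) -> Option<LongHeaderInvariants> {
    // Top bit of the first byte: 1 = long header, 0 = short header
    // (for short headers, only the destination CID is invariant).
    if buf.first()? & 0x80 == 0 {
        return None;
    }
    let version = u32::from_be_bytes([*buf.get(1)?, *buf.get(2)?, *buf.get(3)?, *buf.get(4)?]);
    let dcid_len = *buf.get(5)? as usize;
    let dcid = buf.get(6..6 + dcid_len)?.to_vec();
    let scid_len = *buf.get(6 + dcid_len)? as usize;
    let scid = buf.get(7 + dcid_len..7 + dcid_len + scid_len)?.to_vec();
    Some(LongHeaderInvariants { version, dcid, scid })
}
```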

And what are versions in this case?

Okay. I think the entire machine is frozen up. Just deal with this a moment.

Okay.

Brilliant. Okay.

So, okay.

I think I might be back now. Unfortunately, I think my computer barfed at my drawing.

So, apologies for that technical issue. I am just going to try and continue on this session.

The audio is going to be a lot worse here. You'll probably hear the echoing.

But, yeah. I'm just going to try this out. Let me hit the record button.

No, I can't do that either. Oh, dear. Okay. So, let me try to share this instead.

Nope. I can't screen share either. This is all going wrong. Right.

So, thank you, technical issues.

So, I'm going to have to fly by the seat of my pants here.

But effectively, we talked about the working group last call.

That is a very important milestone in the whole process of QUIC that you've seen over the years, as we've gone from Google QUIC into the IETF.

And it's effectively laying down a marker for us to say that we're almost ready, in a way.

We've been working through this rigid process to open issues and make sure that the group is happy with the changes that are happening throughout.

And, you know, we can't always agree on these things, but the idea is to get rough consensus and running code.

And, like we've seen last week, the interoperability matrix is pretty well populated.

We've got a lot of clients. We've got a lot of servers. We've got open source software.

We've got different kinds of libraries and stuff written in different languages.

But, you know, that's not the only thing that's happening in the QUIC world.

So, back to slide share, if I can. Bear with me one moment.

Dead air time.

I need to fill this. So, the other thing that's happened in the world of QUIC this week is that a new working group has formed, called MASQUE, which stands for Multiplexed Application Substrate over QUIC Encryption, or something along those lines.

You can read into the expansion as much as you want. Maybe it will go the way of QUIC, from being an acronym to just being a thing that you shout because it's in capital letters.

So, what is MASQUE?

Let's get the screen sharing going. I've got a whole load of slides here, and they're great.

So, I hope you can bear with me just for this one moment while I figure this out.

Why don't I show you my face in the meantime?

Oh, look. The background is actually the birds of a feather recording.

So, hello.

I wonder if I can be heard now. Maybe. I'm trying to recover from here. Some technical fun.

There's two of me right now, both panicking. I have no idea what anyone's been able to hear for the length of this talk.

I apologise profusely for trying to be too clever and drawing images and stuff.

Right now, I'm scrambling, and I realise I've wasted some of your time.

One of my colleagues, Peter, is just saying he can hear me, but no slides.

Now I'm worrying that there's weird feedback echoing going on, and Peter says he can still see.

Hello, Peter.

Can you hear me now? I'm still waiting.

The joys of live experimental video. That's the correct one.

Yes, this will have to do. It's not going to be as clever as I wanted it to be, but yes.

So, MASQUE. This is a great description. My colleague, Chris Wood, is the co-chair of that group.

I want to make this clear: it's a separate activity to QUIC.

Basically, it's going to be something that works on top of QUIC, say, or uses the application mapping of HTTP/3, to do some stuff that people have wanted to do.

This isn't a brand-new problem, but it's taken us a while to figure out between different stakeholders what makes sense to work on, what is common, what is maybe deployable within the constraints of different networks and stuff.

I just want to give an overview of what tunnelling is.

I want to do this in the context of HTTP. The best way to do this is to steal some slides I presented earlier.

In the UK, there's a TV show called Blue Peter.

I don't know if any of you are familiar with that one.

But, yes, what they always do is say: here's a thing, it looks great, here's one I made earlier.

So, proxying. Traditionally, in the old-school days before TLS and people being so worried about security, we had, let's call them, HTTP/1.1 forward proxies.

As a client, you would try and open a TCP connection to example.com, and you believe you have done that.

But, actually, what you have in the network is a proxy that sits there transparently, say, which isn't great, but they were sometimes used in corporate networks, traditionally, so they could maybe aggregate some traffic or provide some transparent caching to avoid costly bandwidth in the days of early Internet deployments.

So, yes, the client makes this request, and it thinks it has a single TCP connection between it and the final server here on the right-hand side.

But, actually, there are two. You can configure this kind of thing yourself.

You can go into your browser settings and set things there, or your system settings, and sometimes tools will have advanced options in their networking settings to set up an HTTP proxy, that kind of thing.

And you can even use a command-line environment variable to pass that in; curl is one way to do it, or there are other options to be a bit more clever and specific.

So, then we have a world where we want to be a bit more secure, and, yes, what we came up with (not me, I'm not taking any credit here) is a method called CONNECT.

So, this was a way to work with a known proxy in the middle. The client would be configured to know that there is a proxy there; otherwise, it's not possible to make a single TCP connection from the client to the server on the right-hand side.

There needs to be a step with the proxy in the middle. What you do with this CONNECT method is effectively ask the proxy to open a TCP connection of its own onto the server. If that succeeds, you get a response code that says it has been successful, and from that point on, the client creates an end-to-end TLS tunnel within the two TCP connections.

So, the proxy in the middle can't see any of that; the secrets are not visible to it, unless somebody wanted to share them with it too, but I wouldn't recommend that.

So, we'll assume in this world that, you know, from there on in, there's an end-to-end encrypted tunnel, and this is a useful feature.

It is used in practice by different kinds of client applications.
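As a minimal sketch of that dance, in Rust, assuming a hypothetical proxy at proxy.internal:3128 and example.com as the target (both are placeholders, not anything from the episode):

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

// Open a tunnel through an HTTP/1.1 forward proxy using CONNECT.
fn open_tunnel() -> std::io::Result<TcpStream> {
    // First TCP connection: client to proxy.
    let mut stream = TcpStream::connect("proxy.internal:3128")?;

    // Ask the proxy to open its own TCP connection to the target.
    stream.write_all(
        b"CONNECT example.com:443 HTTP/1.1\r\n\
          Host: example.com:443\r\n\
          \r\n",
    )?;

    // A 2xx status line means the proxy-side connection succeeded.
    let mut buf = [0u8; 4096];
    let n = stream.read(&mut buf)?;
    assert!(buf[..n].starts_with(b"HTTP/1.1 2"));

    // From here on, the client runs an end-to-end TLS handshake
    // through the tunnel; the proxy just shuttles opaque bytes.
    Ok(stream)
}
```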

Again, you can use an environment variable like HTTPS_PROXY when you're invoking something like curl on the command line, or whatever. And then we enter HTTP/2, and, as we mentioned before, HTTP/2 introduces the concept of multiple streams in a single connection.

So, what do you do?

Do you take over the entire TCP connection for just one TLS session inside?

Well, it kind of doesn't work that way because requests and responses consume a single stream, not the entire connection.

So, in the HTTP/2 definition of things, you can talk to a proxy in the middle.

Oh, no, this is a lie. This is a 1.1 proxy.

So, actually, yes, you do CONNECT to the proxy in the middle, and you open the TLS connection to the HTTP server on the end, and then you just do your normal HTTP/2 requests and responses in a stream.

It gets way more complicated when you introduce QUIC, and, actually, the H2 case is very similar: on a single stream, you do a CONNECT to, say, example.com via the proxy in the middle.

Here's an example of how it could work.

You effectively reserve a single QUIC stream to the proxy in the middle for the transmission of TCP-based communication.

So, an application protocol that would normally use TCP runs inside an end-to-end TLS connection, carried on a single HTTP/3 request stream, which is a QUIC stream, inside QUIC's transport security envelope, over UDP. Which is all crazily complicated, but the HTTP/3 standard does define CONNECT to allow you to do this today. And so back at IETF 102, thinking ahead to how I might use QUIC to do a similar thing, so I could have QUIC end to end, what I wanted, say, was UDP between the proxy and the server on the right-hand side, with QUIC transport security and streams inside it: QUIC within QUIC, effectively. And, yeah, at the time, there was nothing that would allow you to do that.

CONNECT defined only a way to create a TCP connection between the proxy and the server, not a UDP association or anything like that; hence the red question marks here.

Let's go back to the slides that may or may not have appeared earlier.

We've talked about the transport, congestion control, recovery, QPACK, and H3 documents, but there are other working group documents in QUIC.

They're not in working group last call, but they are adopted documents in the working group.

We've got the applicability and manageability documents, which talk about ways to use the protocol, or considerations when you come to deploy and look after connections. And in the space of the last year, we've also adopted three new documents.

One is called QUIC-LB, which is about generating connection IDs. Every QUIC connection has a set of IDs, and those are exposed on the wire and can be read. Certain types of appliance, either hardware or software, sometimes need to route QUIC packets correctly: when a client sends one in, you need to be able to route it to the correct server and back out to the correct client. Getting that right can be straightforward, but there's some clever stuff you can do too, depending on the kind of network topology you're dealing with.
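Purely as an illustration of the idea (this is my sketch, not the QUIC-LB design itself), a load balancer could encode a server identifier into the connection IDs it hands out and route on that byte without keeping per-connection state:

```rust
use std::net::SocketAddr;

// Hypothetical scheme: the first byte of each connection ID encodes
// which back-end server owns the connection.
fn route_packet(dcid: &[u8], servers: &[SocketAddr]) -> Option<SocketAddr> {
    if servers.is_empty() {
        return None;
    }
    let server_id = *dcid.first()? as usize;
    servers.get(server_id % servers.len()).copied()
}
```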

We've got a document about version negotiation, which I'm less familiar with, I do apologize, but you might imagine it lets you do compatible version negotiation, whatever that means. And then the important, relevant one here is an unreliable datagram extension to QUIC. So what is that?

We've talked a lot about streams and how those are reliable and provide in-order guarantees within the stream itself, though actually, across streams, there are some problems that maybe we'll come back to. And, yeah, what the datagram extension defines is a DATAGRAM frame. A frame is one of the smallest units of a QUIC connection, and this is a way to just send a frame with some payload in it, without it being reliable. So, again, it kind of does what it says. But the benefit compared to just plain UDP is that you have the whole QUIC connection association going on: you've got the security there, and you've got all of the other facets of the QUIC connection defined in the family of documents that I was trying to describe before everything went a bit wrong. And this is a really cool feature.
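For a flavor of how small this extension is, here is a sketch of encoding a DATAGRAM frame as the extension defines it (frame type 0x30, or 0x31 when an explicit length follows), using QUIC's variable-length integers; the helper names are mine:

```rust
/// Encode a DATAGRAM frame with an explicit length (type 0x31).
/// Type 0x30 instead lets the payload run to the end of the packet.
fn encode_datagram_frame(payload: &[u8], out: &mut Vec<u8>) {
    out.push(0x31);
    encode_varint(payload.len() as u64, out);
    out.extend_from_slice(payload);
}

/// QUIC variable-length integer: the two high bits of the first byte
/// say whether the value takes 1, 2, 4, or 8 bytes.
fn encode_varint(v: u64, out: &mut Vec<u8>) {
    match v {
        0..=63 => out.push(v as u8),
        64..=16_383 => out.extend_from_slice(&(v as u16 | 0x4000).to_be_bytes()),
        16_384..=1_073_741_823 => out.extend_from_slice(&(v as u32 | 0x8000_0000).to_be_bytes()),
        _ => out.extend_from_slice(&(v | 0xC000_0000_0000_0000).to_be_bytes()),
    }
}
```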

It's actually a pretty short document in itself. There are some considerations to think about when implementing it, but, again, it's a real building block for coming up with a different kind of application mapping on top. Because, if you go back to our example here: UDP is an unreliable transport, and QUIC is designed to run on top of an unreliable transport and add reliability.

If you run QUIC within QUIC, the two layers come up with some weird kinds of effects against each other, which are kind of known in the space.

I don't know if we know solutions for them, but there can be problems where you've got multiple levels of congestion control, so things thrash against each other. Or there's a feedback loop that can happen when loss is detected at one layer when actually there wasn't any.

There are optimizations that you can make, especially when things are unreliable, and that's one of the main use cases for the datagram: not everything is important, actually.

You might send a piece of information that's important for a small time period; it's relevant, but after a while, it's useless. And so by having a fundamental building block, you can start to create different kinds of application mapping. Stuff like what we're doing right now, real-time telecoms or communication, is very apt there. And so the DATAGRAM frame forms a basis for being able to do this QUIC-within-QUIC tunneling more properly. And so, yes, we went through a BoF, a birds-of-a-feather session, to understand, as a smaller group of people who collected our ideas together, things like this past presentation and other input from different groups, what the IETF and a wider group of people might decide to work on, and then to figure out what the technology might look like. Not to define it up front, that's what the working group is left to do, but to have a picture of what problem we're trying to solve. And effectively, what we'd like to see is this UDP CONNECT method to say to the proxy: I want you to route datagrams sent to you on this thing, to unpack the payloads inside them, and to send them onward to the server as UDP payloads.

What I put inside that UDP payload is up to me, you don't get to decide, but I could use that for QUIC, that kind of thing.
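Just to sketch the shape of the idea (this is hypothetical: the MASQUE group had only just been chartered at this point, and nothing here is settled wire syntax), one early proposal imagined a CONNECT-UDP method analogous to CONNECT:

```rust
// Hypothetical request shape; target.example:443 is a placeholder.
const EXAMPLE_CONNECT_UDP: &str = "CONNECT-UDP target.example:443 HTTP/1.1\r\n\
                                   Host: target.example:443\r\n\
                                   \r\n";

// On success, datagrams associated with this request would carry
// opaque payloads that the proxy forwards to the target as UDP, and
// the client is free to run QUIC inside them, end to end.
```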

Right, so that's kind of the news recap.

I hope it was fairly interesting despite all of the interruptions. So I'm going to change tack to a different topic that's more HTTP/3 related. If you tuned in for web performance stuff, I apologize for the digression, but it's exciting times, and I wanted to promote the wider scope of work that's going on out there.

So I'll just check for questions, if there are any. If you've got some, because you're completely confused after the technology failed, send us an email at livestudio@cloudflare.tv, or try and tweet me, and I'll maybe spend some time towards the end of the session answering your questions. But otherwise, I'm going to do some live browser sharing. So enough of the slideware for now; let's talk about priorities, without any ability to visualize them, unfortunately.

Well, rather than my pen, we'll use a proper tool, so just bear with me a second; I'm going to try to load up these files.

So you should see my browser now. This is the qvis tool that I mentioned on last week's call.

This is a tool by, I'll name, Robin Marx, but I know there's a whole team of researchers at UHasselt doing web performance-based stuff, and it's really cool.

They work kind of in the background, and do a lot of analysis, and then come back and show people why their implementation's broken, or it doesn't perform so well, but you know, maybe we'll come back to that, and explore those things deeper.

But what I want to talk about is priorities, which is an HTTP-level construct.

I'm just going to load up some qlogs that I captured just the other day. Because what I've been working on in my day job, beyond bad television, as an engineer, is our open source implementation called quiche, and implementing HTTP/3 features.

So we have a priorities scheme that I've been working on with an engineer called Kazuho, but it actually involves a whole wide community of people.

We went through a whole process of saying: look, here's HTTP/3. It incorporated HTTP/2's priorities model.

It doesn't work so great. Here's something that could be simpler.

What do you think? Some people were interested in implementing it.

I've been working kind of theoretically for a while with this, and so it's been nice recently to spend some time thinking about how we can get that into our implementation.

So along with Alessandro, my colleague, and the rest of the team, we looked at how we could put this into the library itself.

But of course, what you need to be able to do is test that code.

So this is part of the benefits of having a visualization tool.

One way that developers of protocols do things is to run a command-line client against an endpoint, say, with a client or a server they have.

They wait, and they do a test. They request some resources, and then they parse through textual logs and try to pinpoint whether it worked or not.

Maybe it said success at the end. But sometimes it can be hard just to understand the entire behavior over a connection.

And so qlog is a way to output those textual happenings, the events that occur in the connection, in a format that somebody else has put all the hard work into visualizing.
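qlog is JSON-based, so as a rough sketch of the kind of event a trace contains (field names approximate the 2020-era qlog drafts and may not match what quiche actually emits; this assumes the serde_json crate):

```rust
use serde_json::{json, Value};

// One illustrative qlog-style event: a packet arrival carrying a
// STREAM frame. Times are typically milliseconds from trace start.
fn example_event() -> Value {
    json!({
        "time": 1024.0,
        "category": "transport",
        "event": "packet_received",
        "data": {
            "packet_type": "1RTT",
            "header": { "packet_number": 42 },
            "frames": [
                { "frame_type": "stream", "stream_id": 0, "length": 1300 }
            ]
        }
    })
}
```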

So, having loaded some test files, we can go in. There are different tabs at the top here, but the one I'm going to focus on today is the multiplexing view.

And so we have a whole load of test files in here.

Let's try and see.

Yes, let's see if I can zoom in a little bit for you. There's a lot of devil in the detail here.

I don't want to focus too much on it. Let's try and give you an interactive view of what's happening.

So if any of you are familiar with browser dev tools that show you a waterfall: basically, what this test case is, is the baseline of how quiche used to work before we did anything differently.

So what we have is a test client that made five requests for a one megabyte file, in this case, all at the same time.

So as soon as the connection opened, as soon as we did the handshake, the bouncy ping pong ball analogy I used last week that was good or bad, depending which way you look at it.

But as soon as that completes, we open up client-initiated bidirectional streams and issue requests on them, all for the same file, all at effectively the same priority.

And so what this graph is showing you is not when the request was made, but at what point the client started to receive stream data.

So the payload data effectively coming back from the responses.

So every request was for the same one megabyte file. So we would expect the length of the bar to be the same.

But depending on what the server's doing, the offset from the start might be different here, because different servers might start sending the first part of the response at a different time.

In this case, what was happening is the responses were coming back almost in lockstep with each other.

And this is what the multiplex data flow underneath shows us.

And I'm sure Zoom's encoding is kind of mangling this even more.

I wouldn't worry too much about it. But you can see as I mouse over, we're getting a nice tool tip to say stuff like, oh, let's zoom in.

It's going to work. Okay.

So, yeah, what we get is little bands that relate to the requests at the top.

So if you imagine yellow is request one, dark purple is request two, and orange is request three, we can see that basically we're getting a stream frame coming in of a certain size, in this case, like 1,300 bytes effectively.

And then it switches from stream zero onto stream eight, sorry, stream four, and then eight, and then 12 and 16.

And this sequence repeats. So effectively what the server's doing is round-robinning between each of the requests, serving back some of the response data, 1,300 bytes at a time, each packed in a single QUIC packet, because that's the size of our QUIC packets in this case.

And it's really fair, right? We had five requests at the same time.

And we basically serve a slice of each response, one by one.

Everyone gets a fair share of the cake here, effectively.
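A minimal sketch of that fair round-robin serving policy, assuming 1,300-byte chunks as in the capture (the names here are mine, not quiche's API):

```rust
const CHUNK: usize = 1300; // roughly one QUIC packet's worth of payload

// Drain a set of (stream_id, body) responses fairly: one chunk from
// each still-active stream per cycle, matching the banded pattern in
// the qvis multiplexing view.
fn round_robin(responses: &mut Vec<(u64, Vec<u8>)>) -> Vec<(u64, Vec<u8>)> {
    let mut packets = Vec::new();
    while !responses.is_empty() {
        for (stream_id, body) in responses.iter_mut() {
            let n = CHUNK.min(body.len());
            packets.push((*stream_id, body.drain(..n).collect()));
        }
        responses.retain(|(_, body)| !body.is_empty());
    }
    packets
}
```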

This view underneath is quite complicated to explain; I'll do that another time. But it's interesting: it effectively shows you the rate that data came in at.

And these black vertical bars show you the HTTP application reading data out of the transport layer, which is pretty complicated to comprehend.

I had to get Robin to explain that to me a few times.

And now it's become tacit knowledge.

So "it's too hard to explain" is the best cop-out here. So this is the baseline performance.

And in the priorities work that we've been doing: if you're thinking of loading a web page, and the first request is the HTML document, waiting for that whole thing to be transferred in tandem with a picture at the bottom of the page, where they basically both steal share from each other, doesn't work out that great.

You're delaying the finalization of the document based on all the things that came after it, maybe.

But actually, in practice, that probably wouldn't happen.

Because, you know, you didn't learn about those things yet to make the request.

So the test case is a bit contrived. But what it's trying to do through a simple test case is just to prove the basic prioritization logic that we were trying to implement and test.

So having implemented some of that, we came up with a different test case here, which was expressing different weights for different requests.

So the first request had effectively the highest... sorry, let me back up. The document I've been working on with Kazuho is called Extensible Priorities, and it allows us to express or signal priorities with two parameters.

One is urgency and the other is incremental. So the urgency is like a weight, it tells you how important the resource is to be transferred.

The lower the number, the higher the urgency effectively.

Got to love those inverse numbers.

But incremental says: yes, actually, for this resource and any other resource at the same urgency, I would like you to round-robin, effectively, and share.

But the default is: I want you to send me all of this one thing before other things.

And the way that you do that ordering of things is based on the stream ID.
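Putting those rules together, here is a minimal sketch of the scheduling decision (my own naming, not quiche's implementation): the lowest urgency value wins; within a level, non-incremental responses go whole, in stream-ID order, and incremental ones round-robin:

```rust
struct Response {
    stream_id: u64,
    urgency: u8,       // 0 is most urgent, 7 least
    incremental: bool, // share the level round-robin style?
}

// Pick the next stream to serve a chunk on.
fn next_to_serve(active: &[Response], last_served: u64) -> Option<u64> {
    // Most urgent level wins (lower value = higher urgency).
    let top = active.iter().map(|r| r.urgency).min()?;
    let level: Vec<&Response> = active.iter().filter(|r| r.urgency == top).collect();

    // Non-incremental: the lowest stream ID keeps the floor until done.
    if let Some(r) = level.iter().filter(|r| !r.incremental).min_by_key(|r| r.stream_id) {
        return Some(r.stream_id);
    }

    // Incremental: round-robin, i.e. the next stream ID after the one
    // served last, wrapping back to the lowest.
    level.iter().map(|r| r.stream_id).filter(|&id| id > last_served).min()
        .or_else(|| level.iter().map(|r| r.stream_id).min())
}
```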

So in this case, all of the requests have the same urgency.

And we've disabled the incremental flag. But we see that what we have is a strict ordering of responses.

So request one comes in first and completes in, say, around 270 milliseconds, the whole transfer.

Compare that to the previous log, which we'll look at quickly because I didn't explicitly show it: there, all resources completed the entire transfer in around 400 milliseconds.

So you can see that we completed a bit faster in this case.

And that's that. That proved that the changes that we worked through worked for one simple test case.

So that's the base level of stuff.

So that was good. But I want to come up with some more clever test cases.

This is proving that if we send requests all with the same urgency and with incremental enabled, we actually can get back to that base level state.

So you might say, well, that's not great. Didn't you say that this isn't what people want?

But it depends. Like, we can't anticipate exactly what a client is using our server for.

They're accessing resources, and they can give us hints to tell us what they're trying to do.

And we can do our best effort to respond to that. It's fairly simple, relatively, for a static web server that has access to all of the files to very quickly understand their sizes and do the scheduling of responses.

But unfortunately, things aren't simple, really, in practice.

There's a lot of nuance that goes on. We've blogged about it in the past on the Cloudflare blog.

You should check that out if you're interested in this kind of topic.

But there's been a lot of other output from different people in the community.

You know, the work that happened in the HTTP and QUIC working groups that led up to the scheme we're working on now came out of an active mailing list thread about the complexity of implementing reprioritization in servers, and how difficult that is.

And a lot of it comes down to protocol aspects, which I don't have the time to get into today because of some of the technical failures.

So, I apologize for that.

But if it's of interest, do let me know and we can dig in deeper to some of this stuff.

But anyway, back to the case in point: test two. So, we've proven that we can do urgency, which is important.

And we can do incremental.

And you could say job done at that. But that's not how my brain works. I came up with all these different test cases.

I can't actually remember what this one is.

It looks the same as test one. But it does something slightly differently. I can't quite remember.

Maybe it does request them at different agencies. But what this proved through the visualization is that the behavior of our client and server matched what our goals were, which is pretty important for these things.

Test seven.

Okay. So, this one is: I want to request the first three objects of the five at some urgency, with incremental set to false, but such that if there are incremental responses mixed in with them, those get served in a round-robin fashion.

So, you can see, again, we have this kind of nice sequential serving of resources in a waterfall model, except for the final two responses, which get interleaved with each other, leading to this kind of purpley blue wavy pattern.

What else have we got?

Ah-ha. And, yeah, this is to prove one of the most important aspects of the prioritization, which is related to some of the activity in the working group right now: the ability for later requests, things that were requested after request one, say, or last in the batch, to be served first.

So, you know, sometimes as you're scanning through a document, you realize that you really need the thing at the end.

But you didn't know about it before you asked.

It's like ordering something from an online store.

And, I don't know, I'm doing some DIY at the moment, something I've found in lockdown, as some of you have.

And you go on, you order a load of stuff, and you realize that actually the thing you thought about last needs to arrive tomorrow, before the other things; you don't care about those anymore. So, you can go in and, you know, if you're making this order all at the same time, before it's been dispatched, say, you might be able to boost the priority.

That's kind of a reprioritization if you've already sent the initial request to somebody.

But if you knew upfront that you needed to expedite the delivery, you know, that's the case here.

The final request trumps all others.

It gets delivered there first. Yeah, that's about it for the test cases. There's a lot more to dig into.

I just want to mention the black bars underneath this colored view.

What that represents is data loss and data retransmission. It took me a long time to figure that out.

So, again, thanks, Robin. But, yeah, this aligns with these weird kind of staccato stepped effects that go on here.

To an untrained eye, this just looks a bit crazy, maybe like a rendering glitch.

But think of a stream: we talked about bytes being delivered on the stream, and that if we lost some, we could continue receiving bytes afterwards. What this is showing is, effectively, blocking on the stream itself.

So, these vertical black bars show when the client application was able to read data from the transport layer.

It can't do that while it's still waiting for some of the earlier data that was lost effectively.
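A minimal sketch of the mechanism behind that: a receiving stream can only hand the application contiguous bytes, so everything behind a lost chunk sits buffered until the retransmission fills the gap (my own structure, purely illustrative):

```rust
use std::collections::BTreeMap;

struct RecvStream {
    next_offset: u64,                 // first byte not yet given to the app
    buffered: BTreeMap<u64, Vec<u8>>, // out-of-order data, keyed by offset
}

impl RecvStream {
    // Return whatever contiguous bytes are available; while the chunk
    // at next_offset is missing (lost, awaiting retransmission), this
    // returns nothing even if later chunks have already arrived.
    fn readable(&mut self) -> Vec<u8> {
        let mut out = Vec::new();
        while let Some(chunk) = self.buffered.remove(&self.next_offset) {
            self.next_offset += chunk.len() as u64;
            out.extend_from_slice(&chunk);
        }
        out
    }
}
```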

That's a rough explanation, but it's the best one I can do right now. Coming into the final five minutes, I'm just checking if there are any questions.

I've got none. That's pretty sad, guys and girls and everybody. Yes. I'm either a very calming talker or this is completely uninteresting.

But it's what really interests me.

And so, that's why I wanted to talk about these things and take the time to do stuff.

What I would have liked to do is sketch out with the pen how serialization and multiplexing work, and some of the background.

I did have a blog post on this. Let me just see if I can pull that out while I'm talking.

Going into some of the details of why we wanted to adopt a new approach to prioritization.

You can see me typing here. And what I want to do with this is bring up a blog post.

Kind of talk about Friday the 13th.

It's a lucky day. And have a recap. Let's zoom in on this while we can.

Because this is kind of the model I wanted to sketch out.

So, let's see if that's big enough. But in H2, you know, we can make all these requests.

But we'd like a way for if you imagine the document model in the HTTP page, like the document is the root.

And as you work through it, you learn about different things.

And you form this kind of dependency tree. If you're a developer, you would be able to walk the tree of DOM elements.

That kind of thing. But we don't need to get that technical.

We just imagine that as you come through and learn about stuff, you make some requests.

And, you know, there's effectively a logical tree that exists in an HTTP/2 connection. And what you say is, in this case, request one depends on the root.

And then request two depends on the root. And request three depends on the root.

And depending on their relative weighting, a server can take that signal and say, well, you asked for these three things.

I'll give them to you in roughly equal order.

From the signals you're providing me, they're all of equal importance.

So, I'll just do what you asked for. And the way that that is expressed is by a HEADERS frame.

We talked about this last week. That's taking your textual headers and encoding them into a header block fragment, which is, you know, the HPAC encoding.

But also some priority information.

So, you have a stream dependency here. And a weight, which I already mentioned.
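Those two fields, plus an exclusivity bit, are what HTTP/2 (RFC 7540) carries; as a sketch of the signal itself, not a full frame codec:

```rust
// The HTTP/2 priority signal, carried in HEADERS frames (and in
// standalone PRIORITY frames, which let you reprioritize later).
struct H2Priority {
    stream_dependency: u32, // 31 bits on the wire; 0 means the root
    exclusive: bool,        // the "E" bit: become the parent's only child
    weight: u8,             // wire value 0..=255, meaning a weight of 1..=256
}
```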

In this graph, the dependency is on root. But what you can do is get clever and say, well, actually, I'm going to request this piece of HTML and then an image and some scripts.

I know that those things are exclusively dependent on each other.

So, rather than a balanced tree, like I just showed you, what we have is: as a browser parses through a document, it realizes that the first request is actually dependent on the second request that was made.

Similarly, the third request is the most, effectively, the most important.

And the other two are dependent on it. And through a whole combination of these signals, a server can be clever and try to figure out, given this smorgasbord of stuff that you've asked for and the preference you've defined, the best way to present it back to you within all of the resource constraints that it has.

So, scanning through this document quickly, there's a whole back story here, which let's skim through.

But effectively, as we mentioned with QUIC, the order that the client sends stuff in isn't preserved. So, in this case, the client sent request one and then request two, and said that request two depends on request one.

In QUIC, because the network can reorder stuff (it's allowed to, and it does happen), maybe this request two, which depends on request one, arrives before request one is in existence.

The server can only know what was requested when it was requested.

So, if you have a dependency on a thing it doesn't know about, you end up in this sticky situation.

Does request two depend on kind of a placeholder for request one, hoping that it might arrive at some point?

And if it doesn't, because there's no guarantee that it would, how does it react?

Or do you maybe just say: oh, well, I'll stick it onto the root for now, and then I'll wait, and if request one arrives, I'll change my mind, or something.

In a simple case like this, on a one-to-one basis, it probably doesn't matter too much.

But if you have a whole bunch of these requests coming in, you have to make a whole load of assumptions and implementation decisions about how to handle this.

And that's one of the kinds of complication that H3 had to get through.

There were a lot of edge cases, which aren't shown on this page, but there are some really good slides out there in the IETF, all contributed under the IETF's Note Well.

So, they're all online, they're all part of the record. There are videos of some of the IETF sessions, and other blog posts too.

But I think the highlight here is the HTTP workshop in April 2019, just over a year ago, where we kind of kicked off the question of what's wrong with prioritization.

I'm talking about H3, but there have been problems with H2 as well.

There are some test cases that show it's actually really hard to get this right; a lot of the implementations out there struggle, and it's hard to even test them.

There wasn't something like qvis at the time; it was hard to visualize them.

So, from January, all the way through to the end of the year. In May, we hosted a whole meeting in our London office, and that was fun.

We talked about the prioritization spectrum here. This is Ian Swett's presentation, Ian Swett from Google, who did a great job of chairing our design team to figure out, of all these options, what can we do?

It's no good saying to everyone, here's a load of problems, we don't know what to do.

Trying to come up with rough consensus, an idea of how to solve this thing, out of a whole wall of text.

And then, yeah, here is one from December. So, this is from the last large IETF event I attended, at the QUIC table.

This is me sat next to Robin doing some hacking on priorities, and qvis is actually there in tiny, tiny detail in the middle.

It's kind of nice.

Then, Kazuho is on the right-hand side here, and some other people at the table.

So, it's been a real cross -industry effort and it's been great. There's a long way to go.

We're still debating a lot of the details here. Whether this is the only way to do it, I don't know, but it seems like the most agreeable of the options: a simple enough scheme to let us do something.

It aligns well with some of the work that Cloudflare are doing with exchanging priority information in headers.

So, if you're writing like a web application that can process requests and responses, yeah, it's more friendly to that.

Yeah, as I wrote at the time, 2019 was quite a ride, and I was excited to see what 2020 would bring; well, it's been quite a ride too.

I'm more excited to see what the rest of this year brings now.

But anyway, thank you all for your time. I do apologize for the issues.

Please ask any questions if you have any, but otherwise look forward to the other live streams that we're running and the other content on this channel.

I think my colleague Tim is on next, talking about some cool Workers stuff.

So, bye for now.
