Cloudflare TV

Leveling up Web Performance with HTTP/3

Presented by Lucas Pardue, Yoav Weiss
Originally aired on 

Join Lucas Pardue (QUIC Working Group Co-Chair and Cloudflare engineer) and special guest Yoav Weiss (Performance Engineer and Developer Advocate at Google) for a discussion about the roles of request prioritization and server push in HTTP/2.

English
Protocols
Performance

Transcript (Beta)

The web, a digital frontier. I tried to picture clusters of HTTP requests as they flow through the Internet.

What do they look like? Tigers? Goats? I kept dreaming of analogies I thought I might never see.

And then one day, I found Yoav. Hello, everybody.

Welcome to another episode of Leveling up Web Performance with HTTP/3. I'm Lucas Pardue.

I'm an engineer at Cloudflare working on the protocols team on technologies like H2, TLS, QUIC, H3, those kinds of things.

And today, I'm joined by a special guest, Yoav Weiss.

Yoav is a developer advocate for Google. He's been working on mobile web performance for longer than he cares to admit, on the server side as well as in browsers.

He now works as part of the Google Chrome developer relations team, helping to fix web performance once and for all.

He takes image bloat on the web as a personal insult, which is why he joined the responsive images community group and implemented the various responsive images features in Blink and WebKit.

That was his gateway drug into the wonderful, complex world of browsers and standards.

And now, when he's not writing code, he's probably slapping his bass, mowing the lawn in the French countryside, or playing board games with his family.

So, welcome to the show, Yoav.

Have you got anything else you'd like to add? I think that about covers it, yeah.

Would you rather talk about H3 today or the bass? Yeah, let's go with H3.

Okay. So, if you watched any of the previous episodes leading up to this, we started with, you know, the first two going into a deep dive into H3 and QUIC, and then we had Robin Marx and Peter Wu on to give some demonstrations about debuggability.

On this show, I wanted to change things up again and look more at some of the features of H3 and H2 that I've glossed over so far: push and prioritization.

So, we'll get into that in a bit, but these are kind of topics that sometimes people think they understand, and I know I always forget.

So, I was going to just bring up some slides and walk us through some of the issues, and hoping that Yoav will interrupt me when I haven't described something very well, or just have a general discussion.

So, yeah, I know Yoav from a few of the standards meetings, like the IETF ones I've been to, or related events where we're able to share different views.

As the intro said, we sometimes represent the server side of things from past lives, or the browser side of things, but with standards, interoperability is what matters. Sometimes the IETF work focuses on the network end of things and the plumbing, but I know your focus has been more towards the performance angle.

So, before I get into the boring slides, do you want to talk a bit more about your background there?

Yeah, sure.

So, I've been working on mobile web performance for over two decades now, which, I don't know, makes me sad: it's not yet a solved problem, and we're still not there.

Essentially, there are a lot of things to improve at the network level, and this is where the HTTP/2 work came in, and the HTTP/3 work is hoping to improve things now.

There are also a lot of things to improve on the content level, and in terms of processing on the client itself.

So, I've been working for Google for the last 18 months or so, but I've been working as part of the Chromium open source project since 2012.

First on my own, just as an evening pastime, then as part of the Responsive Images Community Group, which was my gateway drug to this whole world of web standards, both at the W3C and the IETF. From there, I went on to work at Akamai, on both server-side optimizations and the client-side features that are required for those server-side optimizations to work.

And now, working on making the platform faster as part of the Chrome team.

And you're also doing some work on Client Hints, which is maybe not solely performance-related.

I kind of, I don't know, I'm not the expert here.

Like, they could maybe improve performance. I think the motivating use case was like data, was it?

Or am I completely wrong? So, the motivating use case initially was around images.

So, as part of the Responsive Images Community Group, we worked on srcset and picture for client-side selection.

And then, we saw that there is a need for server-side selection as well.

So, the client hints proposal came from that.

And we realized that, generally, content negotiation in its traditional form is critical for performance, and for other types of adaptation.

At the same time, it is privacy-negative by default.

So, by default, the client just sends out all the information to all the servers that may or may not be interested in it, which results in some bloat in terms of network.

But more importantly, it results in passive fingerprinting: potentially identifying information is sent to all servers, which makes it very hard to track which servers are actually using, and abusing, that information, and which servers are just accepting it and not doing anything with it.

So, Client Hints is, in my view, a critical piece of infrastructure to enable privacy-preserving content negotiation, both for performance purposes, such as images and network adaptation, and also to replace the user-agent string and other bits of current content negotiation that send too much information by default.

Yeah, all right. I think, certainly in some of the work I do, what you want is better security and better performance, and maybe some people think that's hard to get.

But by carefully defining these standards and working across the ecosystem to understand the impact of things, there's a big focus on the privacy angle, right?

And on actually documenting the different kinds, active or passive. And since I started working on HTTP stuff, we've had things like GDPR coming in, and all of these factors that are pretty interesting.

But client hints is pretty near the end of its standardization process, right?

Yes, on the IETF side of things.

There's still a bunch of work to be done on the W3C side, in terms of defining and getting agreement on the specific processing model that browsers should apply. There's also the rather large subject of cross-browser adoption.

So, we've been going back and forth with Mozilla folks and Apple folks about the proposal itself.

I think we've reached good ground, where they are somewhere between not objecting to it and being slightly happy with it, but I would still love to see implementations in non-Chromium browsers, and this hasn't happened yet.

Oh, sorry to put you on the spot with that.

I use this slot as a way to get better insight into things.

It's the kind of thing I might normally ask you about over a coffee. Anyway, now you can take a break while I bring up my slides.

So, hopefully this works this week.

Yes, my speaker notes away. You can use it easier if you want.

Yeah? Yeah. Cool. So, I always start with a brief recap, in case anyone hasn't tuned in before or has forgotten: H3 is an application mapping on top of a new transport protocol called QUIC, which is secure, reliable, and mitigates head-of-line blocking.

So, this is a transport protocol built on top of UDP, and it fixes some of the problems we found as we moved from classical HTTP to H2 and ran into issues with TCP. It's basically a redesign of transport for the modern age, built to be reusable by a lot of applications.

So, we're talking HP3 on this show, but there's other things that could use QUIC in the future, which I kind of don't talk about, but there's a lot of stuff in the background.

My role within the IETF is to oversee some of this stuff. So, there are a lot of extensions coming in, like how to design modifications to QUIC that would work really well in a data center, or maybe on a satellite link, with very different network characteristics.

And what's cool with this transport protocol is it seems adaptable enough to support those, even if out of the gate the main use case is for us to focus on the web.

And so, here's our favorite layer cake diagram, comparing H2 to H3.

At the bottom, we've got TCP compared to UDP.

We've got TLS to provide encryption. I put QUIC at the same layer, but that's a lie.

These diagrams are always models, and always wrong.

But ultimately, QUIC is an always secure protocol that wraps in the TLS handshake, effectively TLS 1.3 handshake, and provides packet protection rather than TLS records.

But that's a detail we don't need to worry about today.

The important thing is kind of this layering. So, in H2, we have streams above the H2 layer that are part of the H2 mapping.

And therefore, any libraries that implement H2, or things like a browser, need to provide all this stream monitoring and accounting themselves.

Whereas with QUIC, that's provided in the transport layer.

And H3 still needs some concept of streams, but it doesn't need to worry about the accounting of them, the flow control, stuff like that.

So, in H2, we have all these different frames that can be sent on different streams: stuff related to request and response, like HEADERS and DATA frames, or PRIORITY.

Then there's push, which is what we're going to look into today in the next slides.

And then all this stuff that I just mentioned on the right-hand side about connection control.

So, things around exchanging settings with your peer to understand what you can do in that specific connection, telling your peer that you're going to go away.

That was a hard thing in like earlier HTTP versions.

You might just need to kill the connection. And that doesn't lead to a graceful state.

So, you end up in this kind of weird limbo a lot of the time, which sucks.

And related to that reset stream, that's a way to, you know, if you decide you don't want something, you can reset it without having to tear down an entire connection or eat the cost of having to receive the whole object.

And so, H3 is intended to provide effectively the same set of features as H2, albeit on top of QUIC.

So, these slides are just copied from before, so this might not make much sense on its own, but people who are part of the process have been on a bit of a rampage to take the H2 frames and see whether they were a good fit.

Some of this has involved just redesigning the frames with different numbers and different types.

But also, actually, at the end of the whole process, what we've ended up with is only a subset of the frames in H3 are required.

We've got here priority has been removed.

We'll talk about that towards the second half of the day.

But we've got this CONTINUATION frame, which is a way to send really big headers.

We don't need that in H3 because the HEADERS frame itself can just be big. That changes some stuff.

It simplifies things. We've got two more frames related to push promise, which I'll explain shortly.

And then this other stuff around ping and reset stream.

Those features still exist, but they're at the QUIC layer.

And so, that means in some sense that implementations are simplified. What's nice, really, is that things like flow control aren't duplicated.

There's a single view of it.

And so, an H3 implementation can be minimal. So, push: what is it?

You could say, at a high level, it's a way to optimistically send resources to a client before the client asks for them.

You might say that sounds great.

What could go wrong? This is one of the interesting additional features of H2.

One of the prime goals of H2 was to maintain compatibility with earlier versions of HTTP and not introduce any new features.

And yet, we have this new feature of push.

So, what is it? It's effectively a way for the client to make a request and the server to respond with multiple responses.

In H2, it's enabled by default.

A client has to explicitly disable it by sending a setting at the start of a connection.

They can also constrain how many pushes or pushed responses might come back to them by setting this value of max concurrent streams.

So, by default, a client that doesn't send any value for these things will have push enabled, with a few hundred streams, I think.
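As a concrete illustration of those two knobs, here is a minimal Python sketch, assuming the RFC 7540 wire format, of the SETTINGS frame a client could send to disable push (SETTINGS_ENABLE_PUSH = 0) and cap concurrent server-initiated streams. RFC 7540 actually gives SETTINGS_MAX_CONCURRENT_STREAMS no initial limit, so sending it explicitly is how a client bounds pushes. The helper name `settings_frame` and the value 100 are illustrative.

```python
import struct

# Constants from RFC 7540: SETTINGS frame type and two setting identifiers.
FRAME_TYPE_SETTINGS = 0x4
SETTINGS_ENABLE_PUSH = 0x2
SETTINGS_MAX_CONCURRENT_STREAMS = 0x3


def settings_frame(settings: dict[int, int]) -> bytes:
    """Serialize an HTTP/2 SETTINGS frame (always sent on stream 0)."""
    # Each setting is a 16-bit identifier followed by a 32-bit value.
    payload = b"".join(struct.pack("!HI", ident, value) for ident, value in settings.items())
    # 9-byte frame header: 24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID.
    header = struct.pack("!I", len(payload))[1:] + struct.pack("!BBI", FRAME_TYPE_SETTINGS, 0, 0)
    return header + payload


# A client that wants no pushes, and caps server-initiated streams as well.
frame = settings_frame({SETTINGS_ENABLE_PUSH: 0, SETTINGS_MAX_CONCURRENT_STREAMS: 100})
```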

Which means that a server that wants to push could push a whole load of stuff at you as soon as you make a single request for a web page, for instance.

So, there's a way for the client to respond to that and cancel pushes coming its way by resetting streams, which requires keeping state for them.
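Cancelling a push in H2 means resetting the promised stream. A minimal sketch, assuming RFC 7540's frame layout: the client sends RST_STREAM on the promised (even) stream with the CANCEL error code (RFC 7540 also allows REFUSED_STREAM here). `rst_stream_frame` is a hypothetical helper, and bytes the server already put in flight may still arrive after the reset.

```python
import struct

FRAME_TYPE_RST_STREAM = 0x3
ERROR_CANCEL = 0x8  # CANCEL error code from RFC 7540, section 7


def rst_stream_frame(stream_id: int, error_code: int = ERROR_CANCEL) -> bytes:
    """Serialize an HTTP/2 RST_STREAM frame that resets a single stream."""
    if stream_id == 0:
        raise ValueError("RST_STREAM must not be sent on stream 0")
    payload = struct.pack("!I", error_code)  # 32-bit error code
    header = struct.pack("!I", len(payload))[1:] + struct.pack(
        "!BBI", FRAME_TYPE_RST_STREAM, 0, stream_id & 0x7FFFFFFF
    )
    return header + payload


# The server promised stream 2, but the client already has that resource.
frame = rst_stream_frame(2)
```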

The spec describes all the conditions around tearing things down.

It's an important detail for the course of this discussion.

But in this case, in H2, you have stream IDs, which I'll show you in a minute.

But effectively, all odd-numbered streams are client-initiated, and those can only carry requests in H2.

And all server-initiated streams are even, and those can only carry pushes.
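The odd/even rule is simple enough to encode directly. This hypothetical helper assumes plain HTTP/2 with no extensions, as described above, plus RFC 7540's rule that stream 0 is reserved for connection-level frames:

```python
def describe_h2_stream(stream_id: int) -> str:
    """Classify an HTTP/2 stream ID by who opened it (no extensions assumed)."""
    if stream_id < 0 or stream_id > 2**31 - 1:
        raise ValueError("HTTP/2 stream IDs are 31-bit unsigned integers")
    if stream_id == 0:
        return "connection control (SETTINGS, PING, GOAWAY, WINDOW_UPDATE)"
    if stream_id % 2 == 1:
        return "client-initiated: carries a request"
    return "server-initiated: carries a push"


for sid in (0, 1, 2, 3, 4):
    print(sid, describe_h2_stream(sid))
```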

There might be ways to extend the protocol to do different things, but I'm not really aware of any extensions that have made it through to wide deployment that do anything with those streams.

So, it's reasonable to assume that if you're looking at the stream IDs in a tool like Wireshark, this is how they're being used.

So, to visualize this: we've got a client that's making a GET request for just slash.

So, it's going to send a request on stream ID 1 to a server on the right-hand side that isn't clearly labeled.

But the server is going to come back with this push promise frame. So, it's going to send that frame on stream 1 and promise a new stream on ID 2.

And what it's going to do is provide information in this push promise that looks like a request.

So, it tells the client, I'm going to effectively pretend that you made this GET request for pushed thing.

At that point, the client might be able to reset the stream and say, I don't want it or other options.

But assuming it doesn't do anything, like in this example, the server is then going to immediately follow up with the actual response headers for pushed thing.

So, it's going to send a status and then a length and then a DATA frame of that length, and then it will proceed to actually serving the initial thing that the client requested.
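The promise itself is an ordinary frame on the request stream that names the new stream. A minimal sketch of an unpadded PUSH_PROMISE frame, assuming RFC 7540's layout; the header block would normally be HPACK-encoded, so the placeholder bytes and the path `/pushed-thing` are stand-ins, not real HPACK output:

```python
import struct

FRAME_TYPE_PUSH_PROMISE = 0x5
FLAG_END_HEADERS = 0x4


def push_promise_frame(stream_id: int, promised_stream_id: int, header_block: bytes) -> bytes:
    """Serialize an unpadded HTTP/2 PUSH_PROMISE frame.

    `header_block` must already be HPACK-encoded; producing it is out of scope here.
    """
    if stream_id % 2 != 1:
        raise ValueError("PUSH_PROMISE is sent on a client-initiated (odd) stream")
    if promised_stream_id == 0 or promised_stream_id % 2 != 0:
        raise ValueError("the promised stream must be a new even-numbered stream")
    # Payload: reserved bit + 31-bit promised stream ID, then the request's header block.
    payload = struct.pack("!I", promised_stream_id & 0x7FFFFFFF) + header_block
    header = struct.pack("!I", len(payload))[1:] + struct.pack(
        "!BBI", FRAME_TYPE_PUSH_PROMISE, FLAG_END_HEADERS, stream_id
    )
    return header + payload


# Placeholder for the HPACK-encoded ":method: GET, :path: /pushed-thing, ..." block.
fake_header_block = b"<hpack:GET /pushed-thing>"
frame = push_promise_frame(stream_id=1, promised_stream_id=2, header_block=fake_header_block)
```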

So, just slash in this case. That's one way to do it. There's a lot more decisions that a server actually needs to take.

It might want to promise a few things and then send some data.

There's a lot of text that explains, well, if you don't promise at the start and you start to send back some of the response that maybe includes like HTML with a link to pushed thing, then the client's going to read that and probably request it.

So, if you promise too late, you end up in a racy condition.

But there's some guidance that, you know, it's better to do this than not to do that.

And that kind of text in a spec is sometimes hard to enforce or even test.

And I think that's one of my main comments on push, which brings us on to some of the real-world practicalities of it.

But the tooling that we have available, because this is quite a low level feature, doesn't necessarily help us see what's happening in the wild and to reason about why a page is acting as it is.

Sometimes it's there. But when you compare this to some of the other dev tools that are able to look at how much time is spent compositing a page and stuff, it's just never quite there, in my opinion.

Yoav's looking to see if he's going to correct me, I think.

Yeah. So, on that front, I think that the way to go here and the way to ensure that we avoid raciness is by making sure that the push promise is being sent before any link headers or other references to the resources are hitting the client, which is anyway what you actually want to do.
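Yoav's rule (promise before anything references the resource) can be checked mechanically. This toy checker assumes frames are modelled as `(type, text)` pairs rather than real wire bytes, and flags any pushed path that a HEADERS or DATA frame mentions before its PUSH_PROMISE has been sent; all names are hypothetical:

```python
from collections.abc import Iterable

# A toy frame model: (frame type, payload text). Real frames carry HPACK/QPACK bytes.
Frame = tuple[str, str]


def late_promises(frames: Iterable[Frame], pushed_paths: set[str]) -> set[str]:
    """Return pushed paths that were referenced before their PUSH_PROMISE was sent."""
    promised: set[str] = set()
    late: set[str] = set()
    for frame_type, payload in frames:
        if frame_type == "PUSH_PROMISE":
            promised.add(payload)
            continue
        # HEADERS (e.g. a Link header) or DATA (e.g. HTML) may mention a pushed path;
        # if it does before the promise, the client may request it itself: a race.
        for path in pushed_paths - promised:
            if path in payload:
                late.add(path)
    return late


good = [
    ("PUSH_PROMISE", "/style.css"),
    ("HEADERS", "link: </style.css>; rel=preload"),
    ("DATA", '<link rel="stylesheet" href="/style.css">'),
]
racy = [
    ("HEADERS", "link: </style.css>; rel=preload"),
    ("PUSH_PROMISE", "/style.css"),
    ("DATA", '<link rel="stylesheet" href="/style.css">'),
]
```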

Because if I may, and I didn't prepare slides, but I have this diagram in the blog post I published back in 2016.

There are two main benefits of push.

One is being able to start sending things from the server and start to fill the bandwidth before the server side processing that's required for HTML generation is done.

And that processing can be very short on very efficient servers.

It can be extremely slow in less efficient ones. And being able to start filling up the bandwidth with useful content is a huge advantage of push in those cases.

And then secondly, it enables you to kick off the slow start process earlier and ramp up the initial congestion window with, again, useful content.

So, you want to send those push promises way before you're starting to send your HTML to the client.

That's the ideal use case. Yeah. Because, especially for something like a dynamically generated website, it might require some database lookups or whatever, and you've got some dead air time while that's happening.

And you could fill it with maybe some statically cached asset. But, I mean, from having joined Cloudflare and seen some of the challenges with that, you need some awareness of what that page is and what those resources might be.

And there are various ways you could do that.

Some vendors offer smarter ways than others, but it's complicated.

And there's always a slight risk you're going to get it wrong.

So, as always with these things, I'd say with performance it's a trade-off between those two things.

And running and measuring those things can be really, really difficult.

I know in my own experience, it's really easy to demonstrate like, oh, look, it worked really well for this one kind of thing.

But then as a kind of a holistic view over the Internet, getting those wide scale measurements can be quite tricky.

Yeah. I don't disagree about the complexity.

I used to, this is a product, like I worked on that product and with the team that built this product as part of my work at Akamai.

So, it's certainly achievable. At the same time, it is certainly a complex solution that requires a feedback loop: your CDN or your server studying the resources this page has used in the past, figuring out the critical resources among them, and then attempting to push them.

And beyond that, there was a point, I don't remember at which IETF, where both the Chrome team and Akamai presented data about what they see in the world when it comes to H2 push.

So, Akamai presented data that was specific to optimized websites, first load only in order to eliminate some problems that we'll talk about soon, and showed that it consistently managed to use H2 push in order to get benefits in the wild.

At the same time, Chrome was measuring data about push in general and like, how is push being used in the wild beyond those sites that are being smartly optimized?

And it turned out that more often than not, it's being abused in the wild and the overall data was slightly negative.

Yeah, and I remember watching those two different sets of presentations, and it was really interesting to see how defining your population can influence the outcome of a discussion.

And it's great to have different people with different perspectives to say, yeah, of course it's not going to work in general, because the implementations of H2 are fairly new.

To implement the protocol is fairly straightforward, like we saw, just send this frame at this time, but then to understand how to surface that as an API to developers, I think is generally the tricky part.

And there were some things that different server implementations had, like I think Apache had a push manifest, so you could say, when you request this path, then you could push these files.

And I think it even had a push diary, was it called, where within a single connection it would only push those items once.

It would keep a record of what it had pushed, so there was some smart stuff there.
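The per-connection "push each thing once" idea can be sketched as a bounded, insertion-ordered set. This is a hypothetical illustration of the concept Lucas recalls from Apache, not Apache's implementation; the class name and the eviction policy (drop the oldest entry when full) are assumptions:

```python
class PushDiary:
    """Per-connection record of what has already been pushed (illustrative only)."""

    def __init__(self, max_entries: int = 256) -> None:
        self.max_entries = max_entries
        # Dicts keep insertion order, so the first key is always the oldest entry.
        self._pushed: dict[str, None] = {}

    def should_push(self, path: str) -> bool:
        """Return True the first time a path is seen on this connection."""
        if path in self._pushed:
            return False
        if len(self._pushed) >= self.max_entries:
            # Forget the oldest entry to stay within the size budget.
            self._pushed.pop(next(iter(self._pushed)))
        self._pushed[path] = None
        return True


diary = PushDiary(max_entries=2)
decisions = [diary.should_push(p) for p in ["/a.css", "/a.css", "/b.js", "/c.js", "/a.css"]]
```

With only two slots, `/a.css` is evicted when `/c.js` arrives, so it would be pushed again; a real diary trades memory for that kind of duplicate.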

So let me go back to the slides. Here we go.

So this TV segment is about H3, and H2's old hat, man. Let's go back to the new stuff.

So in H3, push is disabled by default. It's, like, just not there.

It's kind of migrated away from this setting into managing a flow control window of push IDs.

So that'll make more sense on the next slide, but effectively the client still manages both concurrency and enablement, via this MAX_PUSH_ID frame.

So it starts off at zero.

The server can't do anything, and it's only after the handshake completes and this frame comes in that the server can say, okay, I'm now able to push this many things until I get another one of these.
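On the server side, MAX_PUSH_ID works like credit. A minimal sketch, assuming RFC 9114 semantics: push IDs start at 0, the server may use IDs up to and including the latest MAX_PUSH_ID value, and until the first such frame arrives it may not push at all. `PushIdAllowance` is a hypothetical name.

```python
from typing import Optional


class PushIdAllowance:
    """Server-side bookkeeping for HTTP/3 push IDs granted via MAX_PUSH_ID (sketch)."""

    def __init__(self) -> None:
        # No MAX_PUSH_ID received yet, so the server may not push anything.
        self.max_push_id: Optional[int] = None
        self.next_push_id = 0

    def on_max_push_id(self, value: int) -> None:
        # RFC 9114: a client must not reduce the maximum push ID it has advertised.
        if self.max_push_id is not None and value < self.max_push_id:
            raise ValueError("MAX_PUSH_ID decreased (H3_ID_ERROR)")
        self.max_push_id = value

    def try_allocate(self) -> Optional[int]:
        """Return the next usable push ID, or None if the client's limit is exhausted."""
        if self.max_push_id is None or self.next_push_id > self.max_push_id:
            return None
        push_id = self.next_push_id
        self.next_push_id += 1
        return push_id


allowance = PushIdAllowance()
before = allowance.try_allocate()   # None: push not enabled yet
allowance.on_max_push_id(1)         # client allows push IDs 0 and 1
granted = [allowance.try_allocate() for _ in range(3)]
```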

And the client can use an explicit CANCEL_PUSH, or it could reset the streams.

The benefit of CANCEL_PUSH is that you can cancel the push before the server allocates any resources to it.

So you're not creating a stream or any state for that stream.

It's this notional, logical idea that the server would like to push you a thing with an ID, and you can cancel it even before the server is ready to send it.

And then the other good thing is that the server can also cancel.

So it could say, actually, I've changed my mind. Whether any server does that, I don't know, but it's all there possible.
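Both of these control frames are tiny on the wire. This sketch assumes the RFC 9114 frame layout (type and length as QUIC variable-length integers) and the frame types 0x03 for CANCEL_PUSH and 0x0d for MAX_PUSH_ID; the helper names are hypothetical:

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer encoding (RFC 9000, section 16)."""
    for length, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if value < 1 << (8 * length - 2):
            # The top two bits of the first byte record the encoded length.
            return (value | (prefix << (8 * (length - 1)))).to_bytes(length, "big")
    raise ValueError("value too large for a QUIC varint")


FRAME_CANCEL_PUSH = 0x03
FRAME_MAX_PUSH_ID = 0x0D


def h3_frame(frame_type: int, payload: bytes) -> bytes:
    # Every HTTP/3 frame is: type (varint), payload length (varint), payload.
    return encode_varint(frame_type) + encode_varint(len(payload)) + payload


def cancel_push(push_id: int) -> bytes:
    # Sent on the control stream, by either the client or the server.
    return h3_frame(FRAME_CANCEL_PUSH, encode_varint(push_id))


def max_push_id(push_id: int) -> bytes:
    # Sent by the client on its control stream to grant more push IDs.
    return h3_frame(FRAME_MAX_PUSH_ID, encode_varint(push_id))
```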

And unlike in H2, where all server-initiated streams are pushes, in H3 we have this idea of stream types.

So the server would create a new unidirectional stream and send one byte on it that tells you the type, in this case the push stream type, followed by, basically, the contents of the delivered response.

So it looks almost identical.
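Here is what that type byte looks like in practice. A minimal sketch, assuming RFC 9114: a push stream starts with stream type 0x01 followed by the push ID it fulfils (both QUIC varints), then the usual HEADERS and DATA frames. Server-initiated unidirectional QUIC stream IDs have their low two bits set to 0b11 (RFC 9000), so they run 3, 7, 11, and so on. The helper names are hypothetical, and the varint encoder is repeated so the block stands alone.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer encoding (RFC 9000, section 16)."""
    for length, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if value < 1 << (8 * length - 2):
            return (value | (prefix << (8 * (length - 1)))).to_bytes(length, "big")
    raise ValueError("value too large for a QUIC varint")


STREAM_TYPE_PUSH = 0x01  # HTTP/3 unidirectional stream type for push


def push_stream_preamble(push_id: int) -> bytes:
    # Stream type first, then the push ID of the promise this stream fulfils.
    return encode_varint(STREAM_TYPE_PUSH) + encode_varint(push_id)


def server_uni_stream_id(n: int) -> int:
    """ID of the n-th (0-based) server-initiated unidirectional QUIC stream."""
    return 4 * n + 3
```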

Okay.

I can't hear you, Lucas. Hello. Can you hear me? Yes. Okay. Thank you for holding the phone.

Yeah. I wasn't sure what you mean there. That was exactly on the half hour.

I forgot to tell you ahead of this meeting that I actually had an appointment, and that was the whole reason.

No, I'm joking. So I don't want to spend forever just talking about push.

So let me race through my slides. Yeah, it basically looks almost the same as the H2 example.

And I think it's important to note that these things should work the same.

And the question for us as developers and infrastructure providers is, should you push?

Is this the right thing to do?

Are there any lessons we've learned? Those lessons are applicable to both versions.

So we can decide to implement the feature in both if we want to, but how do we apply that?

And some of the context around what we learned from H2 comes from Jake Archibald's seminal piece, "HTTP/2 push is tougher than I thought", which is a great deep dive into some of the real-world practicalities of server push when looked at through the lens of caching.

So the push spec talks a lot about pushing stuff into the browser's cache, and the items need to be cacheable and so on.

That's always the way I look at it. I know others don't.

And so there's some weirdness that can happen where, for example, rather than an item being pushed straight into the browser's cache, it sits in this concept of a push cache that isn't formally defined anywhere yet, but that different browsers or user agents have.

And then it might get promoted into the browser cache if the web page actually requests that resource.

And it can be tied to the connection.

So maybe if you got pushed some data and then you lost the connection like I just did, then that resource is gone and the bandwidth to send it to me was wasted.
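A toy model makes those failure modes easier to see. This sketch is a deliberate simplification, not any browser's actual push cache: pushed bodies sit in a per-connection store, move to the HTTP cache only when the page requests them, and vanish if the connection drops first. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    # Pushed-but-unclaimed responses, scoped to this connection only.
    push_cache: dict[str, bytes] = field(default_factory=dict)
    alive: bool = True


@dataclass
class Browser:
    http_cache: dict[str, bytes] = field(default_factory=dict)

    def on_push(self, conn: Connection, url: str, body: bytes) -> None:
        conn.push_cache[url] = body

    def request(self, conn: Connection, url: str) -> str:
        """Return where the response came from."""
        if url in self.http_cache:
            return "http-cache"
        if conn.alive and url in conn.push_cache:
            # The page asked for it, so the pushed response is promoted to the HTTP cache.
            self.http_cache[url] = conn.push_cache.pop(url)
            return "push-cache"
        return "network"

    def drop(self, conn: Connection) -> None:
        # Unclaimed pushes disappear with the connection; their bandwidth was wasted.
        conn.alive = False
        conn.push_cache.clear()


browser = Browser()
conn = Connection()
browser.on_push(conn, "/app.js", b"console.log('hi')")
first = browser.request(conn, "/app.js")    # claimed from the push cache, then promoted
second = browser.request(conn, "/app.js")   # now an HTTP cache hit

lost = Connection()
browser.on_push(lost, "/late.css", b"body{}")
browser.drop(lost)                           # connection dies before the page asks
third = browser.request(lost, "/late.css")  # has to be fetched again
```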

To be honest, I don't know if, three years on, all of those points are still true, but I remember at the time it was quite an eye-opener for a few people.

Yeah. So for me, the main conclusions from that article is that there are a lot of client inconsistencies with regard to the handling as well as just outright bugs with handling H2 pushes in some browsers.

And for me, the conclusion was that H2 push wasn't really well-defined when it comes to the browser's processing model.

So it was defined at the protocol level, but not defined as to how those pushes should be treated on the client side, what is their level of interaction with the browser's cache.

So you're right that all browsers implemented some sort of H2 push cache, but they had subtle differences, because each was the result of the easiest way for them to implement it rather than the way it was specified, because it wasn't specified.

And that resulted in a bunch of issues like you mentioned, which if we were to discuss these issues in a broader forum, maybe we would have concluded that resources should be pushed directly into the cache or some other mechanism.

We wouldn't necessarily have reached the same one that all the implementations ended up implementing.

So the lack of specification was, as far as I'm concerned, the original sin here.

Yeah, and it's like I think Jake says in that post, it's not pointing the finger at anyone, it's just that's how it is.

And you adapt the technologies available to the model that you have for your implementation and pick something that may be a bit safe for a brand new feature.

Like you don't know how this is going to work in practice.

Yeah, it's been an interesting thing.

I think for me, the main annoyance is the fact that server push wasn't exposed to the web platform.

So with the Fetch API, you get an event when the actual request you made comes back, but there was nothing similar for server push.

And so much cool stuff could be done with that.

We have all these other technologies for a hybrid or bidirectional relationship between client and server.

We've got WebSockets or the WebRTC data channel. There's this new thing called WebTransport that's being specced up in the IETF and W3C.

But for me, none of those provide actual HTTP semantics.

I wanted to be able to have something pushed to me with metadata that is standardized, that describes the content type, and that I can actually reason about because somebody else has done all the hard thinking.

And in a previous life, I spent a lot of time thinking about this and wrote a moaning white paper to talk about a cool use case we were using to deliver video.

So if anyone's interested, you can read that. It's nothing to do with web performance per se, but sometimes if we give people tools, they'll use them to develop good experiences.

And I think maybe some of the dislike of server push, that it doesn't do everything it could in terms of performance, would have been slightly offset if it was like, well, don't use it for performance here,

you can build this kind of application instead. And the gap is specific to the web platform, because if you have some H2 library, it typically has a callback function you can register with a few lines of code to handle those things.

And people are building cool demos outside the browsers, but that's the world we live in.

And so we kind of already covered this point, doing the simple thing can hurt us.

How you command a server or a CDN edge to push for you is something that also wasn't really kind of standardized anywhere.

There was this notion that we could reuse a rel=preload Link header parameter or attribute.

I always forget the term. So when the origin responds to that initial request the client made, my edge server can see this parameter and then push that file.

That doesn't give you the filling of the dead air time that we talked about before, because I still have to wait for the server to generate the headers, and quite often servers generate the full response and then send the headers and data out in one go.
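Edges that push from Link headers typically scan for `rel=preload` and skip entries marked `nopush`, a convention several servers follow. A simplified sketch of that selection step; the regex-based parser and the function name are assumptions, and it does not handle commas or semicolons inside quoted values:

```python
import re

# One link-value: <uri> followed by zero or more ";param" pieces, up to the next comma.
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def push_candidates(link_header: str) -> list[str]:
    """URLs an edge might push: rel=preload targets that are not marked nopush."""
    urls = []
    for match in LINK_VALUE.finditer(link_header):
        url, params = match.group(1), match.group(2)
        tokens = [p.strip().lower() for p in params.split(";") if p.strip()]
        rels = set()
        for token in tokens:
            if token.startswith("rel="):
                # rel may hold several space-separated values, optionally quoted.
                rels.update(token[len("rel="):].strip('"').split())
        if "preload" in rels and "nopush" not in tokens:
            urls.append(url)
    return urls


header = (
    "</style.css>; rel=preload; as=style, "
    "</app.js>; rel=preload; as=script; nopush, "
    "</hero.jpg>; rel=prefetch"
)
candidates = push_candidates(header)
```

Because the origin has to emit this header first, a push triggered this way still starts only after the response headers exist, which is the dead-air limitation described above.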

And so if you do things too simply, yes, it will work, but you end up pushing content the client might already have, wasting bandwidth that could have gone to other stuff it might need.

So some clever people came up with a way to mitigate the case where a dumb server pushes stuff that the client already has: cache digests.

I don't know if you remember that one.

Yeah, I remember that one. Yeah. So essentially, like you pointed out, H2 push has a number of potential problems.

One of them is over pushing.

So, pushing resources that the browser, the client, already has, and this is the problem that cache digests solve.

So essentially, at the beginning of the connection, the browser sends a condensed list of all the resources it has in its cache for that particular origin, and as a result the server knows what not to push.
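A rough sketch of the idea, assuming SHA-256 hashing as in the Cache Digests draft: each cached URL is reduced to a small hash value in a range of N × P, so the server sees no false negatives and roughly a 1/P false-positive rate. The draft additionally compressed the sorted values as a Golomb-coded set; this sketch skips that encoding, and the function names are hypothetical.

```python
import hashlib


def _url_hash(url: str, modulus: int) -> int:
    # Take 64 bits of SHA-256 and reduce them into [0, modulus).
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def build_digest(cached_urls: list[str], p: int = 128) -> tuple[int, frozenset[int]]:
    """Client side: condense cached URLs into a set of small hash values."""
    n = max(1, len(cached_urls))
    modulus = n * p
    return modulus, frozenset(_url_hash(url, modulus) for url in cached_urls)


def probably_cached(digest: tuple[int, frozenset[int]], url: str) -> bool:
    """Server side: True means 'likely cached, skip the push'; False means 'not in the digest'."""
    modulus, hashes = digest
    return _url_hash(url, modulus) in hashes


digest = build_digest(["https://example.com/a.css", "https://example.com/b.js"])
```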

But that is not the only potential problem with H2 push. There's the problem of over-pushing, and there are the problems of H2 priorities, where we have definitely seen cases where, when the server receives a request for the HTML, it starts to push resources that are critical, but at the same time less critical than the HTML itself.

Then, when the HTML arrives from the origin server, buffering, both in the server itself and in lower layers, means that push ends up delaying the arrival of the HTML, which is the most important resource.

So there was a problem with H2 priorities, and like you pointed out, there's also the problem of pushing the wrong thing.

So if the developer added a Link rel=preload for a resource that wasn't really critical, that push could delay other, more critical resources.

And at the same time, it was also always pushing those resources too late, because it missed the point in time where push is most effective, which is before the HTML is even generated.

Yeah. So, for all those reasons, we didn't really know what the breakdown was. Chrome gathered data about push in the wild and saw that it's not very helpful.

But no one actually put in the time and work to drill down into that data and figure out which cases fall into which bucket, which cases are over pushing, which cases are pushing the wrong thing or at the wrong time, and in which cases it's just H2 priorities gone wrong.

Yeah. And it's tricky to kind of divide those pages up and find those experiments.

But I think you also end up with competing options coming in, right?

So while Cache Digests was one way, and it was a draft in the IETF, unfortunately no one decided to implement it.

So it's there as a matter of record. And I believe you did get experimental data showing it would help in the cases you just described.

But unfortunately, people wanted to go in a different direction, and maybe we just leave that there.

So we also had this idea of RFC 8297, which is early hints.

This is an informational status code, 103, which gives the server the ability to send some headers related to the resource before the final status code.

And, you know, in this example taken from that spec, there's an ability to tell the browser to preload objects before the final response arrives.

So I guess in this case, those objects are related to the HTML that was being requested.

And then the browser can decide if it wants to preload or if it already has that cached or whatnot.
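To make the "two responses" shape concrete, here is a minimal sketch that serializes an HTTP/1.1 103 Early Hints response followed by the final 200, modeled loosely on the example in RFC 8297. The function name and the `(url, destination)` input format are hypothetical; a real server would also stream the body and set other headers.

```python
def early_hints_exchange(preloads: list[tuple[str, str]], body: bytes) -> bytes:
    """Serialize a 103 Early Hints response followed by the final 200 response.

    `preloads` is a list of (url, destination) pairs, e.g. ("/style.css", "style").
    """
    links = [f"Link: <{url}>; rel=preload; as={dest}" for url, dest in preloads]

    # Interim (non-final) response: a status line plus the hints, no body.
    interim = "\r\n".join(["HTTP/1.1 103 Early Hints", *links]) + "\r\n\r\n"

    # Final response: repeats the Link headers alongside the real content.
    final = "\r\n".join(
        [
            "HTTP/1.1 200 OK",
            "Content-Type: text/html; charset=utf-8",
            f"Content-Length: {len(body)}",
            *links,
        ]
    ) + "\r\n\r\n"
    return interim.encode("ascii") + final.encode("ascii") + body
```

The 103 can be written to the socket as soon as the request arrives, while the 200 waits for the HTML to be generated; that gap is the time early hints try to use.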

So, in your opinion, did preload steal a bit of push's opportunity?

Or are they different things? So in my view, they can be used for different things.

Preload mainly shines in cases where you have discoverability delays.

So the content doesn't necessarily lend itself to being easily processed by the browser.

So you have resources referenced from JavaScript or CSS that are potentially critical, and you want to make sure that the browser discovers them earlier.

Whereas push mostly shines in the case of pushing critical resources that are typically easily discoverable by the browser.

But by pushing them earlier, you're using that server time; you're kickstarting the congestion window earlier.
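Why starting early helps can be seen with a back-of-the-envelope slow-start model. This is an idealized sketch under stated assumptions (an initial window of 10 segments, 1460-byte segments, the window doubling every round trip, no loss); real congestion controllers behave differently. The key point is that the congestion window only grows when data is actually sent and acknowledged, so a connection sitting idle during server think time gains nothing.

```python
def bytes_sent_in_slow_start(rtts: int, initial_cwnd: int = 10, mss: int = 1460) -> int:
    """Bytes a sender can put on the wire over `rtts` round trips of ideal slow start.

    Assumes no loss, an initial window of `initial_cwnd` segments of `mss`
    bytes, and the congestion window doubling every round trip.
    """
    total = 0
    cwnd = initial_cwnd
    for _ in range(rtts):
        total += cwnd * mss  # one full window per round trip
        cwnd *= 2            # slow start: the window doubles per RTT
    return total
```

Under these assumptions, if the server spends about three round trips generating the HTML, pushing during that time could deliver up to `bytes_sent_in_slow_start(3)`, or 102,200 bytes, and leave the window eight times larger than it started.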

At the same time, people are now talking about early hints as an H2 Push replacement or something that can solve half of that use case.

Because, at the cost of an RTT, early hints can still enable us to use that server think time and perform all that computation while the browser kicks off some requests and manages to fill in the bandwidth with them.
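The extra RTT can be seen in a simple timeline. This sketch assumes time 0 is when the client sends the HTML request, splits the RTT evenly in each direction, and ignores transfer time, parsing, and connection setup; `think_ms` is a hypothetical server HTML-generation time.

```python
from dataclasses import dataclass


@dataclass
class Arrival:
    push_ms: float
    early_hints_ms: float
    no_hints_ms: float


def first_subresource_arrival(rtt_ms: float, think_ms: float) -> Arrival:
    """When the first byte of a critical subresource reaches the client."""
    one_way = rtt_ms / 2
    at_server = one_way  # the HTML request reaches the server

    # Push: the server starts sending the subresource right away.
    push = at_server + one_way
    # Early hints: the 103 travels back, then the client requests the
    # resource and waits a full round trip for the response.
    early_hints = at_server + one_way + rtt_ms
    # Neither: the client discovers the resource only once the HTML arrives.
    no_hints = at_server + think_ms + one_way + rtt_ms
    return Arrival(push, early_hints, no_hints)
```

With a 100 ms RTT and 300 ms of think time, the subresource starts arriving at 100 ms with push, 200 ms with early hints, and 500 ms with neither: early hints cost one RTT relative to push but still recover the whole think time.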

So there is an extra RTT involved, and it's not as efficient as H2 push. At the same time, it sidesteps the over-pushing problem, because if a resource is in the browser's cache, it won't get fetched again.
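The reason early hints avoid over-pushing is that the browser stays in control: it reads the hinted `Link` headers and only fetches what it doesn't already have. The sketch below assumes a tiny hand-rolled `Link` parser (it ignores quoted commas and other edge cases) and models the cache as a set of paths; real browser caches are keyed by full URL and more.

```python
import re

# Matches "<url>" followed by its parameters, up to the next comma.
LINK_RE = re.compile(r"<([^>]*)>([^,]*)")


def parse_link_header(value: str) -> list[tuple[str, dict[str, str]]]:
    """Very small Link header parser (simplified; not RFC-complete)."""
    links = []
    for url, raw_params in LINK_RE.findall(value):
        params = {}
        for part in raw_params.split(";"):
            key, sep, val = part.partition("=")
            if sep:
                params[key.strip().lower()] = val.strip().strip('"')
        links.append((url, params))
    return links


def urls_to_fetch(link_headers: list[str], cache: set[str]) -> list[str]:
    """Preload hinted URLs, skipping anything already in the cache."""
    fetch = []
    for header in link_headers:
        for url, params in parse_link_header(header):
            is_preload = "preload" in params.get("rel", "").split()
            if is_preload and url not in cache:
                fetch.append(url)
    return fetch
```

Given hints for `/style.css` and `/script.js` and a cache that already holds `/style.css`, only `/script.js` is requested. A server can't get this wrong the way it can with push, though it can still hint the wrong things.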

But it can still be abused.

People can still use it for the wrong things and end up with slower pages, similar to preload, in a way.

Yeah. I think my memory is...

It might be wrong here, but I seem to recall that, in the process of this standardization, the idea of a non-final response code tripped up some implementations out there.

It's not a silver bullet. It can be tricky. Maybe it was Python or something, but things are designed around the most commonly deployed path.

And so you see something that is completely valid, but unexpected, and implementations blow up.

They can be fixed. Yeah. It's just something to be mindful of.

And again, if you're just thinking of a simple origin server and a client, sometimes these things are very easily deployable, but when you're running through a cloud edge somewhere, sometimes our systems aren't quite geared up for these things.

Although the technologies exist, actually getting widespread Internet deployment can take a bit of time to catch up.

And it needs the data to prove that it's kind of worthwhile to go to the effort of doing that.

Yeah. So first of all, ossification is definitely an issue and definitely something I'm currently struggling with on a completely different front.

But it turned out that using structured headers, or structured fields, in requests is something that the Internet wasn't ready for.

But let's not go on that tangent. On the early hints front: for the longest while, it was considered extremely complex to implement preload for early hints in Chrome, because these requests would need to be generated by the browser process but then matched with the renderer.

These are different processes inside the browser, in Chromium's architecture.

But recent changes made it so that it's essentially easier, not easy, but easier than it used to be.

So the Chrome loading team is now interested in trying out early hints: not supporting the feature just yet, because that will require a lot of work, but just measuring the potential benefits.

So, try it out with early-hints-enabled servers, get them to send down the early hints responses, and see, A, what the time difference is between the 103 response and the 200 response, because this is the time that we can potentially save here.

And B, whether sending a 103 is something that is web compatible and Internet compatible, or whether intermediaries or whatnot will blow up when they see two responses.
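The compatibility risk comes from code that assumes the first status line it reads is the final response. A client or intermediary instead has to treat 1xx responses (other than 101 Switching Protocols) as interim and keep reading. This is a minimal HTTP/1.1 sketch with a hypothetical `split_responses` helper; it does no body framing and only handles complete header blocks.

```python
def split_responses(raw: bytes):
    """Split an HTTP/1.1 byte stream into interim (1xx) responses and the final one.

    Returns (interim, final, body): interim is a list of (status, headers),
    final is (status, headers), and body is whatever follows the final
    header block.
    """
    interim = []
    rest = raw
    while True:
        head, sep, rest = rest.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("incomplete header block")
        status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
        status = int(status_line.split(" ", 2)[1])
        headers = []
        for line in header_lines:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
        # 1xx responses other than 101 are interim: keep reading.
        if 100 <= status < 200 and status != 101:
            interim.append((status, headers))
            continue
        return interim, (status, headers), rest
```

A naive parser that stopped after the first header block would report the 103 as the response and then misinterpret the real 200 as body bytes, which is exactly the kind of breakage the experiment is looking for.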

So this is an experiment that we're currently interested in running.

So if anyone listening is interested in participating in that experiment, feel free to ping me and I'll connect you to the right people.

I learned something today. Well, not just one thing, lots of stuff.

That's cool. Yeah, thanks. Unexpected highlight. Cool. We're actually getting a bit short of time.

So we mentioned prioritization. I've got like way too many slides on that.

I was just going to like hammer through quickly just to give some people context.

I've talked about this before, but I think kind of the things that we're talking about, we can simplify it down a lot.

And just to say, like, if you have a page in your browser, that's got 10 things to load.

So here we've got five green boxes and five yellow boxes.

And what that would do is generate 10 requests, you know, HEADERS frames, like we already showed.

Ultimately, the browser might want different ways of getting those responses back.

Maybe it wants them one by one.

Maybe it wants all of them bit by bit in one go. Maybe it wants a mixed approach, where it gets the green ones first, in order, and then the yellow ones all arrive bit by bit and can be used incrementally or progressively.

And so this is what prioritization allows us to do: a client can, along with those headers, carry some information to say, look, this is how I want these requests served.

Well, in this case, it's reverse.

But I want to make the later request depend on, no, the early request depend on the later one.

So in this case, it would load back to front, which is probably not what you want.

But H2 contains extra information that allows a client to at least express the order or the pattern that it wants things to happen.
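In H2, that "pattern" is a dependency tree: each stream names a parent, and a parent is served before its children. The sketch below computes the order implied by a tree and shows how flipping the dependencies loads things back to front. It is a simplification that ignores weights and exclusive flags and takes siblings in stream-ID order.

```python
from collections import defaultdict

ROOT = 0  # HTTP/2's implicit root of the dependency tree


def serve_order(depends_on: dict[int, int]) -> list[int]:
    """Depth-first order implied by an HTTP/2-style dependency tree.

    `depends_on` maps each stream ID to its parent stream ID. A parent is
    served before its children; weights are ignored, so this is a
    simplified, strictly sequential view.
    """
    children = defaultdict(list)
    for stream, parent in depends_on.items():
        children[parent].append(stream)

    order = []

    def visit(node: int) -> None:
        for child in sorted(children[node]):
            order.append(child)
            visit(child)

    visit(ROOT)
    return order
```

A chain where stream 3 depends on 1 and 5 depends on 3 yields `[1, 3, 5]`; reversing it (1 depends on 3, 3 depends on 5) yields `[5, 3, 1]`, the back-to-front case on the slide.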

And in H3, we don't have guaranteed ordering between streams, which is a feature, because it mitigates head-of-line blocking.

But you end up in this case where the server receives request two, which says it depends on request one and that all of request one should be served before two, but the server doesn't even know what request one is.

And there were all these kinds of edge cases around trying to make this clear in the protocol. Like we mentioned earlier, implementations deciding to do it one way or the other leads to weird inconsistencies on the Internet.
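The arrival-order problem can be reproduced with a tiny model. This hypothetical `PriorityTree` applies one possible policy, loosely modeled on HTTP/2's rule of giving default priority when a dependency names a stream that isn't in the tree: attach the newcomer to the root. Another server might buffer the request instead, and that divergence is the inconsistency described above.

```python
ROOT = 0


class PriorityTree:
    """Tracks which stream each request depends on (hypothetical server model)."""

    def __init__(self) -> None:
        self.parent: dict[int, int] = {}

    def add(self, stream: int, depends_on: int) -> None:
        if depends_on != ROOT and depends_on not in self.parent:
            # The parent hasn't arrived yet, which can happen when streams
            # aren't ordered relative to each other. This policy falls back
            # to the root, losing the client's intended ordering.
            depends_on = ROOT
        self.parent[stream] = depends_on


def arrive(order: list[tuple[int, int]]) -> dict[int, int]:
    """Feed (stream, depends_on) pairs in arrival order; return the tree."""
    tree = PriorityTree()
    for stream, depends_on in order:
        tree.add(stream, depends_on)
    return tree.parent
```

The same two requests produce `{1: 0, 2: 1}` when they arrive in order, but `{2: 0, 1: 0}` when request two arrives first, so request two may now compete with request one instead of waiting for it.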

So you want to describe and cover those edge cases.

And I wrote a blog post about how we got here: the whole discussion involving many people from the community, across different forums like the HTTP Workshop just over a year ago, and how we're in this situation where we maybe didn't want to maintain H2 priorities, but didn't want to lose prioritization altogether.

Like, how can we do this stuff?

And so we're continuing that process. That blog post was written earlier in the year.

And as a community, we're discussing the different options. And I think we're very happy that the model we've got, the scheme of urgency and incremental, makes sense and is simple and just enough, without being too restrictive.
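The urgency-and-incremental model (the scheme later published as RFC 9218) carries two parameters, for example in a `Priority: u=1, i` header: urgency from 0 (most urgent) to 7, defaulting to 3, and an incremental flag, defaulting to false. The sketch below is simplified: it is not a full Structured Fields parser, and the rule of serving non-incremental streams before incremental ones at the same urgency is an assumption of this sketch.

```python
DEFAULT_URGENCY = 3
DEFAULT_INCREMENTAL = False


def parse_priority(value: str) -> tuple[int, bool]:
    """Parse a Priority header value like "u=1, i" (simplified)."""
    urgency, incremental = DEFAULT_URGENCY, DEFAULT_INCREMENTAL
    for member in value.split(","):
        member = member.strip()
        if not member:
            continue
        key, _, val = member.partition("=")
        if key == "u" and val.isdigit() and 0 <= int(val) <= 7:
            urgency = int(val)
        elif key == "i":
            # A bare "i" or "i=?1" means true; "i=?0" means false.
            incremental = val in ("", "?1")
    return urgency, incremental


def next_batch(streams: dict[int, tuple[int, bool]]) -> list[int]:
    """Pick which stream(s) to send next from {stream_id: (urgency, incremental)}."""
    if not streams:
        return []
    top = min(urgency for urgency, _ in streams.values())
    group = sorted(sid for sid, (u, _) in streams.items() if u == top)
    sequential = [sid for sid in group if not streams[sid][1]]
    if sequential:
        return [sequential[0]]  # non-incremental: one at a time, in order
    return group  # incremental: share bandwidth, e.g. round-robin
```

For the green and yellow boxes, the green ones could be sent at a more urgent level without `i` so they arrive one by one, and the yellow ones at a less urgent level with `i` so they arrive interleaved.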

But recently, a question of reprioritization has come up.

And I mean, this is a pretty bad diagram, but it's to say, well, look, if the browser window was smaller and it couldn't show all of those items, you would make a request for like six things initially, and it would say, I want them returned to me in some way.

And then, while the server is doing all of that hard work, the user scrolls the page down and suddenly a new set of requests comes in.

And the server is going to keep trying to serve the early green requests at a high priority, even though the client no longer strictly needs them first.

It wouldn't want to cancel them, because it's already got some of them in flight.

But what you would like to do is get the yellow ones first.

And this was like a use case that we had where you could insert a request afterwards, say at a higher priority than earlier.

And in H2, that wasn't well implemented by some servers.

And Pat Meenan and Andy Davies created that use case, documented it, and helped get some servers fixed, but not all.

But reprioritization in this case would be, in my mind, more like trying to say, like, I want to change my mind after I've requested all of these things.

And the reason this matters for the standard is that an "I've changed my mind" signal is slightly harder to implement than a "this is what I want from the start" signal.

And so to try and make some progress on the specification, I asked the community for some input.

And Yoav kindly designed an experiment to gather some data on this, on whether reprioritization is actually useful.

And between Yoav and Pat, who are running the experiment, I believe, there's some early data.

So we talked about, you know, whether this stuff is working or not.

Here Pat says most metrics are neutral, but Largest Contentful Paint degrades by roughly 6.8% on average and 12% at the 95th percentile, and Speed Index degrades.

So this was some early data. I don't know if between you, you've gathered any more, but this kind of thing is very informative to help us reason about things.

It's very easy to sit there with two different parties saying I need it and somebody else saying I don't need it.

But yeah, I don't know if you could share any more about this experiment.

Yeah, sure. So essentially, browsers are already using reprioritization heavily when it comes to images.

So browsers typically start requesting images before they know whether they are in the viewport or not.

So images are typically, at least in Chromium, requested with low priority, and then the ones that are in the viewport get upgraded into medium priority.

So that's the browser's representation of priorities, but on the wire, they get a higher weight.
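The image case can be modeled as a scheduler that accepts priority changes for responses it hasn't sent yet. This is a hypothetical sketch: the urgency numbers are illustrative, not Chromium's actual priority-to-wire mapping, and a signal for an unknown stream is simply ignored, which is one of the details a real protocol has to pin down.

```python
from typing import Optional


class Scheduler:
    """Minimal urgency-based scheduler with reprioritization (hypothetical)."""

    def __init__(self) -> None:
        self.pending: dict[int, int] = {}  # stream ID -> urgency (0 = most urgent)

    def request(self, stream_id: int, urgency: int = 3) -> None:
        self.pending[stream_id] = urgency

    def reprioritize(self, stream_id: int, urgency: int) -> None:
        # Only affects responses that haven't been sent yet; a signal for a
        # stream the server doesn't know about is ignored in this sketch.
        if stream_id in self.pending:
            self.pending[stream_id] = urgency

    def pop_next(self) -> Optional[int]:
        if not self.pending:
            return None
        # Most urgent first; ties broken by stream ID.
        stream_id = min(self.pending, key=lambda s: (self.pending[s], s))
        del self.pending[stream_id]
        return stream_id
```

If images 4 and 8 are requested at a low urgency and layout then reveals that image 8 is in the viewport, calling `reprioritize(8, 2)` makes the server send 8 before 4. Without reprioritization, the server keeps using the initial guess.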

And because it's already implemented, it was relatively easy to say, okay, let's kill that feature and see what happens.

So I implemented a Chromium patch. Basically, there is currently a Chromium flag that enables you to disable H2 reprioritization.

And then Pat Meenan of WebPageTest fame ran tests against a long list of servers that we know are well behaved in terms of their priorities implementation, because the big problem with anything related to priorities is that if you're measuring it across the entire population, you're most likely to get more noise than signal.

But if you pick and choose well behaving servers and then disable the feature, this is what we got on initial data.

I believe the experiment is still running. There are a lot of servers, and because it runs on basically the off-peak time of the WebPageTest instances that Pat is running, it takes a while.

So it's still an ongoing experiment, but the initial data is super encouraging, and it essentially shows what I thought it would show, which is very refreshing when it comes to data that typically surprises you.

In this case, the initial data indicated that reprioritization is beneficial.

So yeah, in my view, we should definitely keep it as part of the protocol.

Yeah. And as the editor of the spec, I'm pretty neutral on it. I could argue either way.

Like I said before, sometimes it's about providing the capabilities for people to implement stuff.

There might be some use cases where it's not useful, but if there are use cases where it is, then it's a trade-off between providing it and not making life too hard for people who don't want to use it.

And these kinds of things all come into play.

But seeing this data is very interesting, because even single-digit percentages can be an obvious net gain.

So to see a 6%, 12% was maybe a bit more than I was anticipating, but it's going to be super cool to see the remaining data set.

Yeah. Exciting. Before we get cut off, I'd like to thank you for your time this week, Yoav.

In the past few weeks, I've missed the opportunity and it's been sad.

But no, thanks so much for coming on and talking about some of these things.

Hopefully, yeah, my slides didn't put you off too much because you're talking to somebody else's slides.

But no, it's been interesting. Before we close out, was there anything else you wanted to add while you have a minute or two?

No, just thanks for having me. It was fun. Right. Cool. Can I get you back in the future?

Do you think there's anything on HTTP/3 performance that we haven't touched on yet?

So one thing I'm working on is web bundles and delivery of web bundles in a way that can be subsetted.

So that could be an interesting future topic.

Okay. Yeah. In a way, it's cache digest in reverse, turned on its head.

And instead of sending what's in the cache, we're just sending a list of everything that is actually needed.
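The "reverse" relationship is essentially two set operations. This hypothetical sketch implies no wire format (the subsetting design wasn't described here); it only contrasts which side enumerates resources: with a cache digest the client lists what it has, and with subsetting it lists what it needs.

```python
def cache_digest_style(bundle: set[str], client_has: set[str]) -> set[str]:
    """Client says what it HAS; the server sends everything else in the bundle."""
    return bundle - client_has


def subset_style(bundle: set[str], client_needs: set[str]) -> set[str]:
    """Client says what it NEEDS; the server sends just that part of the bundle."""
    return bundle & client_needs
```

With a bundle of `/a.js`, `/b.js`, and `/c.css`, a client that has `/a.js` gets the other two in the first style, while a client that asks for `/b.js` gets only `/b.js` in the second.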

So could be an interesting future conversation.

This is the first week I finished on time. That's great. We might just sit here until they close the stream.
