Leveling up Web Performance with HTTP/3
Presented by: Lucas Pardue
Originally aired on January 22, 2022 @ 5:30 PM - 6:00 PM EST
Detailed tips and tricks for analysing and measuring the new HTTP/3 and QUIC protocols.
Original Airdate: September 8, 2020
English
Performance
Tutorials
Transcript (Beta)
The web, the digital frontier. I'd like to picture bundles of HTTP requests as they flow through the Internet.
What do they look like? Websites, instructions for doing stuff, bits and pieces.
I'm running out of ideas, folks, so please put some ideas on a postcard so that we can have more fun visualizations and analogies.
I keep trying to think of them and I don't have any, which is why I'm stumbling.
But anyway, let's move on.
This is another episode of Leveling up Web Performance with HTTP/3.
Every week I seem to forget the title of this show. I think it's just because it's a mouthful.
Maybe I should go for one word next time. So yeah, welcome back.
I'm Lucas Pardue. I'm an engineer at Cloudflare in case this is your first time watching or you've forgotten because it's been summer and you've been pickling fruit or vegetables.
Coming back after a little break, this is going to be a shorter episode compared to my normal one-hour shows.
Just half an hour just to kind of catch up, see what's happening and then depart.
So yeah, I'm considering the shape of this show in the future.
We've covered a lot of ground. It's been nearly like 12 episodes, I think, getting deep into some details about different aspects of this new transport protocol QUIC.
Focusing on maybe some of the HTTP aspects of things and touching on performance at the web level.
I think in the future we'll be able to dig into those things more as more people turn HTTP/3 on and we have data not just from Cloudflare but from across the industry.
We're working in standards and so there's many different people with different needs, different trade-offs for what performance is.
It's not just "does it load fast in my browser?" but "does this create extra work on the server?", and there's loads of cool community discussion on those things.
The danger of this is that you always just carry on retreading ground, so my opinions on their own aren't worth much.
I encourage you to read the papers that are coming out of academic conferences or industry, or posts on these kinds of things, and try and form your own opinion.
The important thing with all of this is to test yourself and validate.
Some of the questions that come my way on Twitter or various forums are like, how does this compare to something else? And the honest answer is that the best bet you have is to test it yourself.
Maybe not the answer people want; they'd like a conclusive "it will behave this way". But things are definitely theoretically faster with QUIC, and there's some good information out there talking about the different kinds of network conditions that people have tested.
We've got some blog posts, say, about how we are improving congestion control.
We had Junho on the other week talking about how changes in how we approach sending data from a server to a client can improve the performance of the protocol from that aspect, but that's just one kind of test with one control point.
In reality you have the real Internet and the last mile, and all of those factors combined complicate the story, which is why you might want to do a combination of synthetic testing, both for capacity and for client-side oriented tests, as well as real user metrics monitoring by beacons or whatever kind of scheme you might have.
So on that note, if you recall, for the past few weeks, aside from when Junho appeared and saved us, I've been boring you to death with building a tool called h2load.
So this is a client tool, a bit like Apache Bench or JMeter or any of those tools you might be familiar with, that can take a URL or a list of URLs and basically hammer them. They may not be representative of how a web page loads, nothing like WebPageTest, which we saw with Pat and Andy, but they generate busyness on the server, and that helps you hone in on certain aspects.
So if you recall, let me share my screen a moment. This is live TV folks so always remember that and give me a break.
Let's just share my screen here.
We can run h2load, which is installed on this machine. This is an Ubuntu machine, so it was just apt install nghttp2, because h2load is a tool that's part of that suite.
We can just put in a website like cloudflare-quic.com. Oops, hyphen quic. You can see it's going to create one client, one kind of synthetic virtual client, make one request and connect to that URL, and yeah, it's pretty similar to stuff we've seen before.
I'll get into this in a few moments, but you can see in this part that we didn't specify any specific application protocol, so h2load read that URL and did its default behavior, which is to negotiate with the server. In this case it hit the Cloudflare edge and picked HTTP/2, using whatever default HTTP/2 configuration parameters h2load has.
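For reference, the invocations discussed above look roughly like this; a minimal sketch, assuming h2load from the nghttp2 suite is on the path and using cloudflare-quic.com as the example host:

```shell
# Default behavior: no protocol pinned, so ALPN negotiation decides.
# Check the "Application protocol:" line in the output to see what was picked.
h2load -n 1 -c 1 https://cloudflare-quic.com/

# Pin the protocol explicitly instead of relying on the default:
h2load -n 1 -c 1 --npn-list=h2 https://cloudflare-quic.com/
```

These hit a live server, so the numbers you get back will vary with network conditions.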
This is something always to watch out for with any kind of client testing tool, because there are a lot of possible options here with h2load, which we're going to dig into probably a bit more on this show. But I encourage you to read up in your own time too, because I can't possibly cover them all in this time. The important ones that spring to mind, that have hit me in a past life, are the configuration of flow control windows.
So yes, you would have seen a blog post recently from us about how we managed to improve upload speed for HTTP/2 clients sending data into the Cloudflare edge. That was because, as well as TCP flow control windows operating, there's a second layer of flow control, which complicates things slightly and means a straightforward implementation of HTTP/2 doesn't get optimal performance, so you can do cleverer things there. But I'm just trying to find the case here, which I can't do for some reason.
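As a sketch of what tuning those windows looks like in h2load: the -w and -W flags take the window size as a bit count, so the effective window is 2^N - 1 (per the nghttp2 docs; worth double-checking on your build, and the host below is just an example):

```shell
# h2load expresses flow control windows in bits: initial window = 2^N - 1.
# N=16 gives 65535 bytes, the HTTP/2 default initial stream window.
bits=16
echo $(( (1 << bits) - 1 ))

# Widening both the per-stream (-w) and connection (-W) windows for a test run:
# h2load -n 100 -c 1 -w 25 -W 25 https://cloudflare-quic.com/
```

Widening the windows like this is how you rule flow control in or out when a benchmark looks suspiciously slow.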
That's what you get when you try and multitask. We've got some timeouts here; those can be fun if you run a test against a server and it doesn't respond and your timeouts are too low. Maybe having a whole thundering herd of clients arrive at one single time causes, in one case, a reverse proxy to be blocked on an application server generating one response that could then service everyone.
So you always want to be mindful of defaults, and if you wanted to emulate a thundering herd with h2load, you could just up the number of clients to a thousand. Oops, and there we go, syntax error.
What that meant was a thousand clients trying to share one request, which doesn't divide, because each client needs to make a whole number of requests. So we want to specify a total of 1,000 requests over 1,000 clients, which is one request each, which will happen there in real time.
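The arithmetic behind that error, as a small sketch: h2load splits the -n total across the -c clients, so the total needs to divide evenly, each client making n/c whole requests (the live command below uses the same example host):

```shell
# Total requests (-n) must split evenly across clients (-c);
# each client then issues n/c requests, here exactly one.
n=1000
c=1000
echo $(( n / c ))

# The corresponding live run would be something like:
# h2load -n 1000 -c 1000 https://cloudflare-quic.com/
```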
This is not localhost testing; this is going over my last mile network, which is probably playing up today as usual. The joys of home working.
Oh, this is interesting. Oh no. Okay, the 90% done software engineering issue. Wow, that's quite interesting, an unexpected failure. I'll have to dig in using my powerful Wireshark tools and whatnot. Or I could just do something smaller, just to give you a flavor of these stats at the end.
Possibly there was some packet loss there and something got messed up, but we've got status codes here, so I want to make sure we're getting the expected responses for the number of requests that we made.
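Checking that status-code line can be scripted; a sketch below, using made-up sample figures in place of a real h2load summary (the awk field positions assume h2load's "status codes:" output format):

```shell
# Illustrative sample of an h2load summary; the figures are made up.
cat <<'EOF' > /tmp/h2load_summary.txt
requests: 1000 total, 1000 started, 1000 done, 987 succeeded, 13 failed, 13 errored, 0 timeout
status codes: 987 2xx, 0 3xx, 0 4xx, 0 5xx
EOF

# Pull out the 2xx count so it can be compared against the requests made.
awk -F'[ ,]+' '/^status codes:/ { print $3 }' /tmp/h2load_summary.txt
```

If the 2xx count doesn't match the total, something in the run needs investigating before you trust the timings.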
We could simplify this down again and just run those 10 requests in one client. Pretty straightforward; those would have been made as concurrent requests, so a lot quicker than making them in single file. I can't remember how to enforce serial requests in h2load, but there is a method. Anyway, that's not the point I'm trying to make. What we want to do in this show is test HTTP/3.

The recommended way to do this is to use the --npn-list parameter, which is basically old-school speak for ALPN identifier, or identifiers. When a client connects to a server, you have to pick HTTP/1.1 or HTTP/2 or some other application layer protocol that's running on top of TLS. A client can provide a whole list of one or more things, and the server picks one and responds. So we can say in this case we want to use HTTP/3 draft 29 (h3-29) against cloudflare-quic.com. If you just looked at this quickly, you might think that it worked, because it printed off some statistics, but actually things failed. This is the important thing: no protocol was negotiated. What we said is we wanted h3-29, but that didn't exist because there was no support. Which is a bit ambiguous, actually, now that I read it out loud; it's not clear whether there's no support on the server side or the client side, but in this case I know it's the client side, because this is the initial problem statement and the reason why I wanted to build h2load from scratch and chew up time.

So if we go quickly back to our wonderful instructions from Tatsuhiro, what we need to do is build a version of h2load that includes the ngtcp2 and nghttp3 libraries. There are probably some pre-made packages out there, but it's always more fun building stuff from source, in my opinion. I don't want to go through these in too much detail yet again, but effectively the steps are: build a version of OpenSSL with special QUIC support. QUIC relies on TLS for the handshake but needs some additional information from the library, so it needs some API tweaks which aren't officially in any branch; Tatsuhiro's branch includes those things. So first build that version of OpenSSL, then build versions of nghttp3 and ngtcp2 against it, and then a version of nghttp2, and that's where those instructions stop.

The step we did last time was to then go into the nghttp2 folder and configure that to build. These secret instructions are documented somewhere, I'm sure, but this is my secret recipe: we just have a bog-standard configure. The --enable-app flag is not necessarily required, but can help flush out any issues with finding prerequisites when building nghttp2. Then we just make sure we have pkg-config pointing at basically everything we've built so far. This is what I did last time, and the build did complete, like, one minute after the show finished. That was sad; I didn't get to show anyone the fruits of their patience watching me do that stuff.

So we can run the same command with the locally built copy of h2load, and I believe this should work. And it did. Compared to last time, you can see that in this case it did negotiate HTTP/3 draft version 29 and it made a successful request. Hey, very interesting, right?

So we could enhance this slightly and run a bunch of tests for this one page. Let's up the number and do a quick basic comparison of how long it takes to fetch, say, 100 copies of that page within a single connection. From start to end this took 4.25 seconds. Obviously some of these stats are quite quirky; you can't really have half of a request made, so in isolation I always look at these figures and smile. But when you're comparing different benchmarks against each other, then it can help; you can look at your download data rate, say, and compare it, which I'm going to do now.

When you have something like connection reuse with HTTP/2, some of these other statistics are not averaged over all of the requests that you made, but per client, say. In this case we only had one client, so there was one connection, so time for connect is a single measurement, whereas the mean, min and max times for requests would have been calculated across all 100 requests. So it presents a standard deviation of zero, because there is no variance. It's worth keeping some of these basic statistics in mind; I think it's very easy to leap to assumptions and conclusions, especially if you're tired or not that familiar with the tools.

So let's rerun that test with, say, 10 clients, just so we can see the change in the connect figures. You can see that request finished more quickly, because, well, there were fewer total requests. Oh yes, that's right, I want 100 requests per client. So with HTTP/3, this one finished in 11 seconds, and for time for connect this time we've actually got a bit more standard deviation. It depends what aspects you want to test as well; there's this kind of two-phase approach between handshaking and bulk data delivery once everything's in action. So maybe you just want to test how quick a QUIC handshake is in comparison to TLS, those kinds of things, and this would let you do that. I would like a mode for h2load that let you just focus on the connection, without having to do any request, because unless your request is small you end up exercising some parts of the code that you maybe don't care about for specific kinds of tests. But hey ho, this tool is already pretty complicated, and there are maybe other ways to do things; you could emulate that kind of test with openssl s_client, say, scripted up. But it's tricky with QUIC, because, as always, QUIC is tricky.

So if we rerun that test with just HTTP/2, let's see, is it going to win? Is it going to finish in less than 11.09 seconds? Yes, it did. So, yes, that's sad. And this is the case where you might want to go and actually see what's happening here. What is the shape of that connection? What are the flow control limits and defaults? Here's where we'd get qlog out and maybe try and analyze where the difference is.

One reason there's a difference: we haven't really talked much about it, but there's HPACK, which is header compression in HTTP/2, and QPACK, which is header compression in HTTP/3. In Cloudflare's implementation of HTTP/3, at the moment we only implement static compression, and this will explain why we're seeing, say, 30.95% savings in header space for our HTTP/3 test, but for the HTTP/2 test we have bigger savings. And given the page size, I can't quite remember how big that page is, but you end up with maybe imbalances between how large the request headers are compared to the response headers, or even, sorry, the response payload as well. So some of these factors will come into play.

If we retry that test with a smaller file, so I have one here which is just a very small message, we can see how much total was transferred: 822 kilobytes, pretty small. Same again with the other protocol; you can see there's slightly more traffic there, again due to headers basically contributing stuff, so it's slightly less surprising that things take longer the larger they are. You can see in this case HTTP/2 finished in eight seconds whereas HTTP/3 finished in three. So again, it's important not to draw too many conclusions from one single test, even if that test is doing many requests. Things will vary by time of day, load, all these kinds of things. So those are my pointers.
Now you know how to build your own version, I'd love you to try stuff out and report on it. Why don't you write your own blogs or do your own TV shows? The webperf community is pretty cool; it's one of the main reasons I got into all of this in the first place when I moved to London. There's a Slack workspace called Web Performance, I believe. Yes, I'm just looking at it. I honestly don't know how I got on there; I installed the Slack app a few months ago and it said you're part of this Slack workspace. So if anyone's watching and they want to join, then by all means drop me a note, or one of the other famous web people that you might follow.

Anyway, that's probably enough of that. I've got eight minutes left, so I spent way longer on this than I intended to. I hope you found it interesting. I wanted to change tracks slightly and take a step back. As someone who has been a contributor to the QUIC standards process over the last few years, I take quite a lot of things for granted, I think, so I just wanted to show people who want to find out more information what they should do. If you Google terms like QUIC and HTTP/3, lots of good stuff comes up: blog posts from various people, or just analysis. It may be harder to find some of the academic papers I've mentioned, but you can if you refine your search terms. And if all of that fails you, we have an official QUIC Working Group home page, which does a simple job of listing things out.

So, just here in case people are not familiar with it: QUIC isn't one single document that defines the standard, but a family of documents. Let's go through them one by one. We have something called the invariants, which tries to describe features of QUIC that do not change between versions. They should allow people to look at QUIC packets in something like Wireshark, or any packet capture, as UDP datagrams flow through their network, and be able to look at a QUIC packet and understand it, and what its features are, across versions. But it also, I think, provides some guidance on what not to do when taking that approach. A funny anecdote is that the invariants have changed a few times during the overall lifetime of the document, because we're iterating, we're finding things in the QUIC we started with where maybe assumptions were invalidated, and we've improved stuff. The whole process here is not just to take something and say this is good enough, but to run it and then say this is good enough. That nuance may be lost on some people; buy me a beer one day and I'll explain it.

We have transport, which is the main document: here's how to do a handshake, here's how to exchange data on streams, a reliable component, here's what you need to do when you detect a loss, which is then subbed out into the loss detection and recovery document. We have TLS, which goes into the details of the handshake. We've got HTTP/3, and QPACK, which is the header compression. And we've also got these other ones, called load balancers and extensions. It's a bit of a weird bullet pointing here, but basically these three documents, load balancers, datagram and version negotiation, were adopted after the working group was formed. We can consider new work; we have a charter, so we need to make sure anything we're working on is in the scope of the work that we signed up and committed to do in the first place, but load balancers, datagram and version negotiation were deemed to be in scope. This charter was written a few years ago. It's tiny, you won't be able to read it, and I don't want to go into too much of it, but it's very much based on what Google QUIC was at the time and what it could do: let's describe a more general overview of what Google QUIC did and what we want based on that, and then we can come and actually instantiate it in these documents here.

That's not it; the page also details and links to some of the other things that are important to the working group and to anyone who cares about standards and how they get developed. I could probably talk way more about this, but as an overview, we manage a lot of stuff in GitHub. As many of us are software engineers, or other people just familiar with working on GitHub, this flow works quite well. A lot of people might have an account already and can quickly get up to speed on the logistics and admin side of contributing or helping out, and then we can focus on the QUIC-specific things. In this case, this is the working group materials repository, where all of the presentations that have happened over the last years live, everything, like the agenda that we had back in February. This is all a matter of public record under the IETF's Note Well, which you should go and read if you are interested.

In addition to that, we have a bunch of repositories for each of those drafts I just talked about. Some of them live in the base drafts repository; the extension drafts live in their own repositories: datagram, load balancers, version negotiation. We have the ops drafts, which are the applicability and manageability documents, also part of the core deliverables. People read the specs, they might find issues, and they can open an issue. We label these according to the different documents they might apply to, and whether, say, someone's found a typo or a bit of text that could be improved editorially, so we can label it as such, or maybe there's an issue that's a bit beefier and we would label it design, rather than just for project tracking.

So if anyone's familiar with managing projects, most people suffice with this view, or don't even look at the list of issues; they just create issues and maybe some PRs that can address them, and it's up to the editors to manage that process. But the chairs and the editors, we spend a lot of time in this view, which is where new issues come into our triage queue. The labels are here; I don't want to waste time reading them out. But anyway, thanks for your time. Goodbye.