Cloudflare TV

Leveling up Web Performance with HTTP/3

Presented by Lucas Pardue
Originally aired on 

Join Lucas Pardue (QUIC Working Group Co-Chair and Cloudflare engineer) for a session on how HTTP/3 is supercharging web performance.


Transcript (Beta)

The web, a digital frontier. I tried to picture bundles of HTTP requests as they flow through the Internet.

What do they look like? ASCII diagrams, bad Pong animations.

I kept dreaming of visualizations I thought I might never see, or visualizations that actually describe the thing I'm trying to understand.

And then one day I came on TV and made a ham-fisted attempt at it. Hello, everybody.

Welcome to yet another episode of Leveling up Web Performance with HTTP/3.

I'm Lucas Pardue, an engineer at Cloudflare who works on protocols such as QUIC, TLS, HTTP/2, HTTP/3, and so forth.

This week, I do not have any guests. I thought I'd go solo.

We're entering the summer. Finding guests is trickier than normal.

And also I've extended, no, what's the word? I've used up the goodwill of a lot of people in my local pool.

So I'm looking for others that maybe have some more questions about QUIC rather than answers or viewpoints on web performance.

If any of you have any suggestions or people in mind who might wanna come on board or just fire in some questions asynchronously, please let me know.

It'd be great to have kind of different viewpoints coming in than just the usual faces that we know and like.

This week, I'm just gonna be focusing on some random stuff in a way, a bit of a mix and match.

For those that don't track the standards all that closely, last week was the IETF 108 meeting.

This should have been held as a face-to-face meeting in Madrid, which would have been quite nice and sunny in the summer.

But due to the ongoing crisis, it was a virtual meeting. So this is the first time that the IETF had planning time to hold an all-virtual meeting.

The last one, IETF 107, was a bit of a last-minute decision.

So this time around, we could plan things a bit better.

The working groups had time to kind of look at who would be contributing ideas, et cetera.

And rather than rely on an existing tool like Zoom or WebEx, these kind of teleconference systems that you might be familiar with, the IETF already had a remote participation tool called Meetecho.

And Meetecho is kind of already very good, building on top of standards technologies like WebRTC.

So it worked really well in a browser and it had integration with the chat service that we use, which is called Jabber, which is based on XMPP, another open standard.

It brought those two things together and it worked fine for just remote attendees, but for a situation where everyone was remote, we needed some ability to run meetings a bit better.

So an ability to queue up people who wanted to talk without talking over each other, which isn't gonna happen today because I have no guests, but you can imagine when you're talking about topics that solicit input or clarifications, questions come up.

If anyone can just unmute themselves at any time and talk, that's difficult.

So in the past, in the 107 meeting, we were using a very manual process of people typing in text to ask, could they be queued up?

And this tool kind of automates some of those processes.

But anyway, this is a long waffly way to say that I was quite busy last week and didn't get to prepare much ahead of time.

And with the lack of guests, well, they can't fill my time either.

So I'm just kind of looking around the Twitter sphere and other domains that I hang out in like Slack.

You know, there's some questions around handshaking, which is relevant to something else that Cloudflare announced last week in Serverless Week.

So I thought I'd take an opportunity to kind of pull some things together and just talk a bit about how handshakes in TLS and QUIC work.

This won't be an in-depth guide to those things because if I try to do that today without enough preparation time, I'll just get stuff wrong.

So I'm gonna keep it at a fairly high level, hopefully a bit introductory and to give some people an opportunity to maybe ask some questions.

It's quite hard if you're just talking into a computer screen to understand what people know or what they don't quite understand, especially when you've been working on this for so long.

There's so many areas that are taken for granted or that your brain just compartmentalizes way to say, well, my piece of code that works with this or the library that I rely on just handles that for me automatically.

So I don't need to worry about it. It's only when you have a bug or something going wrong, then you have to dig into the code, which normally takes you back to the specification.

So I've prepared some slides, which I'll now attempt to share.

If you just bear with me. Yes, Leveling up Web Performance with HTTP/3.

Well, we won't talk much about HTTP/3 for the first half.

Just a reminder, this is live.

So forgive any mistakes, but also do feel free to email us at livestudio@cloudflare.tv.

If there's something I've got wrong, definitely, or if you just have a general purpose clarification question. Or you can @ me on Twitter, and you don't have to do that at the live time either.

You can just always @ me.

So, TLS. Let's just drag this window slightly more.

So let's just get right into it.

If you think about a handshake, when you're in your browser and you want to connect to a server, doesn't necessarily matter what application protocol, but we typically think HTTP.

So you type in a HTTPS URL into your browser.

What happens? What's the first thing? You will make a TCP connection typically, and then you'll dive into this TLS handshake.

This is taken from RFC 8446, which is the TLS 1.3 handshake.

Oh, sorry, TLS 1.3 specification.

And this is figure one, which shows the handshake. And you can see there's a lot of kind of arrows up and down, left and right, with different signs, plus, minus, asterisks, et cetera.

And I don't want to really get into any of this detail.

It's kind of not great for a Monday evening for me. I jest, anyway. So the important thing to understand here is that the client initiates this handshake and sends a bunch of stuff, and the server can respond with a bunch of stuff.

And then the client can say it's finished.

And only at that point, when we're happy that the handshake has succeeded, does application data get sent, and that's this red box in the bottom left.

So application data is HTTP requests. So this will be creating or sending the headers or the request line that indicates the path that you want and what kind of things are needed.
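
If you want to watch that ordering yourself, a minimal sketch using the openssl command line tool (assuming OpenSSL 1.1.1 or newer for TLS 1.3 support):

    # Complete a TLS 1.3 handshake; only after it finishes do we get to type
    # application data, i.e. an HTTP/1.1 request, into the session by hand:
    openssl s_client -connect cloudflare.com:443 -tls1_3
    # GET / HTTP/1.1
    # Host: cloudflare.com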

Inside the client hello, the first thing, you've got all of these extensions, and that might include something known as Server Name Indication, which tells the server what host you're trying to speak to.

So for example, if I was to type in https://Cloudflare.com, I would see an SNI in there of Cloudflare.com.

And that's sent in clear text to the server who can use that to look up information about the properties of the TLS session that could get created for that server.

So for instance, it would use that to fetch the certificate and other information.

Or the server might say, well, you've sent a client hello with a server name that isn't provided by this server.

And at that point, I could end the handshake. I could send you an alert to say that you've made an error of some kind, and game over.
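
The -servername flag on openssl s_client is what sets that SNI extension in the client hello, so you can poke at this yourself; a hedged sketch:

    # SNI tells the server which host you want before a certificate is chosen:
    openssl s_client -connect cloudflare.com:443 -servername cloudflare.com
    # Try a -servername the server doesn't know about and see whether you get
    # a default certificate or an alert back; behavior varies by server.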

So this is like a complicated ASCII way of looking at things, with too much information and probably not enough information depending on your take on things.

So a nicer way to visualize this is borrowed from a recent blog post of ours.

So we have our browser on the left and Cloudflare on the right. And in this case, what we're doing is sending a client hello for example.com.

And the server hello comes back with certificate, blah, blah.

The client sends a finished message. And again, at this point, you can see we've got our red box.

We've got our application data.

So we have a HTTP request. And as you can see here, it's in the encrypted payload.

So at the point that the handshake is complete, our stuff's encrypted and protected from passive observers on the Internet.

And similarly, for the response that comes back to the right-hand side of Cloudflare, there's some lines that extend somewhere.

You know, that could be many things. As an operator of an edge service, I work on the protocols team.

So a piece of software runs and listens for these client hellos.

And we don't really do anything with the requests.

We make sure that they're valid and do some basic security and sanity checking on them.

But otherwise, we just forward them onto some other component within the system.

And then they come back to us. Sorry, that component would return a response back to us that we just forwarded on.

So there's some delay maybe in getting the handshake in the first place.

And then only after that handshake succeeds, does the request come in.

So another way to look at this is to consider the TCP layer, which I've mentioned.

And we've talked about this before on previous shows.

I don't want to retread the ground too much. But you have a TCP three-way handshake, which needs to succeed.

And at that point, TLS is running on top of TCP. TLS needs some reliability and in-order delivery guarantees.

And TCP provides them.

So again, all we're seeing is this TLS Finished message. And at that point, the client can send some application data.
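
One easy way to watch that sequencing from the outside is curl's verbose mode; the exact lines vary by curl version and TLS backend, but the shape is always connect, then handshake, then request:

    curl -v https://example.com/ -o /dev/null
    # *  Trying <ip>:443...        <- TCP three-way handshake
    # *  TLS handshake messages    <- TLS running on top of TCP
    # > GET / HTTP/1.1             <- application data only after Finished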

And another way to visualize this is something I prepared earlier, which some people dislike.

I think it's kind of fun. But anyway, it doesn't matter.

It's a visualization. There's lots of ways to kind of try and discuss things.

These are models or approximations of what's happening, depending on your understanding, your prior knowledge, and lots of different factors.

Different kind of user models can help or hinder understanding.

This one was trying to communicate a very particular problem in that if messages bouncing back between client and server get interrupted in some way, such that the client doesn't receive a response coming back, you get blocked.

You can't send that application data until it gets recovered.

And I can see the criticism for this abstraction, because actually, it's not the fact the client didn't catch a message.

It's more the fact that the packet may have got dropped in the network before anyone, sorry, without the knowledge of the client or server.

They could use TCP to detect that loss had occurred and recover those things.

But that induces more latency in the handshake.

And trying to represent that on a simplified view of left and right is difficult.

And so we come to QUIC, which is simpler yet again.

If we go back quickly to our TCP example, we've got this three-way handshake first, and then TLS data.

And you look at QUIC, it's a shorter vertical diagram.

And you can say, well, it's QUIC, and then it's application data. And the way that this is achieved is by incorporating TLS into the QUIC handshake.

And that's a really hand-wavy way of talking about it.

And I think, in my experience, it's enough for most people just to say that it happens that way.
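
And if seeing it from the outside is enough, a curl build with HTTP/3 support can show the combined handshake; this assumes curl was compiled with an HTTP/3 backend such as quiche, which stock distro builds usually aren't:

    # --http3 makes curl connect over QUIC instead of TCP plus TLS:
    curl -v --http3 https://cloudflare-quic.com/
    # the verbose output goes straight from the QUIC handshake to the
    # request, with no separate TCP and TLS phases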

But other people want to know more about it.

And the answer to doing that is to go and read all of the drafts related to QUIC.

So you want to read the QUIC transport or core transport document, as some people call it.

And also the QUIC TLS mapping or description of how QUIC uses TLS.

But that's pretty in-depth. And quite often with specifications, they don't provide a nice narrative for people trying to understand at a high level what's happening, or at a medium, kind of intermediary level of, I want to know more than just this simple, it works.

We establish something, and then great stuff happens.

But I don't need to know everything.

And that's where people doing their own presentations or creating some kind of more user-accessible books or literature on the topic is really helpful.

And here's just a way to talk about it.

I'd encourage other people to explain this stuff too, especially as a learner or someone unfamiliar with the topic.

They're typically the ones that are best able to comprehend and communicate a topic from the perspective of that intermediate or beginner.

So what is this diagram showing? Again, it's a figure one, but from a different document.

The QUIC TLS draft 29 document, which is part of the whole family of documents for QUIC.

And what this is trying to show is actually to describe what TLS does.

Not what QUIC does, but what TLS does. So you can see we have two layers.

The bottom is a record layer. And so that's TLS's method for transmitting information.

Effectively, in a very simple sense. And I'm sure TLS experts in the room will tell me I'm fundamentally wrong or naive in that description.

But anyway, I don't care today. But what is the stuff on the top? We've got another red box.

We've got application data. They're brilliant. Just to the left of that, we've got alerts, which I mentioned in case the TLS handshake goes wrong, you can send an alert back either direction to communicate what that error was.

Or maybe not. Maybe you just want to say, I'm giving up. Bye for now.

And then you've got the handshake. So I don't know if this diagram is that great at describing stuff, because it talks about a handshake layer.

And then it says application data.

Maybe that does make sense if you're thinking of something like 0-RTT, which is a feature I'll come on to later.

But also note that application data fits into records when it's not part of the handshake; this is just the flow.

Once you've done a single handshake, you'll go into the state of sending encrypted data back and forth until you're done with the session.

When you're using a TCP-based transport, the way to indicate that you've done is to tear the TCP connection down.

For HTTP/1.1, that is. For H2, there's more graceful ways of doing this stuff.

You can send GOAWAYs or connection close, or just cancel a stream if you made a request and you didn't mean to do it.

So there's better approaches to doing stuff.

But that's beside the point of TLS. So the QUIC TLS draft sets out in this figure 1 that TLS is a layered protocol.

And this is how, if you're using TLS, things would be sent back and forth.

And then figure 4 describes how QUIC and TLS interact.

If we were to come up with a layer cake, which I don't have here, and talk about what is the relationship between QUIC and TLS, some people might think it's just laid on the top, like HTTP 2 was laid on the top of TLS, or it's to the side, or it uses QUIC crypto, which is untrue.

That's how Google QUIC used to do things.

But the way that IETF QUIC these days interacts with TLS is that it relies on a lot of services or features from TLS, but that it doesn't use the record layer that we had above.

It uses its own QUIC packet protection. And so this diagram doesn't show application data on that.

But part of what it's trying to do is to say that whatever your components are, however you might build your QUIC stack, whether it's a user space library or something in the kernel, that you need a component that's going to handle transmission, and reliability, and protection once keys are established.

But we're going to rely on TLS 1.3, and the way that it's defined, and its terminology and definition, and things like certificates and alerts.

And we're just going to reuse that. We'll just carry it over QUIC.

And therefore, these arrows pointing from left to right, or right to left, most of them, indicate the kind of information, the data model, say, that QUIC and TLS exchange.

And traditionally, maybe some of this information wasn't exposed in a TLS API.

And so through the course of trying to build QUIC, what we found is that TLS libraries needed to expose some new information.

Generally, that's OK.

You can just add a function, but also rework sometimes the way that functions are invoked, or provide information and callbacks, these kinds of things.

So that's a bit like H2, in that it's not just a straight, oh, I can hook this up with anything that's out there.

We have many, many different TLS library implementations.

Some of you might be familiar with OpenSSL or BoringSSL. They progress at different rates.

They have different project maintainers, and different release schedules, and stuff.

So even if it's quite simple to uplift an API, there can be difficulties in actually getting the correct version of that library.

If you're relying on the system library, it probably isn't quite up to date with these things.

So in the second half of this show, I might give you an example of this with OpenSSL.

But otherwise, just know that you need fairly recent versions of everything, like always, when you're working on cutting-edge protocols.

Things update quite quickly. We're talking draft 29 here. Chances are we'll be seeing a new QUIC draft in the near future that addresses some of the current issues that were raised in the working group last call that I talked about the other week.

But again, this diagram is probably quite accurate, and probably simplistic.

It doesn't necessarily help us understand what the heck is going on.

There's another diagram in the same draft, which shows a handshake.

We've got a client on the left, a server on the right. You send this client hello message, and you get a server hello message back, and maybe some application data, which actually I should have blanked out.

So that's a mistake. The problem with this diagram is it shows zero RTT.

I tried to hide that, because it complicates the discussion.

I forgot to do that on the right-hand side. So can I fix that in live stream?

There we go. As if by magic, you'll never see that. But ultimately, client and server declare they're finished, and then they can exchange application data.

And that's all well and good, but for somebody just thinking, well, how does it just, what are you sending?

You told me that the record layer isn't there anymore.

You told me that QUIC requires all of these things. Are they being sent on the wire?

Well, no, this is just the local interface for the TLS library.

And then there's this thing that's happening underneath. That's all local, too.

So how are the client and the server communicating? I think for some people, that's kind of the gap.

And although I tried to skim the spec for just a simple overview of this earlier, I couldn't find it.

I'm probably just looking at the wrong place because I was rushing.

But what I tend to do in these situations is just to crack out Wireshark or a tool that lets me look at what's really happening on the wire.

So if you saw any of the previous week's episodes where I had Peter Wu on, who's a core maintainer of Wireshark, we looked into different ways you could use a tool to capture or get a packet capture and then analyze and dig into things.

So I don't want to go too much into Wireshark today other than to say tools help you get a job done.

And the thing I wanted to do here is just to visualize what happened.

So in this case, all I did earlier today was point this machine at cloudflare-quic.com or some other HTTP/3-enabled server, pointed a client at it, which is the quiche client, a simple command line client provided as part of the quiche library project that lets us test things in a basic way.

It's not really designed to be used for web browsing or anything like that, but we use it a lot in our interoperability testing and just trying to figure out what's happening.

It has a lot of its own trace logging, but I thought I'd just look at this from a black box perspective.
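
If you want to reproduce this kind of test, the rough shape is below; the exact invocation changes between quiche versions, so treat it as a sketch and check the repository README:

    git clone --recursive https://github.com/cloudflare/quiche
    cd quiche
    # binary name and manifest path vary by version; see the README
    cargo run --bin quiche-client -- https://cloudflare-quic.com/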

So from top to bottom, this is the sequence of messages that get sent.

But rather than just a single Wireshark view, I thought I'd make it harder to understand.

So in the top box, there's two messages.

So the IP of the client was 192.168.0.50. So from the client to the server goes an initial.

What's that? An initial is a type of QUIC packet that can be sent and contain some stuff, which I'll explain in a little bit.

So yeah, this is the message that gets sent within a UDP datagram.

And this is what's going to contain some of the information to help bootstrap a QUIC connection.

But we can see the response that comes back isn't like a server hello or anything like we saw on some of the previous slides.

In this case, what we see is a version negotiation packet coming back.

And that's because the contents of the initial contained a special value that basically stimulated version negotiation to occur.

And that's kind of nothing to do with a simple handshake sequence diagram that we've been looking at.

And that's why I put it at the top. Because it happened. I didn't want to lie too much about the trace here.

But it's not important for the discussion on handshake here.

At least, I don't think so. So then we've got on to what is more of a classical QUIC handshake using TLS.

So again, from the client to the server, we send an initial packet.

And in that packet are two frames. One is a crypto frame.

And the other is a padding frame. So the crypto frame, before we get to that: QUIC packets contain frames.

That's kind of the message abstraction layer in a way.

That's probably the wrong term. But whatever. But packets contain frames.

We've talked in the past about you can have a packet that contains stream frames.

And those stream frames relate to different requests or responses.

That's how you do multiplexing, et cetera. In the case of the handshake, you're not yet in a state where you can exchange application data safely.

You need to establish the keys in order to protect them.

And so that happens during the course of this handshake.

But you need to be able to send the TLS messages, the client hello, the server hello, all of those things we've seen on the previous slides.

And that's where the crypto frames come in. And a crypto frame can be thought of like its own special stream.

Crypto frames carry parts of that stream. So it starts at offset 0.

And as you send data, say if you sent 100 bytes of crypto data, the offset would increment.

But if anything was lost, you would treat the crypto part like its own special stream.

So you would guarantee reliability and in-order presentation from the QUIC layer to TLS.

So in the QUIC layer, when you're receiving the packets and you're taking them apart and you're pulling out the different frames and you're trying to create this in-order buffer of stuff, once you've done that, that's the point where you can present information from QUIC, taking off the protected QUIC packet, sorry, the packet protection.

And then you can feed that into TLS.

And TLS is going to give you a response back. And then you can use that to fill in the next initial.

And in this case, what we see is the first initial coming from the client with the crypto, but also a padding frame.

And this is important because UDP is traditionally a kind of DOS vector for sending information that's unauthenticated.

It's easy to spoof and can elicit a response maybe from a server that would send something large back.

So you could send a very small packet and get a large response.

And that's kind of an amplification attack.

So padding is a way to force the client to send a minimum-sized packet.

If that padding frame wasn't here, the server would detect that this is a bogus initial packet effectively and refuse to do anything with it.

It wouldn't even respond because any response to an unauthenticated client can cause bad stuff to happen.

So what do we see from the server then? Having processed these things and the server is actually happy, the server responds with its own initial packet.

So first of all, it acknowledges the first one. And then it sends its own crypto frame with the important TLS messages contained inside.

Then there's some handshake packets.

And then, finally, their own special thing, which I don't have the time to go into today.

But finally, the client sends another initial, some more acknowledgments and padding there.

And then finally, a handshake packet back to the server, which most likely is the finished message.

Because in this trace, after that happens, we immediately move on to protected payloads in this red box.

So this is our application data. In previous weeks, we've seen that if we enabled client logging via something like SSL key log file to dump the secret that was negotiated in the handshake, Wireshark would be able to dissect those messages.

And it could show that there were stream frames inside of the quick packets.

But I didn't do that this time. Actually, it didn't matter for understanding the handshake.
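
For reference, the workflow is roughly this; curl is shown here, but many TLS clients honor the same environment variable:

    # Dump the per-session TLS secrets while making a request:
    SSLKEYLOGFILE=/tmp/keys.log curl https://cloudflare-quic.com/
    # Then point Wireshark at that file under:
    # Preferences -> Protocols -> TLS -> (Pre)-Master-Secret log filename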

And if none of that made sense, what I can do is overlay that previous diagram of the handshake that happens here, where we've got the client on the left and the server on the right.

And the client's sending a couple of messages.

The server's responding with some. And then ultimately, the client's happy.

And we hit the red box with the application data. So that's all pretty cool.

That's just normal kind of one-RTT QUIC and how the handshake works there.

But there's also a discussion of zero RTT, which I touched on earlier. We have a great article here written by my colleague Alessandro, who knows way more about this stuff than me.

And this is just describing how we enabled zero RTT for QUIC fairly recently.

I can't remember the exact date. But we enabled zero RTT for TLS on TCP a couple of years ago.

And that's been running in production for a long time. It's not enabled by default.

You do need to go into the dashboard, if you have a zone on Cloudflare, and enable that.

And there's an important reason why it's not enabled by default.

And again, the example on the blog is much better than my very quick explanation here.

But what we have is this concept of replay.

So if somebody was able to observe the application data, so just going back, sorry, you can see in this case, it's sending the QUIC handshake along with some application data.

So this is going back actually to this sequence diagram with my hidden boxes that I'll unhide now.

So we're sending the application data alongside that initial client hello.

I'll re-hide them in case I need to reuse these slides.

And so if the application data contained a POST, it would be possible to have a replay.

Unfortunately, that could happen locally by some weirdness, if a packet was thought to be lost when it wasn't, and it got retransmitted or something strange.

But the concern here is more somebody actively attempting to do bad stuff.

And we can think about idempotency here: messages like POSTs might contain a form or some payload data, and an endpoint might not have any application-layer anti-replay mechanisms inside it and just simply accept those things.

But it can also happen with other methods like GET, or other things that maybe have a payload, or maybe cause something on the server to happen that isn't stateless.

And so this isn't just a case of turning it on and getting a quick win.

Enabling 0-RTT is the kind of thing that needs to be thought about carefully.

It's a performance optimization, but it's not a really simple, easy win.

So in order to help mitigate this risk, Cloudflare will always reject 0-RTT requests that are obviously not idempotent, things like POST or PUT.

But in the end, it's up to the application that's sitting behind Cloudflare, the thing that's on your application server, say, to decide what requests can and cannot be allowed with 0-RTT.

It says here, even innocuous-looking ones can have side effects that might be difficult to understand.

So it's really, you know, caveat emptor here.

But in order for the application server to detect this, we add something called the Early-Data header, which is defined in RFC 8470.

And we add that to the requests that come into Cloudflare and that get forwarded on to your server, in order for you to add some logic to say, no, don't trust 0-RTT for this kind of thing, or actually, this is just a request for a static website, I don't care, it's safe to do that here.
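
As a sketch of what RFC 8470 describes, a forwarded early data request and the "try again after the handshake" response look something like this (hostname and path made up):

    GET /transfer?to=alice HTTP/1.1
    Host: origin.example
    Early-Data: 1

    HTTP/1.1 425 Too Early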

And so, because 0-RTT isn't necessarily a quick win, we still go back to this original diagram I showed earlier, where we have a browser connecting to Cloudflare and doing a TLS handshake.

And we know that TCP happens before this, or maybe it's QUIC, whatever.

There's still a number of interactions that need to happen before we get that HTTP request and forward it on.

So there's some kind of like dead air time, if you wanna think of it like that.

Time where Cloudflare is aware that a browser has connected and it's gonna probably talk to a server.

But obviously we're gonna do the handshake and make sure that it's happy and it's gonna finish this thing.

We might optionally have client authentication here as well, where the client could present a certificate to prove its identity, and we can authenticate that before passing any application traffic onwards.

But in a simple sense, we've got a period of time where things happen.

And if we imagine a backend, I'm using the workers runtime here, because we talked about serverless week.

In this case, if the workers runtime is in a position where say the request is gonna hit a worker, but that worker isn't presently loaded in memory, maybe it got swapped out or whatever, because of the way that we do resource sharing.

It's unlikely that this can happen, but when it does, you end up with a period where we need to fetch that worker, the script that describes the logic or the behavior that you would like to do.

Maybe you're looking for that early data header or whatever, I don't know.

We need to fetch that ourselves, and then we need to compile it and effectively bring up that environment.

And all that can happen quite quickly.

It's kind of annoying, because we had a period of time that we were waiting for the client to do some work to respond to us.

And we figured that, well, maybe we could combine things in maybe a slight layering violation.

I don't think so. I think it's just an optimization. We have a client giving us an SNI of example.com.

So we have a fairly good signal there already that there's a request that's gonna come into that worker that's configured for example.com.

So why would we then wait for the request to come in when we already have a good signal?

So as part of Serverless Week, there was a blog about eliminating cold starts for Cloudflare Workers, where we're able to do just that.

We moved the worker warmup to basically run in parallel with the TLS handshake completing.

So what this means is that if the worker wasn't there, if it needed to do a cold start, that actually it's already loaded in that time.

So the round trip times, even if we're close to the client, which we can be, because we have good distribution, we can kind of consume some of that time for the workers cold start and give what effectively is a zero cold start time by the time the actual request comes in.

Just pretty cool. This will work for both TLS or QUIC. This part is agnostic to the version of the handshake that's happening on the left-hand side, because of the way that QUIC reuses TLS 1.3, which is a great kind of side effect of a good design principle, and a good justification, say, for moving away from Google's non-specified kind of handshake protocol that they started with and have moved away from.

So that's all that. I could probably go into it for far longer, but I don't want to.

Instead I'm gonna do something different this week and try to do a kind of live coding thingamabob.

So we talked in a few previous weeks about stuff like load testing, although mostly we've looked at some debugging tools.

Whenever people talk to me about HTTP/2 or HTTP/3 load testing, I always say it's hard.

So we're probably familiar traditionally with tools like ApacheBench or wrk, or "work" as it might be called.

And that's pretty good at just throwing a lot of traffic at the server.

You get a whole different range of tools. There's Gatling.

There's other things that are more scriptable, but traditionally they focused on testing the HTTP application layer.

How much work can my application server do if I send this kind of request, or if I sent POSTs with this body?

Fundamentally, how many requests per second can I handle?

And I kind of don't care about the messaging format.

With the move to H2 and H3, we kind of do want to test that still, but because we have things like multiplexing and concurrency and flow control occurring there, those can change the patterns of behavior that servers do.

Or you might wanna try and test the CDN that you're using to offer those new protocol features rather than just testing 1.1 because as we know, not all implementations are equal and things can vary.

And you can spot performance differences if you have a tool that enables you to test them.

So h2load is a tool that was developed under the nghttp2 project initially.

And I used it a lot to kind of do comparative testing of things.

So I'm on a machine that doesn't have h2load.

So I figured I'd just go through the process of trying to bring it on, bring it up to speed, say.

So this could go horribly wrong. I don't know if this machine will handle compilation at the same time as streaming.

So let's just dive in and see how far we get in the next 20 minutes.

So if you can see, I'm on Banana Slug, which is my machine of choice this week.

This is a shout out to Henri Helvetica, who I had a Twitter conversation with this week, and jokingly said I would get a Banana Slugs T-shirt, based on some memes around QUIC and nghttp2, or TCP2 being a meme that some people in the QUIC working group found funny.

You can see by my face, I found it hilarious as well.

So yes, this is me being Vincent, not smart and suited and booted, but like I've just had to clean up a big mess.

And now this is the only T-shirt that I have.

So anyway, I'm on Banana Slug in WSL, the Windows Subsystem for Linux, because this is a Windows machine.

So let's see, if I run h2load, what's gonna happen?

I don't have this thing. So h2load is just a simple command line tool.

I should be able to type something like h2load https://lucaspardue.com/delay1, but I don't have the tool.

So let's install that.

Let's put it in, install some stuff.
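
For anyone following along, on Debian- and Ubuntu-flavored systems the tool ships in the nghttp2-client package; package names may differ elsewhere:

    # h2load comes along with the other nghttp2 command line tools:
    sudo apt install nghttp2-client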

95% complete, like every software project in the world.

That last 5% takes longer than the prior 95%.

Wow, okay, so let's try that again.

Okay, so camera fail, it was that good.

Literally, we fell over looking at the performance. So quickly, an overview of this: it looks pretty similar to the output from wrk, if you're familiar with that.

So what happened is the machine spawned a thread with one client and one request because we just gave it a really simple command line.

It negotiated TLS 1.2, and it shows the cipher suite that we used here and other stuff.

The application protocol was H2. I'll tell you why in a moment, and it made some requests.

So this finished in 144 milliseconds. So on average, that would be about six, seven requests per second, but we know we only made one here.

So read into that what you like, statistics. But what it shows is we have one 200 status code.

So the request to that machine was successful. And some of the timing information here, it's not super detailed, but effectively what we show is kind of the entire time for a request to happen, the time for the connect to happen, so the handshake beforehand, and then the time to first byte.

So as always with all of these tools, it's very good to understand exactly at what checkpoints these things are being taken and be aware that across different protocols, different versions that those things can change subtly.

And so sometimes it's useful to actually get the raw information out of these things yourself rather than rely on the statistics and just double check effectively that maybe even from the server side logs, do some initial probing to make sure that what you're seeing either side makes sense.

A common mistake when you're not paying attention is to do a load test and say that worked really well and realize that you returned like 400s or 500s for every response, like something was wrong and therefore the response is very small and got served very quickly, which is not what you intended.

But anyway, maybe we don't care about some protocols. Maybe we were like, well, H2 is great, but I need to get my baseline of HTTP/1.1 instead.

So like, how can I do that?

TLS, when it negotiates things, it uses what's called ALPN, Application Layer Protocol Negotiation.

So this h2load client is sending a list of H2 and HTTP/1.1, and the server picks its favorite effectively, which is H2 because it's better.

So we wanna test 1.1. And the way to do that is with a command line flag called --npn-list, which is an odd name.

It stems from Next Protocol Negotiation, which was the predecessor to ALPN.

And the flag just didn't change name.

But what you can do is put the list of protocols in whatever order that you want in there.

So for simplicity, I'm just gonna go for HTTP/1.1 and rerun that test against this endpoint.
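
So the command is something like this, with the URL swapped for a placeholder here:

    # only offer http/1.1 in the protocol list:
    h2load --npn-list=http/1.1 https://example.com/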

You know, initially it looks the same: the TLS protocol is 1.2 and the cipher suite's the same.

Oh, I typed something wrong.

So you can see in this case, no protocol is negotiated. Yeah, there just wasn't anything that happened.

Nominally I wanted to make one request, but the handshake failed in the first instance, so there's nothing it could do.

If I was to type this properly with a slash in there, you can see that this time it worked and that it succeeded and the requests happened.

So we did one request. Wow, big wow.

But that's just a way to test different protocols. So if you wanted to force H2, even though it would do this automatically, you can explicitly say h2.

And you would anticipate that you could do the same for HTTP/3, right?

So you could do something like h3-29, because that's the most recent version of the protocol.

And we see that actually this version that's installed from apt does not support HTTP/3, which is annoying.
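
For the record, the attempt looks like this; a stock package-manager build will refuse it, because HTTP/3 support needs h2load compiled against the extra dependencies:

    h2load --npn-list=h3-29 https://cloudflare-quic.com/
    # fails unless h2load was built with ngtcp2 and nghttp3 support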

It's not surprising, but what it requires you to do in this case is to compile the source from scratch, which is probably what I'll spend the next...

I've got 13 minutes left and I doubt it's going to happen in that time.

But, you know, making one request is pretty boring. You probably want to do more complex workloads and I just want to quickly show you some other options here that you can do.

This is documented online. If you just look up h2load, there should be a manual page that will describe all of the options, or you can just look at the help.

And it's pretty extensive. You can see this scrolling up your screen.

There's a lot of different options. You can look at things like, oh, look, there was a shortcut for 1.1, oops.

You can also do clear text communications, which is great.

You can do several URLs. So if for instance, I was to do the same thing and...

Wrong --npn-list.

This is making one request.

So that didn't work as I anticipated. So if I want to make more than one request, what can I do?

I can use the -n flag. This is not rocket science.

I want to make 10 requests, one URL. I just want to hit this thing as quickly as possible.

You can see it goes through. It gives a very quick progress update because these requests don't take long to service.

I got ten 200s, and there's more interesting statistics here.

This kind of histogram slash high-level summary view of things.

So that's great. That's still only one client in total, sending one request at a time in serial.

So you can see the total time to finish.

It was 600 milliseconds. So if you just want to add some multiplexing, so we're going to use H2 streams.

You could say, well, let's send 10 requests at a time.

That'd be quicker. Yes, previously it took 600 something milliseconds. Now this time it's taking 326.

So far, so easy.
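
To recap those two runs as commands, with the URL as a placeholder:

    # 10 requests in serial over one connection:
    h2load -n 10 https://example.com/
    # the same 10 requests with up to 10 concurrent H2 streams:
    h2load -n 10 -m 10 https://example.com/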

You can get more clever here. You can create more clients. So let's say you want 10 clients.

If you don't increase the request number, it will split them evenly.

Sorry, it does split them evenly across clients. So if we just did this, it should make one request per client.

Yes.

So we end up with a number.

If we then increase that number to 100, we see that there were more requests.
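
Again as commands, roughly:

    # 10 clients; with -n 10 that's one request each:
    h2load -n 10 -c 10 https://example.com/
    # bump the total and it's still split evenly, 10 per client:
    h2load -n 100 -c 10 https://example.com/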

So I encourage people just to go and play with this with HTTP/2 if they've never done it.

There's a whole load of features, like URI files where you can define different things.

And I added a feature many years ago to allow you to kind of have a timing schedule so you could issue requests at certain points within the connection to mimic things like streaming media consumption, where you get a cadence of requests coming in every four seconds or so, that kind of thing.
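
That one is the timing script option; a hedged sketch, since the schedule file format details live in the h2load manual:

    # issue requests at scheduled offsets read from a file, for example one
    # request every four seconds to mimic segmented media streaming:
    h2load --timing-script-file=schedule.txt https://example.com/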

But we don't care about any of that.

We want to do HTTP/3. So the way to do HTTP/3 is to compile from source.

And there's some instructions. So if you were to go and look up ngtcp2 and go to the GitHub repository for this, I'll show you.

Yes. Fix memory leak.

Well done, Tatsuhiro. Good points there. But yeah, it's kind of part way down the page.

There might be Docker images for this stuff, but it's kind of a good learning process because this can go wrong.

But if it goes wrong, it's quite straightforward to fix and it just gets you familiar with things.

So the first step that you have to do is to get a customized version of OpenSSL because of the API issue.

And I'm not going to type this out by hand because I'll make a mistake.

So let's copy paste and see how quickly I can check out OpenSSL. No, not going to let me paste.

I hate the shell. Once more for luck.

And if not, I'll try and type it out. Wow.
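
For reference, the checkout step from the ngtcp2 README looks roughly like this; the branch name tracks the current draft, so copy it from the README rather than from here:

    # QUIC-patched OpenSSL fork (branch name here is illustrative):
    git clone --depth 1 -b OpenSSL_1_1_1g-quic-draft-29 \
        https://github.com/tatsuhiro-t/openssl
    cd openssl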

This is going to progress quite slowly. This kind of a programmer's version of mindfulness, perhaps when you can just, in the UK here, it's coming up to 7 p.m.

It's a good day. My dinner's cooking. Time to reflect and celebrate those small miracles in life.

Okay.

I could have done this beforehand, but I figured it'd be more interesting to show you the joys that we have to go through.

The first time is pretty annoying, just installing something from Apt, but welcome to the world of needing to basically do this every time someone files a bug fix or we switch from different versions.

Wow.

There we go. So we can go into OpenSSL, and now we need to look at the docs, which I have on the second screen.

So I'm ready. QUIC uses TLS 1.3, as we know, but we need to enable that in the OpenSSL library, even if we're not using it in the traditional sense, because that will enable the configure flags effectively.

Everything that we need for that. It's successfully configured.

That's great. Now we're going to make it. And here's where I would just hit something like -j8 for a nice parallel build, but I'm really worried this machine will just melt down.

So it's just make, a wonderful single-threaded build.
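
The build steps, per the README; the prefix just keeps the build self-contained so the system OpenSSL is left alone:

    ./config enable-tls1_3 --prefix=$PWD/build
    make             # add -j8 or -j$(nproc) if your machine can take it
    make install_sw  # installs libraries and binaries without the docs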

Go through. Given the complexity of an SSL library, it doesn't necessarily matter if it's OpenSSL or BoringSSL.

These things do take an age to build.

And I'll be very skeptical if this completes before the end of the show. I mean, you could say it's akin to watching paint dry, but if you'd applied paint and you could see it moving on the wall, then you've probably done a bad job.

It reminds me, I actually have a decorator in who's doing decoration and they're using a spray.

They're not using brushes or roller. So the spray makes a big noise, but it does the job faster.

I don't know if there's an equivalent for OpenSSL, and I don't care to investigate either.

That'd be a good time to make a coffee, especially if you're watching in a morning time zone.

Yes, but make sure you make your coffee and you're back for the next segment.

I believe it's Cassian. Let me look that up.

Maybe Cassian's watching me for inspiration. So we're getting close to the top of the hour.

I don't want to lose the opportunity to say to people, please let me know if you'd like me to continue this experiment.

OpenSSL may have finished compilation by next week's episode.

I don't quite know. And if not, well, we can still carry on watching it.

But in seriousness, I think it can be interesting to step through these things.

I was very tempted to do this during the day and have a Blue Peter-style "here's one I made earlier."

But I think that glosses over the problem of just reading readmes and then hitting issues and not knowing how to debug them.

So I thought if I can go through this as a step-by-step process with people, they might just see that even the experts in the room still have to go through the basics of these things.

Yeah, we're on B.

So, and to close out my point, if you don't want to see this anymore, that is absolutely fine.

You can stop viewing, but you could also let me know what other things you would be interested in hearing about OpenSSL.

No, about HTTP/3 and web performance.

So I'm hoping by the next show, we can actually have this built and maybe just do some comparative load tests.

Nothing too crazy, but just to say, if you try and saturate a link on the public Internet, are there going to be any huge differences between H2 and H3 or H1?

And then bear in mind that, implementations, both client and server, are still new.

We're still figuring out quirks and bugs and stuff like that.

I have seen cases in the past where requests that did complete didn't get marked as such.

These things just happen. So it's only by trying stuff out and reporting the issues or trying to fix them do things get better.

None of this is just going to magically work.

So I do encourage you to try. If you can do this with multithreading, it'll be quicker.

So join me next week, if you can. Bye for now.
