Cloudflare TV

Leveling up Web Performance with HTTP/3

Presented by: Lucas Pardue, Robin Marx
Originally aired on July 13, 2020 @ 7:30 AM - 8:30 AM EDT

Detailed tips and tricks for analysing and measuring the new HTTP/3 and QUIC protocols. Featuring guest Robin Marx.

Episode #3

English
Performance
Tutorials

Transcript (Beta)

The web, a digital frontier. I tried to picture clusters of HTTP requests as they flow through the Internet.

What do they look like? Sequence diagrams? Bar charts? I kept dreaming of a visualization tool I thought I'd never see.

And then one day, someone made one.

So, hello, everybody. This is the third episode of Leveling up Web Performance with HTTP/3.

On the show today is something different. I have a special guest, Robin Marx, who I'll introduce formally in a moment.

You might notice some changes here.

I've had to relocate to the sofa. As always, real estate is a precious resource, so I've been relegated.

But less about me and more about my guest.

So, this week, as mentioned, is Robin Marx. And Robin is a web performance researcher at Hasselt University in Belgium.

He's mainly been looking at HTTP2, QUIC, and HTTP3, and he creates tools that help him and others debug their behavior and performance.

In a previous life, he was a multiplayer game programmer and co-founder of Lugo Studios.

YouTube videos of Robin are either humorous technical talks or him hitting other people with a longsword.

Originally, I wanted to talk about some HTTP3 and QUIC stuff, but actually, now, Robin, I think we all want to hear about longswords.

So, I know you have some slides and some other stuff ready, but yeah, throw that all away; over to you.

Yeah, let's promote the sport.

So, it's called HEMA. It's Historical European Martial Arts. The idea is that we have these old manuscripts from medieval times, where they describe sword fighting techniques.

It's not just longsword, it's also the sword and shield, and the rapier, and that kind of stuff.

And they've been developing that into an actual sport.

So, you have the people that do reenactment, that's more like with the dress up, and they do big fights.

But we actually do it like, it's more like Olympic fencing.

Also, with the same kind of mesh mask, more protection, and with different weapons.

Brilliant. Yeah, that's great. We won't spend the whole hour on that.

I'm sure it's very interesting, but I did want to pull in, I did want to do a bait and switch on people.

I think the people who would have tuned in here have taken a very clever decision not to watch the Worldwide Developer Conference keynote speech that's happening right now, but to join us instead.

And we're going to provide a more interactive and fun experience for them. So, while you're watching us talk, and learn some cool stuff, debugging tools, etc.

If you want to ask us any questions, you can email in to livestudio@cloudflare.tv.

Or, as Robin's presenting right now, we're active in the Twittersphere, so we'll be monitoring questions and comments via both of our Twitter handles shown here, mine and Robin's.

And there's also a link to the qvis tool at Hasselt University, who kindly host it.

Great. So, you talked a bit about your hobbies, but just before we get into some of the technical stuff, I always like to try and figure out why people got into what I call Internet plumbing.

There's a lot of different aspects to web performance, but some of them focus more on the wire protocol level of things: looking at TCP, looking at H2 framing, kind of the bit in the middle of the sandwich, or layer cake, that people like to present the protocol stack as.

So, can you share a bit about your history, how you got into becoming a web performance researcher?

Yeah.

So, I said I used to be a game developer, but then I had to move, and that was close to my old university.

And I went there looking for a job, and they were just starting a project on web performance, which I'd always found interesting, but never really got into.

And the whole setup of the project was, you know, HTTP2 is coming, right?

This was 2015, 2016. It's coming, and it's going to be so much faster.

It's going to make everything 200% faster, and we're going to use that to do the project.

Yeah. And that was good, you know? So, I joined. Only, as you know, you quickly find that it's really not that much faster, and it's actually quite easy to make it slower than HTTP1.

I've always wondered, because Robin and I have had the luxury of being at some IETFs and stuff, but actually, I never realized that you weren't using H2, in this case, for like a website, right?

You were using it for something else?

The whole idea was actually to optimize video streaming at that time.

It was like Netflix was coming up, and then the whole project was about, you know, what is the technology to do that, and also live streaming, and then to support that using HTTP2, which we ended up not doing, because it's difficult.

It took like a year to figure that out, all of that, you know, why isn't it working, and why isn't it faster, and what are the pitfalls?

Because back then, all the blog posts and all the talks, they were very, you know, enthusiastic about H2, and they were saying it's going to be so much better. But it turns out that there are so many different things that can go wrong, and that's one of the reasons I want to do the whole QUIC debuggability and tooling stuff: to try and prevent this this time around for H3.

Like, H3 is great, it's going to be faster, but it's not going to be, you know, a complete difference, and there are still going to be problems there, and I really want to make that clear in my talks and in my work.

Yeah, and I think it is a good observation, in like the fullness of time, for us to be able to go back and say, you know, it was exciting H2, it was like this revolutionary change, and had so much potential.

It did focus on maintaining effective compatibility with HTTP/1.1 without changing too much, but it did introduce different things, and maybe the effect of that wasn't quite what people were expecting.

There were a lot of knock-on side effects, even things like people, for instance, having a server that depended on the kind of rate limiting that HTTP/1.1 serial requests relied on, and as soon as they turned on parallelism, the servers crashed.

Those kinds of things, and like, yes, at the time there was some test suites, but you can think of those like unit tests.

In actuality, the deployments are like the integration tests, and I'd say we're still discovering, you know, unintended consequences of design decisions made with H2, even today with some of the priorities work that you and I have been involved in for the last year. So yeah, having additional tooling around this to help improve stuff, and to design experiments to then extend forward, is really cool. Personally I find this kind of stuff fascinating, but I don't know about others. But yeah, so you talked a bit about...

I assumed the viewers do too, otherwise they wouldn't be here, right?

Well, maybe they just like my lovely British accent, but anyway, we're going to talk a bit about debuggability, and so, you know, there's a few ways to do that.

Why don't you kick us off and show us one of the possible...

My screen is shared, right? So you can see, so I'm going to assume the viewers have a little bit of experience with Wireshark and TCP, and I'm going to explain most of the things real quick.

So typically what you do if you want to look at what's happening on the network, you fire up Wireshark, which is what you're seeing here, and you take a trace.

So basically, you put yourself between the client and the server, and you take whatever goes over the network, and you save that in a big binary file, right?

That's a packet capture, and then you get the keys, so the decryption keys from either the client or the server.

That's important because QUIC is fully encrypted.

You can't really do much without the keys, and if you have both those things, so the binary blob and then the keys, then you can decrypt, and then you get Wireshark, and Wireshark is really cool.

In that fashion, it shows you all the different packets that go over the wire, and if you click on the separate packets, you'll see the contents and all the details that you can see here, right?

I assume most people are familiar with that, even if it was just from university or college, if you've ever done that.

You've got a QUIC trace, but whether it's TLS and HTTP or QUIC, the basic premise is the same, right?

Somewhere in the network, even if it's the same machine, you're capturing packets, but you're not part of the TLS or QUIC connection.

You're kind of an outside observer.

Yeah, you're doing passive measurements, as it's called, right?

If you want to use this to debug what's going on, you can, of course. All the data is here, but it's kind of clunky.

It's not the most user-friendly way, at least in my perspective, right?

So that's kind of what we started with, let's say, three years ago with HTTP/2 as well.

This is what we had, and it didn't work all that well for us.

So what we said, how about we make like a custom visualization for this kind of stuff, right?

That's what I'm going to try and show now. So this is qvis.

So it's a visualization tool suite. The whole idea is that if you take one of those packet capture files, you can upload them here or you can load them by URL.

You load them in here and then you can visualize them. So it's basically the same as Wireshark, but a different visualization.

So this is the exact same trace.

It's what I showed before. If you don't believe me, look here. You have the initial packet with the crypto frame, and then you have an ACK with padding, a crypto with padding, and that's exactly the same as what we had here, right?

So we have an initial with the crypto, an ACK with padding, a crypto with padding.

So it's exactly the same information, but presented slightly differently.

And we kind of color different types of frames in a different way to make it a bit easier.

So say if you're looking for how acknowledgements happen in QUIC, then you just need to look at the green stuff.

If you want to look at, if you want to know what's happening with the data, you need to look at the blue stuff, right?

That's already easier. And then we also add this.

This is one of the things that I find most irritating in Wireshark: knowing in which direction the packet is traveling, right?

In Wireshark, you have to look at the port or the IPs, and here we just have arrows in different colors going different ways.

And so in this case, what we're showing is like the very first interaction between a client and a server, right?

So the client's on the left-hand side, and the server's on the right-hand side.

And so, you know, there's a lot of details around, we've talked about this before.

Well, I've talked about this before in previous sessions: effectively that blue line at the top is a single QUIC packet that contains QUIC frames, in this case.

Yeah, an initial packet that contains a crypto frame.

So, I think at least for me, Wireshark kind of assumes you're very familiar with those things, and that you'd know that you need to dig down and expand out those aspects.

Whereas here, it's kind of a different view in that it just shows you what happened at the high level.

And like you just did, you can click, can you do it again?

Click in onto any one of these packets.

And this is your qlog format, right? Yeah, I'll get to that. Yeah.

So that's the thing. So this works better for me. This is faster for me than going through a Wireshark trace.

But there are still some problems with this.

It doesn't show me everything. It only shows me what goes over the network, right?

Well, in QUIC especially, you have a lot of things going on on the endpoints in the client and server that you don't see on network.

I've got an example of this here.

Let me see which one here. This is a new trace. And you can see on the left side here, suddenly you have a lot more stuff going on that you didn't have for the, for the Wireshark trace.

And for example, this one here is interesting. It doesn't really matter what's here.

This all has to do with congestion control variables.

We'll talk about that in a minute. But the thing is, all these variables are not sent over the wire.

These are things the client knows, the server knows, but you don't see this on the wire.

And so if you only work with Wireshark traces, you miss this kind of information, which is fine for a lot of things, but not ideal if you're debugging or initially implementing all these things, right?

So to really debug this stuff, you want to have a more in-depth insight.

And that's kind of why our proposal, which is the qlog format, says: instead of taking a trace in the network, let's store this state on the client or the server or both directly, right?

And it's, might sound complex, but it's really simple.

As you can see here, qlog is just JSON. That's it. It's that simple.

You just have a JSON file, and you can just say, I have a packet sent event.

And what was in the packet? Oh, all of this stuff, these kinds of frames. And then I have a packet received event.

And this is what the packet looks like, right? That's all the things you get from Wireshark as well.

But then you have the other things like here, the metric update events.

This is the kind of stuff that you would miss from the network, but that you do get if the client or the server gives you that directly.
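
To make the shape of those events concrete, here is a minimal TypeScript sketch of the kind of records Robin is describing. The field and event names are simplified illustrations, not the exact qlog schema from the drafts:

```typescript
// Illustrative only: a rough model of the kinds of qlog events described
// above. The real qlog schema is defined in the draft specs and differs in
// detail; names here are simplified.
type QlogEvent = {
  time: number;                     // relative timestamp in milliseconds
  category: "transport" | "recovery";
  event: "packet_sent" | "packet_received" | "metrics_updated";
  data: Record<string, unknown>;    // frames, packet sizes, cwnd, RTT, ...
};

const events: QlogEvent[] = [
  { time: 0, category: "transport", event: "packet_sent",
    data: { packet_type: "initial", frames: [{ frame_type: "crypto" }] } },
  { time: 32, category: "transport", event: "packet_received",
    data: { packet_type: "initial", frames: [{ frame_type: "ack" }] } },
  // The part a network capture can never show you: internal endpoint state.
  { time: 32, category: "recovery", event: "metrics_updated",
    data: { congestion_window: 24000, bytes_in_flight: 12000, smoothed_rtt: 31 } },
];

// Because it is plain JSON, it can be written to disk, grepped, or loaded
// straight into a browser-based tool.
console.log(JSON.stringify(events, null, 2));
```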

Yeah. An aspect of this, I guess, is that these kinds of parameters are really important in the decision making, but, especially for stuff like congestion control, they're sender driven.

So those parameters are important to figure out why something happened, but there's no ability to express them on the wire.

So that's good and bad in different ways.

It's not even the fact that the traffic was encrypted and a passive observer couldn't see those parameters; they were never sent on the wire at all.

They're just, they're internal details. And sometimes, like I think you presented before, some implementations were logging this somewhere, some were logging pieces of information, but in their own different formats.

And that makes somebody like, like yourself, a researcher in this area, trying to understand why somebody made a decision.

It kind of makes life a bit difficult for you.

That's just what I wanted to say. Before we did this, everybody was logging this like in the command line, right?

Just command line, debug, output, but they're all different formats.

And one of my students, I think he spent two months writing regular expression parsers for four different implementations, trying to get this stuff out.

And it's that frustration that led us to propose, you know, how about everybody just outputs JSON, and it's going to be much easier.

This leads nicely into our first question, which is via email from, I'm going to get the name wrong.

So it's Joris Herbots, maybe, I don't know.

He says, can you elaborate why you went with JSON for the Qlog format?

Doesn't this prohibit the ability to, for example, do live debugging?

That's an interesting follow up.

So, JSON for several reasons: it's human readable, and you don't need a tool.

You can just open it up. You can grep it. You can just go through it with your normal tools.

The second thing is that it's supported by default in all the browsers, right?

All the browsers can load JSON directly into JavaScript.

You don't need to do a translation step. I also completely disagree with the whole 'you can't do live debugging' thing.

Like, yes, it's true.

In a normal JSON file, you need to have the closing brackets or it won't parse, right?

But if you're doing live debugging, you can send events as they happen, because each qlog event is self-contained as a JSON array, which you can see here.

So you can just send these arrays one by one, and then you do need a streaming JSON parser.

I agree with that, but those exist, and there's one in qvis as well. So you can do that.
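
As a rough illustration of that "send each event as its own self-contained array" idea, a consumer could parse events incrementally as they arrive. This is a hedged sketch of one possible approach (newline-delimited events), not necessarily how qvis ingests live data:

```typescript
// Sketch of newline-delimited streaming: each qlog event is serialized as one
// self-contained JSON array on its own line, so a consumer can parse the log
// incrementally without waiting for a closing bracket.
function* parseEventStream(chunks: Iterable<string>): Generator<unknown> {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    let newline: number;
    while ((newline = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line.length > 0) yield JSON.parse(line); // one complete event
    }
  }
}

// Two events arriving split across arbitrary chunk boundaries:
const chunks = [
  '["0","transport","packet_s',
  'ent",{}]\n["5","recovery","metrics_updated",{}]\n',
];
for (const event of parseEventStream(chunks)) console.log(event);
```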

And there's been a lot of discussion, is JSON the correct format? Because it's quite verbose, and can't we just move to something more binary, binary encoded, something like protocol buffers?

And if people are interested in that, there are many discussions on the GitHub for the format, and in our papers. The end result is that many people still want to stick with JSON, but we do have options to compress it and to move to CBOR, the Concise Binary Object Representation, which is compatible with JSON, to make it lighter.

And, you know, disclosure, I've implemented qlog in Cloudflare's QUIC implementation, quiche, so I've added the ability, at certain points in the code where things happen, like a congestion control event, to create a qlog event as well and log it like that.

We're already doing our own logging, but then I had to write a serializer, so that it could take that event in object form and write it out as JSON, and that went through seven rounds of iteration.

Initially, I took the simplest option, which is to keep it in memory and buffer everything, because I was only doing short test runs between clients and servers on localhost.

So that kind of worked okay, but as soon as you start to apply this on the Internet, you're going to get a lot more events happening, like packet losses and retransmissions, and you're going to do more requests. So in the community we've had a lot of those discussions, right, and you've heard from lots of different people about the concerns they might have with logging verbosity and stuff. But sometimes it's about picking the right tool for the right job: we don't run PCAP tracing 100% of the time for all traffic.

You just, it's just not scalable.

There's kind of request-level logging that people might put stuff in, like TCP_INFO, if they cared about this, but that's a snapshot at a single point in a connection.

This fits a different kind of need, and so, yeah, it'd be cool to live debug it, but I think a lot of the value, personally, I've had out of the tool and the methodology is, you know, from interop, somebody trying something out and saying, this didn't work, here's my client-side log, and you go look at it, that level of stuff.

I'll stop interrupting you and let you continue.

It's great. It's a great segue into the next step, because qlog allows us to give internal state, right, but what it also allows us to do is to now have two perspectives.

You have the client and the server separately, like you say, and if you have the client and you see something, you can say, here's the client-side trace, and if the server has a server-side trace, then you can get this kind of stuff.

Now I need to select the correct one. There we go, and now suddenly things change, because now I know exactly the correct timestamps. In the previous one, everything was a straight line, because I only knew one perspective.

That's also what you have in the network, but now we have client and server, so, again, client on the left, server on the right, and now suddenly I know the exact RTT.

If the clocks are aligned, obviously, but I know the exact RTT, I can see what happens, but I can also do this here.

These Xs indicate packet loss: I know the server sent this, the client did not receive it, I don't have the opposing packet-received event, so I know it actually got lost, and I can show packet loss here. But I can also do, and I like this one, I can show reordering on the network, right? So there's this very slanted line, the stream zero FIN, that was sent before the crypto one, but they cross in the middle, which means that they were reordered. So it just gives a very visual approach to seeing what these kinds of issues were in these traces, which is very difficult to do from a single network-level trace without manually checking timestamps and that kind of stuff.

Also, a nice thing is that here we have more information about why packets are dropped, why packets are declared lost, and why packets are retransmitted, which is what you can see here: these are loss timers that are being set by the application.

Again, this is all internal state. You don't see that on the wire. Because we have these two ends now, you can really go super deep and find issues, and this is being used by many people in the QUIC community, especially to debug the handshake. Because this trace is not normal behavior: this is heavy packet loss introduced in the simulation to see if it's robust, right, do you eventually continue with the connection or not. And there were so many issues with that, because it's very complex, and just visualizing it like this really helped getting to the bottom of those issues. So that was the bad trace, and I also have an example of a more normal trace.

It's a normal setup with less loss, and what you will see here is, you'll see much, the lines become multiples, because the congestion window is growing, you're sending more and more traffic, and that's when you see the limits of this visualization coming in.

It becomes too slanted, and there's too much. I keep planning to add in a filtering option so that you can say, show me these kinds of events, or condense it so that I can use it.

But also all of this is happening in the web browser, right, so you've got some constraints, but actually it's really cool that, you know, I can, like I do quite often, just load this up on nearly any device I've got, and get like instant gratification from the images without having to load up like some expensive tool, but other people could build tooling around this if they wanted.

It's just that you can build your own ones. That's actually a major point for JSON as well.

I want this to be web-based, and I don't want people to have to install stuff.

I just want, you know, go to this URL, if you have the file, upload it, and you can go.

And so, like you showed, some of the lines interacting, there's a lot of stuff on the sides.

Can you explain any of those, like, quickly?

Yeah, so again, this is a bit more realistic. So this is, again, the whole congestion control stuff, where the congestion window is updating, and the bytes in flight, which we'll see in a moment.

This is the spin bit, which is a very controversial bit in the QUIC protocol, showing if it's toggling or not.

Yeah, you should filter that one.

So, there are many other different, very specific events that you can see here that are going on.

So, I just had a couple of examples of errors that we found using this tool.

So, let me see. Yeah, this one.

So, interestingly, QUIC has zero RTT connection setup, as you know, so you can basically already send a request in a packet in the first flight, and that's what you see here.

So, the first one is opening the connection, and then you already have what they call a zero RTT packet, and what that contains here is a HTTP3 frame, which you can see here with headers.

What am I getting? I'm getting a file of one megabyte, right?

That's what I'm requesting from the server, so that's also something that you can easily see here in qvis.

That's what's being sent, and what you expect the server to do is reply with already some data for that file, right, and that's what we see here.

So, the server replies, and we have about three packets for the file.

Why just three packets? You know, those three packets don't contain everything.

That's because of an amplification attack that you could do.

I don't want to go too much into details, but if the server sends too much in response to zero RTT, it could be used in a distributed denial-of-service attack, right?

So, what the QUIC spec says is, if you get zero RTT, you can only reply with about three times as much data as you got, which is kind of what you're seeing here.

It's not exactly three times, but we have two packets, and then we have about six packets going out, right?

So, about three times amplification factor. In a normal connection, you would continue then with more.

I stopped it right here. So, this is normal.
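
For readers who want the rule spelled out, here is a small, hypothetical sketch of the anti-amplification check being described: before the client's address is validated, a server tracks how much it has received and caps what it sends at roughly three times that. The class and method names are made up for illustration:

```typescript
// Hypothetical sketch of the anti-amplification rule: before address
// validation, send at most ~3x the bytes received from that address.
class AmplificationLimiter {
  private bytesReceived = 0;
  private bytesSent = 0;
  private addressValidated = false;

  onDatagramReceived(size: number) { this.bytesReceived += size; }
  onAddressValidated()             { this.addressValidated = true; }
  onDatagramSent(size: number)     { this.bytesSent += size; }

  // How many more bytes we are currently allowed to send.
  sendAllowance(): number {
    if (this.addressValidated) return Number.MAX_SAFE_INTEGER;
    return Math.max(0, 3 * this.bytesReceived - this.bytesSent);
  }
}

// A sender that skips this check could be tricked into blasting a large
// response toward a spoofed source address, which is the attack the limit
// exists to prevent.
```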

So, let's look at what's not normal. This is exactly the same setup.

Get the request. It sends three packets, four packets, five, six, seven, and it just keeps on going, right?

It just keeps on going, and this is obviously a bug, right, where the server in question here simply did not check this three-time limit for zero RTT requests, and they just sent as much as they could.

They have a very high initial congestion window of 40 kilobytes, and so they just kept sending a lot of packets. So just by knowing what you expect to see, I want to see a very short trace, you just visually see that something is wrong with this kind of tool, which I think is kind of cool. And then the last one is the coolest bug I found with this tool.

In this trace, I'm requesting a thousand small files.

A thousand. This is all H3, yeah. So, here we can see all of these thousand requests are being sent in individual packets.

That's already bad. That's something the client shouldn't be doing.

The client should be bundling them up into one big packet with all these different requests.

It doesn't do that, so that's very clearly visible here because I have arrows going out for each of these.

That's already bad.

What happens then if you scroll down? Because the server doesn't allow you to request a thousand things at one time.

That's called flow control, so the client notices this, and it says, hey, I can't request files 300 to 1,000, and it sends what's called a streams-blocked frame in QUIC, which is a way for the client to say, hey, server, I want to request more, but I can't.

Please give me some more flow control allowance.

Again, it should be sending this like once, maybe twice.

It just sends this for each individual request it can't do. You can see it's bad.

It sends the same limit as well. Yeah, exactly. So, it just keeps on sending.

This is bad. You can see that. What the server does is so much better.

These are all the thousand requests. Let me scroll down because this takes a while.

We're still going.

Then the server starts. This is all being done in one round-trip time, massive back and forth.

Then the server starts replying. This is what you more expect to see here on the left side.

This basically means it was one packet with a lot of different HTTP/3 data in it, right?

We don't see the arrows, but we do see that HTTP3 frames are being parsed, and, if you look, this was, for example, a header frame that was coming in.

So, that's normal, and that goes on for a while, and then the server starts noticing all these streams blocked frames, right?

And it appears to reply to each of those individually is what we're seeing now.

So, the client sends all the ones, and then the server starts replying to each one individually.

So, the client was doing bad things, but the server is just as bad.

This goes on for quite a while, and I have to scroll down to packet number 500, because it just keeps on going.

Let me see 500. Yes, this is the mega packet of death, right?

This one here in the middle. It has 287 frames and 41 stream frames, so this is the flow control updates, and this is the actual data.

If you drill down, you will see that all this update is for the exact same value, right?

It's all saying you can only send 200 streams, which I knew, because, you know, it already said I could only send 200 streams, and it just keeps on sending this like 200 times in the same packet, which was fun.

I liked finding this because this was against a live YouTube server, which is fun.

I don't want to cast shade on Google. They're doing great work, but it's fun for me as a lowly researcher from Belgium to find this kind of weird bug in a top company's implementation and expose it to them and say, you know, maybe you can fix this.

But, you know, I could speculate it's potentially an easy fix, but, you know, the code to do that kind of thing is fairly straightforward.

For anyone not familiar with QUIC, it has constraints on streams, on how many you can open at a time.

H2 had this too. It was like max concurrent streams, but with QUIC, it's a lot more explicit that you have an initial number you can open, and then as you use them up, the other side can give you more.

As Robin said, it's like flow control, and so one strategy is to say, well, as each one completes, as each request completes, I'm going to increase the amount by one, for instance, and so, yeah, you can write some fairly simple code for this, but then you get these weird effects where you can't anticipate weird things the other side could do, and you're both doing completely legal QUIC stuff here.

It's easy to catch the obvious wrong stuff that's protocol breaking, because one side will close the connection, probably. But also these kinds of things where you're sending lots of frames verge on, like, denial of service as well. So although the spec might not say, I don't know in this case, probably not, but the spec might say, oh, you should be mindful of these things. But, yeah, you might find in practice, as we get more deployments, that this kind of thing might elicit, like, an 'enhance your calm' kind of response from the server or the client.

I don't know. Both sides are probably annoyed at each other in this case.

Exactly. That's a very good point you make.

This is valid QUIC, right? They're not breaking any protocol rules or anything.

You can't do this. It's just not efficient, and both sides should be prepared for what the other side does.
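
To make the two behaviors discussed here concrete, this is a hedged sketch (not any particular implementation) of a server that batches MAX_STREAMS credit instead of sending one frame per completed request, and a client that sends STREAMS_BLOCKED at most once per limit it hits. All names and the batch size are illustrative:

```typescript
// Hedged sketch of the two sides discussed here: a server that batches
// MAX_STREAMS credit instead of emitting one frame per completed request, and
// a client that reports STREAMS_BLOCKED at most once per limit it runs into.
class ServerStreamCredit {
  private closedSinceUpdate = 0;
  constructor(private maxStreams = 200, private readonly batchSize = 50) {}

  // Called whenever a request stream finishes.
  onStreamClosed(): { frame: "MAX_STREAMS"; limit: number } | null {
    this.closedSinceUpdate++;
    if (this.closedSinceUpdate < this.batchSize) return null; // don't spam frames
    this.maxStreams += this.closedSinceUpdate;
    this.closedSinceUpdate = 0;
    return { frame: "MAX_STREAMS", limit: this.maxStreams };
  }
}

class ClientStreamOpener {
  private opened = 0;
  private blockedReportedAt = -1;
  constructor(private peerLimit: number) {}

  tryOpenStream(): "opened" | { frame: "STREAMS_BLOCKED"; limit: number } | null {
    if (this.opened < this.peerLimit) { this.opened++; return "opened"; }
    if (this.blockedReportedAt === this.peerLimit) return null; // already told the peer
    this.blockedReportedAt = this.peerLimit;
    return { frame: "STREAMS_BLOCKED", limit: this.peerLimit };
  }

  onMaxStreams(newLimit: number) { this.peerLimit = Math.max(this.peerLimit, newLimit); }
}
```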

All right. So, yeah, that was all I had for the sequence diagram.

Unless we have questions, I would move on to the congestion control.

I've got one question, but I'm going to save it towards the end, I think, once we've seen the tool a bit more. So if you're watching, Piers, we will get to your question before the top of the hour.

Great. Okay. So, the next one is congestion control.

Again, a difficult topic, so I'm not going to explain too much, and if you don't understand everything, it doesn't matter.

All you need to look at is how the colors evolve, and I'll try to explain that.

So, what we want to do is get a measure of how data is being sent over the network.

And we're looking here now at the blue and the green stuff.

So, the blue stuff, just as before, is the actual data going over the wire, and the green stuff, just as before, is the acknowledgements of the data, right?

So, on the left, you see that it's incremental.

The more data you send, the higher up in the y-axis you go, and the x-axis is just time, so progressing over time.

You can see we're sending data, and then it takes, so the delay in time here is the round-trip time, right?

So, the server sends you data, is what we see here, and then it takes a full round-trip for the clients to acknowledge the data that you sent.

That's what you're seeing here, and then the blocks are kind of, they're the same, they correspond to each other, that's what I wanted to say.

So, you can see visually on the y-axis, if they match up, they belong together.

That's basically the idea, and this here is then a long trace.

I think this is a download of, like, a five megabyte file or something like that.

Oh, 10 megabyte file, sorry. You can see it just keeps on going up in, like, a straight line.

That's what you want to do. So, that's basically what you would get if you would load a Wireshark trace into this thing, right?

It's useful, but it's not fantastically useful.

What you really want to see is these other lines.

The purple one is what they call the congestion window, and the yellow one or the brown one beneath is called the bytes in flight.

The congestion window is basically a measure of how much data the network can handle.

If you start sending that data too quickly, you will overload the network.

You will cause packet loss, so you don't want to do that.

So, you want to grow the congestion window slightly.

This is what you see happening here. You grow it piece by piece over time until something happens.

This is the drop-off that you see here. So, I don't think this is actually loss.

This is probably where we got a duplicate ACK or something like that, a different way that QUIC says, okay, something was wrong in the network.

I'm going to have to slow down. That's what you see here. You can very clearly see I need to slow down.

The congestion window drops. Again, this is simple to see, but this is something you don't get on TCP traces, right?

If you've ever done TCP debugging from Wireshark, you don't have this kind of graph, right?

Which we can now have from qlog.

What you expect to see is the bytes in flight. So, that is the brown stuff, the green stuff, the yellow stuff here.

So, the bytes in flight should be as close as possible to the congestion window.

So, yellow should be as close as possible to purple, basically.

That means you're fully utilising what you can use on the network.

You can see here the ACKs come in, which means the previous bytes have been acknowledged.

They can go out of the bytes in flight and you can start sending new stuff.

That's how the congestion control works. You quickly go up to the full limit, right?

So, if you ever see the bytes in flight go over the congestion window without there having just been a loss event, then you already know I have a bug in my congestion control, which is not the case here.

So, if you are a bit of a congestion control aficionado, you will see that this is probably a NewReno congestion controller.

It just grows linearly until there's loss and then it drops the congestion window.
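
For anyone who wants to see that "grow until loss, then drop" behavior in code, here is a very simplified NewReno-flavoured sketch. Real QUIC recovery (RFC 9002) has more machinery, such as recovery periods, persistent congestion and pacing; this only illustrates the shape of the curve in the graph:

```typescript
// Very simplified NewReno-flavoured sketch of the curve in the graph: the
// congestion window grows until a loss signal, then gets cut back.
class TinyNewReno {
  cwnd = 12_000;            // bytes; illustrative initial window
  ssthresh = Infinity;      // slow start threshold
  readonly mss = 1_200;     // assumed max datagram payload size

  onAck(ackedBytes: number) {
    if (this.cwnd < this.ssthresh) {
      this.cwnd += ackedBytes;                          // slow start: exponential growth
    } else {
      this.cwnd += (this.mss * ackedBytes) / this.cwnd; // congestion avoidance: ~linear
    }
  }

  onLossDetected() {
    this.ssthresh = Math.max(this.cwnd / 2, 2 * this.mss);
    this.cwnd = this.ssthresh;                          // the drop-off visible in the graph
  }

  // Bytes in flight should hug, but never exceed, the congestion window.
  canSend(bytesInFlight: number, packetSize: number): boolean {
    return bytesInFlight + packetSize <= this.cwnd;
  }
}
```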

On the bottom here, what we have is the corresponding round-trip time measurements.

Again, if you know a bit of these things, the moment you start sending too much, you will fill up buffers in the network.

Your router buffer is going to get filled and that means that your round-trip time is also going to go higher.

This is exactly what you see here.

The faster you start transmitting data, the faster your round-trip time is going to grow.

This is exactly why we need congestion controllers like BBR, the new BBR algorithm.

It's exactly why we need that kind of stuff to prevent that kind of round-trip time growth.

So, again, with this kind of graph, you can very easily see that happening.
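
As a back-of-the-envelope illustration of that RTT growth: once a sender overshoots, the excess bytes sit queued in a router buffer, and every queued byte adds delay proportional to the link rate. The numbers below are made up purely for illustration:

```typescript
// Illustrative queueing arithmetic: every byte sitting in a router buffer adds
// (1 / link rate) of extra delay on top of the base round-trip time.
const baseRttMs = 50;
const linkRateBytesPerSec = 10e6 / 8;   // a 10 Mbit/s bottleneck link
const queuedBytes = 125_000;            // data the sender has over-committed

const queueingDelayMs = (queuedBytes / linkRateBytesPerSec) * 1000;
console.log(`Observed RTT ≈ ${baseRttMs + queueingDelayMs} ms`); // ≈ 150 ms
```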

So, this is, now we know more or less what a normal trace looks like.

So, let's look at some bad traces. So, here, for example, yeah.

So, this one is weird. I said we want this to be as close to the purple as possible.

It stays under. That's a bit weird. You have these weird spikes here that correspond to spikes in the round-trip time, which is also a bit weird.

This is actually a trace from Google's early BBR experiments way back in the day.

I think these are about a year old. Yeah. You can see the bandwidth probing going on.

The way BBR works, it sends a little bit more bandwidth. It sends a little bit more.

If it sees the round-trip time going up, it's going to slow down again because it knows it's overloading.

There was definitely an issue here, as you can see, with the congestion window updates.

Yeah. So, I like that even for laymen, this is immediately visible.

If you first show them a normal trace, like we just did, and then you show them this, they immediately see this is weird.

This is different. Same, I hope, will be what you see here. Again, this is not, this doesn't look like what we saw before, right?

The congestion window grows, but our bytes in flight is... I'm sorry.

And, yeah, just to pick up on the point you just made, you know, you can explain, say, here's an RFC for a thing or something new we're doing.

We've got some code.

You know, this is how it behaves elsewhere. It should be better than Reno or whatever in this case, but how do I check that?

Like, if it's manually grokking through a Wireshark trace trying to look at the fill level at some point, you can do it, but it's tedious and it's time-consuming.

So tooling like this can help, especially when you can feed this to maybe people who aren't like deep in the code itself, and they can just say, oh, I've been doing experimentation.

This is what I've seen.

It doesn't look right. Can you take it and get back to me? Tell me if I'm wrong or whatever.

Oh, exactly, exactly correct. One of the key things I wanted to do is to democratize QUIC and all this other stuff, make it easier even for students.

Like, I work a lot with bachelor and master students to help them understand this better.

So this trace is very weird.

Why? Because here we're not congestion window limited. We never even reached a congestion window, so something else must be going on.

And if we zoom in, it's going to become clear quite directly.

It seems that we are aligned with this pink line, right?

So every time the pink line goes up, I also can send a bit more data.

What is the pink line? If you look here, it's again the stream flow control.

So this is the client in this case saying, you can't send me too much data at the same time.

You can only send this much, which is the limit that you see here.

And only if I up that limit, you can send me more. So in this case, your network allows you to send a lot more, but the flow control from the client says, no, no, this is the limit, and it's very slow to update.

And so you can see this happening.
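
A quick worked example of why a slow-to-update flow control window caps throughput no matter what the congestion window allows: you can have at most one flow control window of data in flight per round trip. The numbers here are illustrative, not taken from this trace:

```typescript
// Rough throughput ceiling imposed by flow control: at most one advertised
// window of data can be in flight per round trip, no matter how large the
// congestion window is.
const flowControlWindowBytes = 64 * 1024; // what the receiver has advertised
const rttSeconds = 0.1;                   // 100 ms round trip

const ceilingBitsPerSec = (flowControlWindowBytes * 8) / rttSeconds;
console.log(`Ceiling: ${(ceilingBitsPerSec / 1e6).toFixed(1)} Mbit/s`); // ~5.2 Mbit/s

// Even on a much faster network, the transfer crawls at that rate until the
// receiver raises its MAX_DATA / MAX_STREAM_DATA limits.
```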

And the way I found this, and this is what most researchers do, they run high-level tests.

They just run, okay, I'm going to download a 10 megabyte file and see how fast it was, right?

I get like five megabits per second download. And here I had a lot of different downloads, and this one was like way too slow.

It was like half as fast as all the rest.

And so that looks weird. And then I could just take the qlog, load it up here, and see why it is being that much slower.

Well, most of the time, most other researchers, TCP researchers, can't do that, because they don't have this kind of in-depth insight.

And they have to speculate. They have to guess.

It's probably flow control related, but we're not sure. With this kind of thing, you immediately have proof and you can really tweak things.

And you know, we're talking about QUIC now, but if you think back to HTTP2, it's got two levels of flow control, both TCP connection flow control and H2 level stream and connection flow control.

And you end up with some weird interactions there.

Like you say, the manifestation is, oh, that wasn't as fast as it could have been.

But it's like, well, we're talking theoreticals now. How fast do you think it should be?

And it's very easy to end up in a tete-a-tete with like, no, it's working fine.

So yeah. I mean, having this kind of tool is cool. To prove something is going on.

The last one is very similar. Let me see here. This was a fun one because it's very clear.

Here at the start, you can see that too. It's like, we send a lot, congestion window is growing, and then suddenly we do a full stop.

We can't send anything until about, let's say, 800 milliseconds here.

And this turned out to be that the client opened the file descriptor to write out the file it was downloading.

And for some reason, the file descriptor opening was being really, really slow.

It was blocking the client process for like 500 milliseconds, which delayed the flow control update.

And so you see, everything is okay. The server was fine, but it was a client.

So the nice thing is this kind of bug is immediately visually obvious to anyone.

Even to me, I'm not a congestion control expert, right? I know some of these concepts.

But even to me, I can do a lot with this kind of tool. And like you just said, it's not a bug in the flow controller.

It's a combination of effects within the implementations, which is probably based on the system that's running on at that time and transient issues even.

It was a missing sleep somewhere from the debugging session that we forgot.

So we don't have that much time.

So I'm going to continue to the next one, which is your favorite, which is the multiplexing diagram.

And I don't have too many traces for this because you already touched on this.

So the basic idea is that, just like with HTTP/2, with HTTP/3 you can download multiple files at the same time.

So each color here and each bar here in the waterfall is a different file.

It's basically what you need to know.

Each color is a file. And then on the x-axis, you see how these files are being sent.

So how to interpret this, we have the first is the big pink block, which means that the pink file is being sent.

A big portion of that is being sent first, right?

That's the way you need to interpret that. You can see that here it's using what is called a sequential scheduler.

So it tries to send each file in its entirety before going to the next file.

The weird thing, though, is that here, so the files are requested top to bottom.

So yellow is first and then purple and then orange.

And the server is actually sending them back in the opposite order, right?

So the pink one was the last file requested and it sends it first.

So that's last in, first out, which is what you really don't want for web performance.

And again, this visualization makes that obvious immediately.

Another thing this visualization does is show you where retransmits happen.

And for that, you need to look at the black bars on the bottom here.

So if you have black on the bottom, everything that's above there is a retransmitted packet.

It's very easy to see here. So we first sent a lot of pink. A lot of that was lost probably.

Started sending green. Then it noticed, oh, my God, I lost a lot of pink.

So I need to interrupt the green and start sending the pink stuff instead because that was apparently more important.

So that's what you see here.

And then if you click, you can click on these things and you can see that happening here as well.

I hoped to have a bit more time for this one because this one is showing head of line blocking.

No, no, no. It's fine. I'm going to try. So here, here you should see top to bottom is, again, the Y axis is the amount of bytes that you've received for this file.

So the green file here is the one megabyte file, one million bytes, right?

Top to bottom. And the X axis is, again, a bit the timeline.

When is that section of the file being downloaded? So you're at the start.

This is where green starts. So you get like the first few bytes from green here.

And it goes on and it's fine. Here it's interrupted for the pink stuff. This is what you see here.

But what we see here is that apparently a bit of the green stuff was lost as well.

That's what you see here. Green is being retransmitted as part of the gray.

Which parts of the green were retransmitted? Well, apparently these very fine lines.

I'm not even sure if they show up well on the stream. But there are very, very fine tiny lines here, which means just one or two packets were lost for this particular resource.

And that's a problem. Because these are very early packets in the stream.

All the rest beneath that was received, as you can see here. This was all received properly at the correct time, all of this as well.

But because we had this few packets that were lost early on, and it took a long while to retransmit, you can't really do anything with the rest of the data.

So all the bottom data is you could use it if only you had this one little packet that takes a long time to get.

And that's kind of what the black bar indicates. The black bar indicates when is this part of the resource actually usable by, let's say, a web browser.

And because you had a very early packet loss that takes a long time, all the rest of the resources actually blocks.

It's received, it's sitting in a buffer somewhere, but you can't use it because it's just one packet.

In this case, at around 600 milliseconds, you might have had that data.

In total, you received 99% of the thing in one and a half seconds.

But there was an extra few hundred milliseconds just waiting for the last remaining bits.

And that's head-of-line blocking. But in this case, it's just that one green stream; you could presume you've made progress on the others.

But if they all exhibited this weird thing where maybe they all lost one packet, and I've seen this in some of the traces I've been looking at, then they all kind of get a bit blocked.

And by interpreting both the top and the bottom views, I've been able to figure out problems, not necessarily fixes.

Sometimes things just go away. I'd encourage people to test multiple times. It's always good.
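
The "usable by the browser" idea behind the black bar can be sketched as computing the contiguous prefix of a stream that has no gaps; one early lost packet then holds back everything received after it. A small illustrative sketch, with made-up byte ranges:

```typescript
// Data on a stream is only deliverable up to the first gap, so one early lost
// packet holds back everything received after it.
type ByteRange = { start: number; end: number }; // [start, end) offsets received

function usablePrefix(received: ByteRange[]): number {
  const sorted = [...received].sort((a, b) => a.start - b.start);
  let deliverable = 0;
  for (const range of sorted) {
    if (range.start > deliverable) break;        // gap: head-of-line blocked here
    deliverable = Math.max(deliverable, range.end);
  }
  return deliverable;
}

// A ~1 MB stream where one early packet (bytes 1200..2400) is still missing:
console.log(usablePrefix([{ start: 0, end: 1200 }, { start: 2400, end: 1_000_000 }]));
// -> 1200: almost all the data has arrived, but only the first 1200 bytes are usable.
```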

Yeah. Oh, that's great. So again, this is one of the things that they say, QUIC is going to be so much faster.

HTTP/3 is going to be faster than HTTP/2 because it solves that head-of-line blocking problem.

In theory, yes. And in many cases, yes.

But as you can see here, there are still plenty of edge cases where that doesn't hold.

And in this trace, you've got, what, 10 requests for a million bytes each.

You're fitting a thousand bytes per packet, maybe. So you've just got a flood of data packets and, sorry, retransmissions, and ACKs coming back in the other direction.

To try and just look at that in logs is really hard. And some of these problems only manifest when you're doing large concurrencies, in my experience.

Yeah. Doing this in a Wireshark kind of visualization is hard because these are thousands upon thousands of packets, like you said.

I'm not going to go too deep into this one just to make sure that you can all see this.

This is one that does round robin scheduling.

So if we zoom in, it's going to send. So each of these bars is one packet.

And you can see it switches between the different streams.

It's a different color every time. Which is what you want or expect. Except for here, at the start, you suddenly have this big yellow thing.

And then when there are retransmissions, suddenly it says, you know, I'm not going to do the round robin anymore.

I'm going to send them all in full as well. And so, again, that may or may not be what you want.

But the visualization immediately shows you that, you know, these are probably normal areas.

And I can very clearly see strange areas that I might have to look at.
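
As a toy illustration of the two multiplexing strategies visible in these diagrams, here is a small scheduler sketch: "sequential" drains one stream fully before moving on, while "round robin" alternates one packet's worth of data per stream. This is not tied to any particular implementation:

```typescript
// Toy scheduler: "sequential" drains one stream fully before the next,
// "round-robin" alternates one packet's worth of data per stream.
type Stream = { id: number; remaining: number };

function* schedule(streams: Stream[], mode: "sequential" | "round-robin",
                   packetSize = 1200): Generator<number /* stream id */> {
  const active = streams.filter(s => s.remaining > 0);
  while (active.length > 0) {
    const s = active[0];
    yield s.id;                                   // emit one packet for this stream
    s.remaining -= Math.min(packetSize, s.remaining);
    if (s.remaining === 0) active.shift();        // stream finished
    else if (mode === "round-robin") active.push(active.shift()!); // rotate
  }
}

const files = [{ id: 0, remaining: 3000 }, { id: 4, remaining: 3000 }];
console.log([...schedule(files.map(f => ({ ...f })), "sequential")]);  // [0, 0, 0, 4, 4, 4]
console.log([...schedule(files.map(f => ({ ...f })), "round-robin")]); // [0, 4, 0, 4, 0, 4]
```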

And the cool thing with this visualization is that the qlog data you need to log isn't that invasive.

If you can log stream data frames, which are the frames that carry application data, you can see what's happening without having to assign any, like, semantic meaning to them.

That can be useful. But if you're doing, like, some other development of a protocol on top of QUIC, this tool will work equally well, which is, like, to me, a really cool thing.

But that leads me on to my question from Piers O'Hanlon, who says, I like the look of qlog.

Any plans to expand or provide similar visualizations for other protocols than QUIC?

Oh, you guys are fantastic to keep the thing flowing.

Because the last thing I wanted to show you is the packetization diagram.

So, I'm first going to, the thing we want to look at is here at the bottom, right?

This gray section here.

So, the idea is if you, everything is sent in packets and in frames, right?

And if you have a packet, you also have a packet header or a frame header, which is kind of metadata, like, how large is this packet and what is the packet number of this packet and that kind of stuff.

So, you have some overhead, right?

If you send data from your file, you're going to have a little bit of overhead added at each step of the protocol layers.

So, that's visualized here.

The top one is the payload. That's the good data, the one you want. And then if you have these lower things, that means you will have overhead.

Just to understand that, we can now zoom in.

So, what this does, as you can see here on the left, you have the bottom row is TCP.

These are the TCP packets. I'm just going to zoom in so that it's very, very clear.

So, one of these is a TCP packet. This is one TCP packet.

This is gray stuff. And then the TCP header is, like, the small stuff from before.

That's why I said, that's why it extends here with a white area, because that's a header.

It's not carrying data. On top of that, you have TLS records. So, this is, just to be sure, this is TCP, TLS, and HTTP2.

To answer your question, yes, we can do this for other protocols as well.

Yes, we've done that. And it doesn't work for everything just yet.

But it does work for the packetization diagram. It's a proof of concept.

So, the next layer, the red stuff, is the TLS layer. And then the blue stuff is HTTP2.

And then the thing above there, stream IDs, that's basically the same as the previous visualization.

That's, like, the multiplexing that's going on.

So, you can see that this server is also sending stuff in a sequential order, right?

Every file is downloaded in full. So, here, by doing this, you can zoom in and find problems in how things are packaged.

This is a trace we made from the Wikipedia website.

Okay. And they had a very weird tendency where they started a new TLS record,

so a new red layer thing, every time there was a new HTTP/2 level header being written.

So, if you have a frame header on the HTTP/2 layer, they suddenly cut off the TLS record and start a new TLS record immediately.

Which is not what you expect. You'd expect the whole frame to be in a single TLS record, and then even to pack as much as you can into a single TLS record, which they didn't do.

Which, again, is something that is relatively obvious here.

Because you start seeing these small, very narrow shapes in the packetization diagram.

If everything else is quite big and you see these narrow gaps, that means you can see there's something weird going on there.

And I have a very nice example of that in this one. So, this seems normal.

You're just downloading a very big file. This is a five megabyte file.

When I start zooming in. Sorry, go on. This is the last thing I want to explain.

When you start zooming in, you start seeing these weird, very small packets, right?

I don't know if this is very clear on the stream. You start seeing anomalies in how the thing is rendered.

And that's not just because the browser is doing some weird rendering thing.

That's because there's something weird in the trace.

Because if we zoom in, what we can see here as it starts showing up is that this QUIC implementation actually sends these tiny, tiny packets containing just a single byte of HTTP/3 data.

Right? This also wasn't exactly a bug in this case, but a very weird behavior: you shouldn't pack a single HTTP/3 byte into a full QUIC packet.

That's not very efficient.
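
Some rough arithmetic shows why one byte of HTTP/3 data per QUIC packet is so wasteful. The header and tag sizes below are ballpark assumptions, not measurements from this specific trace:

```typescript
// Ballpark per-packet overhead when carrying a single byte of HTTP/3 data.
const payloadBytes = 1;
const quicOverhead = 1 + 1 + 8 + 16; // header flags + packet number + connection ID + AEAD tag (approx.)
const udpIpOverhead = 8 + 20;        // UDP + IPv4 headers

const efficiency = payloadBytes / (payloadBytes + quicOverhead + udpIpOverhead);
console.log(`${(efficiency * 100).toFixed(1)}% of each datagram is useful data`); // ~1.8%

// Compare with a full ~1200-byte packet carrying ~1150 bytes of payload,
// where roughly 95% of the bytes on the wire are useful data.
```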

And again, by zooming in here, you can figure that out. I guess that's all the time we have.

Yeah. I just got to thank you so much for coming on the call and taking the burden off me for a week.

To me, this kind of thing is really interesting.

I've been watching for a while, and some of the background, if I understand correctly, is that some of the initial motivation for this kind of visualization was to understand effective prioritization on web servers and what happens.

Because in dev tools and stuff like web page test, you can get a feeling for the data that's coming back from the server and whether all the urgent stuff is sent before the less urgent things.

And it gets really tricky. So, I've been following for a while, and it was kind of a goal on my part to be able to implement qlog and then have a tool for me to use.

And Robin's been very kind in explaining multiple times what the different visualizations mean.

So, yeah, I'd just like to thank him again for coming on the show and explaining it in person.

There's more work to do, I guess.

But I probably want to clarify, if I'm correct, that a lot of the issues that you find, you've actually written up in papers and you've reported to implementers throughout the whole standardization process, which is all about running code and finding these things.

And I think for myself, but for others, I can say that that's been really valuable because finding the issues is generally the hard part.

Fixing and validating them can be fairly quick, if not a bit painful.

Thanks, everyone, for watching.
