Cloudflare TV

Leveling up Web Performance with HTTP/3

Presented by: Lucas Pardue
Originally aired on September 11, 2020 @ 3:00 AM - 4:00 AM EDT

This talk is about the new protocols QUIC and HTTP/3. It is aimed at web developers with basic familiarity with HTTP and its role in performance. It steps through HTTP evolution using a computer game theme for novelty and visualizations. Once some fundamentals are established, it looks at some tools/services for trying it out yourself.

Episode 11

English
Protocols
Performance

Transcript (Beta)

The web, a digital frontier. I tried to picture bundles of HTTP requests as they flow through the Internet.

What do they look like? Pencils? Kettles? Are there things in my shed?

I'm running out of ideas, people. Yeah, welcome to another episode of Leveling up Web Performance with HTTP/3.

As you can tell, it's the summer.

It's still warm. Not too bad. I have a different background, which I'll describe in a second.

But before that, let me introduce myself in case this is the first time you're watching.

And if so, you've missed out on weeks and weeks of wonderful stuff.

I'm Lucas Pardue. I'm an engineer at Cloudflare. I work on protocols stuff like HTTP/2, HTTP/3, TLS, QUIC, and so on.

This show is called Leveling up Web Performance with HTTP/3.

The focus, if you believe the title, is about web performance.

But actually, we've spent a lot of the last weeks talking about QUIC and understanding the networking angles of how loading web pages, say from the Cloudflare edge, might work, how you can enable that in experimental or beta browsers like Chrome and Firefox, and how you might be able to debug these things with different tools.

So we've covered a lot of ground over the last few weeks.

Some of the regulars might recognize, sorry, might be missing my animated background.

What I have here is one I changed. It's the background I had on Friday, and I'll explain what the event was.

But if I just duck out of the way, what this is is a mocked up T-shirt that makes fun of the fact that, once upon a time, people used to refer to QUIC as TCP 2.

And there was a joke going around, which referred to a popular movie that I won't name.

I'll let people guess. And it would say, call it TCP 2 one more time.

But we're kind of beyond that point in the community. And now we're getting to this whole confusion, say, between what was Google's early QUIC work, and what is the IETF QUIC work now.

A lot of times you'll see QUIC introduced with "QUIC means Quick UDP Internet Connections."

And we're all beyond that point now. So if you see that, just imagine that it makes us angry.

QUIC is not an acronym. It's just a series of capital letters.

And some people might shout that. Some people might pronounce that Q-U-I-C.

But the rest of us just say QUIC, and we get on with the more fun and interesting business.

So today's show, I'm going to carry on compiling h2load with HTTP/3 support.

I'm going to save that for the second part of the show, because that might give an opportunity for people to leave if they find it boring.

But before that, before I bore you, I'm just going to talk about what I did on Friday, which was attend something called the EPIQ workshop, which was hosted alongside ACM SIGCOMM 2020.

So let me pull up some slides for you. The wonderfulness of Zoom. Ooh.

Let's hide the side panels from you, so you don't get distracted. So yes, this is the show.

If you have questions, remember, this is a live show, so things can go wrong.

It's just me, so I hope I don't swear. But if I do, I apologize ahead of time.

You can email us at livestudio@cloudflare.tv and ask a question or make a comment.

I will respond if you send something, I promise. And it's safe for television.

Or you can just @ me on Twitter, and I probably won't respond right now, because that requires multitasking, and I'm not very good at that.

But I will respond afterwards. So that's the formalities and the administration out the way.

As all good meetings should start. And let's talk about EPIQ.

So this is the Evolution, Performance, and Interoperability of QUIC, a workshop that was held on Friday, August 14th, to kind of run alongside SIGCOMM, which, if people don't know it, I don't know much about it either.

It's my first SIGCOMM event.

But imagine a special interest group, that's what SIG means, of people focused on comm, which can mean whatever you want.

But really, it's about networking and aspects of Internet work stuff.

So it covers things like transport protocols, just general measurability and observation of networking on the Internet, and applications and how they do stuff too.

So this was running all week long.

And there's lots of areas there I didn't know much about.

So it's cool to look into things like programming network equipment using P4, effectively programming your operating system with BPF, running virtualized network functions or software defined networking, lots of different stuff.

And then say up to things we're more familiar with and have touched on this show, congestion control, etc.

But the real purpose for me of attending SIGCOMM was for the EPIQ workshop.

So yeah, the intent was to kind of bring people from industry and academia together.

And I attended the last one, which was just shy of two years ago.

And it was very good to see different people, say presenting on their experiences with real deployments, versus say the research community who's looking at things more from a clean room environment, or maybe even long term research goals.

And having that mix of people is really good. We've talked about the IETF group, which isn't exclusive, anyone can join and participate, but the kinds of people that do tend to be people with an implementation stake.

So the large deployments like Cloudflare, or browsers, or other people who are maybe interested in more of the protocol level, but still have an implementation or are actively working on one.

That said, it's really useful to have other people in the community look at the protocol itself and implementations and do work here, because we don't have all the answers.

We're just trying to develop things and ship what works and make it work well.

So the workshop itself was kind of a whole day, but a short day, with a break.

And this was great, because compared to some of the other days, we had ample time per talk to discuss the topic itself, ask some questions, get some kind of interactive Q&A, and actually then break and go for lunch, which was good.

Sometimes with these things, by the second session, you are pretty tired, and it's hard to stay focused.

A chance to recharge. Yes.

So I'm not going to go into this whole thing explicitly, but I just want to canter through kind of my personal view of it, what I remember, and give you some links, and maybe say to people: if you visit the SIGCOMM EPIQ workshop website here, there's links to the videos, the PDFs of the slides and PDFs of the papers themselves from lots of different people.

And maybe one of those would interest you. Maybe they give you some ideas.

Maybe you're one of the authors on the paper and kind of want to correct me on something I get wrong now.

But anyway, we started off the day with a keynote speech from Ian Swett at Google, who talked a bit about the history of QUIC, but through the lens of CPU usage.

So we got to see maybe the evolution of QUIC deployment at Google and understand, say, kind of some of the motivation for them to develop QUIC in the first place, and talk about some of the performance measurements that they did of Google QUIC, but also of IETF QUIC as it's coming in now.

And to say that, like, there's been historic reports about how much CPU usage QUIC has compared to, say, TCP and TLS.

But things are moving on, and we need to understand not just a headline figure like this, but to dig in and apportion out CPU usage.

And that's a really important thing here when we're talking, you know, a protocol that can be implemented in user space, whereas TCP is in kernel space.

A lot of people focus on that issue and assume that, oh, it's in the kernel, it must be faster.

But that's not necessarily true. Any piece of software has different architectures, and it might initially be developed following one design.

And then from there you iterate and you identify opportunities for improvement.

So, yes, QUIC is using UDP syscalls, and the Linux stack maybe didn't have as much UDP tuning compared to TCP a few years ago, but there's opportunities here to improve on things.

So, yeah, this was just a nice keynote to kind of talk through some of the historic performance bottlenecks that were found and the different steps that have been taken already or that can be taken in the future.

Not just for QUIC, but we're talking things like generic segmentation offload, known as GSO, or on the receive side, GRO.

There's tools and techniques that can potentially help improve the performance of things.
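As an aside, if you want to see which of these offloads your own Linux box currently has switched on, ethtool can tell you. A quick sketch; the interface name eth0 is just an example:

    # List offload settings and pick out GSO and GRO (interface name will vary)
    ethtool -k eth0 | grep -E 'generic-segmentation-offload|generic-receive-offload'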

We also mentioned some of the crypto aspects: QUIC packets are encrypted, and they require certain construction and then dispatching to an encryption routine.

Are there ways to kind of batch that? So, some of the work that other people in the community have done, not just Google.

So, taking a look at a few different factors and, you know, the takeaway here is that I think there's still ripe room for optimizations and so that's kind of why I like this talk, because it set the scene and didn't just say we're done, the job's done.

Here's some angles that we might want to look at as a community or bring your ideas and we'll see in some of the other papers and maybe touch on that later.

Then we went on to testing actual implementations, not for performance, but for let's say correctness.

So, this is a paper by Vidhi Goel, Rui Paulo and Christoph Paasch, who basically took a tool called packetdrill, which is an existing thing that's used for TCP testing.

You can define test scripts with strict timing requirements, and they wanted to see if they could effectively adapt that to test QUIC, and the answer was they could.

So, they took a range of different implementations and were basically able to affect timings, say, or insert things.

So, imagine a QUIC packet that has some types of frames in it: say a client and server would exchange a handshake, and then maybe do a request-response exchange, and normally everything would work fine.

With a tool like Packet Drill, you can screw that up basically and try to see if the endpoints cope.

So, you can measure if they're doing what they should do on a golden path, say, but you can also measure what happens when things go wrong, like they do on the Internet.

So, you could drop packets, say, and see if recovery works okay or insert, in this case, frames into packets with values that either endpoint wouldn't produce.

So, from that angle of the work, we had a report from Vidhi on the quiche implementation, which they'd done this for. I can't remember what kind of frame it was now, but they found a bug in quiche and, you know, we investigated and could confirm it.

We wrote a unit test for that. You know, it wasn't a catastrophic error, but it was better to catch it now, and what was great is that it was under controlled conditions. So having a test script that described this thing allowed us to reproduce it in our environment, fix it, and prove that it was fixed with a unit test.

It's really interesting work. It's not, at the moment, open source, so reproducing it was slightly harder than just downloading the modified packetdrill and running the test scripts that had been developed as part of this paper. But I understand that's something the team would like; they just need to get the permission.

So, we all understand this, but I think a lot of us in the community would be really interested in taking this and maybe developing some more test scripts or whatever, and until it's open sourced and made available to us, we kind of can't do much.

So, the pressure's on Vidhi to get that approval, and she did say shortly after the presentation, I think maybe later on in the day, that she'd asked the people, kind of giving them another poke.

So, thanks for that, Vidhi. We look forward to seeing what you come up with.

What's the next one? Automating QUIC interoperability testing.

So, this is one I'm going to focus on some more after we go through some of the other slides and stuff, but this is a paper and presentation by Marten Seemann and Jana Iyengar about a tool called the QUIC Interop Runner, also known as the QUIC Network Simulator.

It's kind of two related projects, and it speaks to one of the major things in the IETF: getting running code, rough consensus and running code.

So, there's no point just running code on your own. These are networking protocols.

They need to speak to things and so you want to interoperate with other people and the more people, the better.

The more diversity you get in implementation types, the more different kinds of analysis you get; reinterpretation of what the specification says by different people with different mindsets can help reveal that the language was ambiguous or maybe needs correcting, or just lets you bounce ideas off each other.

I think that's the most fun bit. You can implement something in multiple ways and there's trade-offs or whatever.

So, you come up with a whole matrix of implementations and that's kind of where we'll come back to in a short while.

We had this paper called Same Standards, Different Decisions: A Case Study of QUIC and HTTP/3 Implementation Diversity.

It kind of riffs on the previous point I was just making in that by the letter of the law, a lot of the implementations do what they have to, the normative requirements.

You must do this in the face of seeing an incorrect value or you should send these things out at some frequency.

But apart from that, there's actually a lot of room for novel behaviours, and so there's no hard and fast rules on what makes a good implementation of QUIC.

What's good in terms of one measurement, say how quickly you can download a piece of information, or how you recover from loss, might, on the counterpoint, mean increased CPU usage or leave a server more exposed to a DoS vector, for instance.

So, implementers are always making these trade-offs for what's good for the protocol and its behaviour versus what's safe to deploy on the Internet or other things that they might want to do.

What's their operating system, ideal target environment, is this a learning implementation, say, that's just focused on making things accessible and supporting as many features as possible versus something that's maybe a bit smaller, a bit hardened, aiming at an embedded operating system and therefore needs to make a compromise on code size versus speed.

So, this paper took a really good overview and characterisation of many of the public endpoints or open source implementations out there.

Just to draw it up and compare them and to say, you know, if you're doing a study comparing QUIC to TCP performance, you can't just look at one implementation.

What you're doing is comparing, say, one TCP implementation to one QUIC implementation, and the takeaways from that are limited and scoped, and it's hard for people maybe beyond a small community of QUIC experts to really appreciate that.

So, I think this paper did a good job of putting that all out on paper.

If you're not into reading papers, the video presentation of this is very good and interesting and makes the whole kind of stuff way more accessible with, you know, fun and good examples of different stuff.

So, yeah, good work, folks.

And what it highlighted is the effort that has to go into this kind of thing.

If you're trying to test different implementations, we focus on web browsers here, right?

So, each of the web browsers has different magic incantations to get it to run with HTTP/3, and sometimes it doesn't work due to various issues like version mismatch, or the Internet's broken that day.
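For flavor, and as covered in earlier episodes, the incantations at the time looked roughly like this. Flag names and draft versions shifted between releases, so treat this as a sketch rather than gospel:

    # Chrome/Chromium: force QUIC on and pin the HTTP/3 draft version (circa mid-2020)
    chrome --enable-quic --quic-version=h3-29 https://cloudflare-quic.com/

    # Firefox: no flag, flip the pref in about:config instead:
    #   network.http.http3.enabled = true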

So, maybe you want to run stuff locally in a controlled environment.

Well, guess what? You need to then compile things from source code, like I've been trying to do with h2load for the last few weeks.

That's tricky, especially if your focus is, say, as a researcher just trying to look at congestion control.

You need to become au fait with multiple implementations all implemented in different languages with different build tools and environments.

It's quite tricky. There's been some discussion on the mailing list about APIs, which we talked about in one of the previous weeks.

You know, having a common API could help that, but the practicalities of really getting there at this point in time, there's a lot of barriers to doing that.

One, because we don't know what the right answer is yet.

So, the challenge to people doing performance measurements or researching them is to really understand what you're looking at and to log that.

You know, even an implementation will have many different configuration parameters.

If you're going with the defaults, chances are they're not really going to give you what you need for your test, and you need to report that. You know, just think about reproducibility.

Who else could take what you're saying and run it and come up with the same answer or find that maybe if you tweaked a parameter, you'd get something better.

So, important things to consider, in my opinion. Other people might not care, and that's fine.

That's up to them. What do we have? Okay. So, those first three papers, let's go back to the agenda quickly.

We're in the morning session, and then we had a nice lunch.

I had steak and chips, if you care, and then we got into like a second session, which was a bit more mixed, a different variety of papers.

So, we had Making QUIC Quicker with NIC Offload. So, this kind of resonated with Ian's keynote speech about CPU usage.

You know, the more CPU cycles things take, the worse the performance, in a very broad stroke there.

So, one way of speeding things up is to not consume, you know, general purpose CPU, but to offload them to specialist bits of equipment, hardware, or instructions.

So, we can do crypto offload with things like AES-NI instructions on an Intel CPU.

But, you know, the interesting thing with this paper is, you know, they said clearly in the abstract, this isn't the first time that people have said, well, we can do crypto offload maybe in these encrypted transport protocols like TLS or QUIC.

You know, there's other areas that are consuming time.

So, let's take another look at those and try and quantify them and see what the options might be.

If we're looking at other forms of offload, like packetization: if you have a stream of data being delivered at a very fast rate, say gigabits per second, something somewhere has to take that continuous stream and chop it up into packets that are correctly sized for the path, the maximum transmission unit, but also handle all the QUIC framing overheads and all of that stuff.

It's not necessarily hard, but it is time consuming.

So, is there a way to, say, pass that stream through into something that's better optimized to do that, or to parallelize the packetization? Those kinds of angles.

So, this was cool work.

You know, the concept of doing this isn't an impossibility. We've seen it before with generic segmentation offload, which we talked about earlier, and that's a socket interface that you can use to get a performance improvement.

Maybe you need to measure it. But what that does is add complexity to the code, because that might be platform specific or hardware specific, say, in the case of some of the optimizations here.

The discussion then went on to maybe what do APIs look like?

Are people going to be willing to, say, customize to an API?

Is there enough value in that? What does it look like? I think this paper did a good job of setting out its stall on the problem and then raising questions about how we might solve this in the future.

I'm probably missing a whole load of stuff.

Good, interesting context with this. I'm not a hardware or kernel API expert.

So, I'm going to be going back to this one and taking another look at it, because I think long term, this is an area that we'll see more work, even if it's not completely standards related.

We had a focus on video. So, think about QUIC; we talk about web performance here, mainly focused on web page loading.

If you recall, in a previous episode I said we've now enabled HTTP/3 for this live stream itself.

So, people can now stream video using an HTTP-based adaptive video streaming protocol, like DASH or HLS or something, and run it over QUIC.

So, does a different transport protocol provide advantages over others?

Some of my past work has looked at, say, was there any advantage to using HTTP/2 over HTTP/1.1?

Could multiplexing help?

Could server push help? Could these different features of the HTTP version give you any major advantages?

Can you do something brand new?

Or can it just help in the basic case where you're trying to load video? The typical thing is it's difficult to measure the quality of experience of a user.

With web page loading, the faster the better, generally.

Or you want to minimize content layout shifts and stuff like that.

We talked previously about Web Vitals or web metrics like this, where you can kind of condense things down quite quickly.

And there's no one metric that's the answer, but over a series of metrics, you could say this is good or bad.

With video, it's different. It's not impossible, but there's different ways of assessing visual quality, maybe focused on encoding a video.

So, a lot of time is spent looking at codecs. That's not this. This is more about the client running various algorithms that can adapt to the local network conditions, in order to try and achieve the best video quality given those conditions.

So, say you automatically start from a low quality and you're able to step up to higher ones. But if you step immediately up to the best quality and you're on a network where, say, it's 6 p.m. and everyone's watching this TV segment, why wouldn't they?

There's going to be congestion, maybe, in your local access network, affecting your download speeds.

So, if you try to download at a rate higher than you can achieve, then you're going to start buffering and interrupting the video you're watching.

The whole purpose of adaptive video streaming is to allow the client to dial down the quality so that the bandwidth usage falls within their upper limit.

It's way more complicated than that, but in a nutshell, those kinds of things are true.

So, stepping up and down in qualities to adjust, well, if you do that too much, it's kind of annoying.

You get this flickering kind of effect every few seconds.

It looks good, it looks bad. It can be distracting from the content itself.

So, given all of that, this paper was focused on whether you could utilize HTTP/3's mitigation of head-of-line blocking to use a different kind of algorithm, use head-of-line blocking avoidance to achieve better quality, and do some measurements on that.

So, to me, this was super interesting.

I think we'll see some more work in this space. I don't know if we have the answers yet.

There's a lot of agility in adaptive video streaming algorithms, and the test conditions can be quite hard; reproducible testing especially, in my experience, has been tricky in the past.

The state of the art has probably progressed a lot in the four years since I was last looking at this in depth, but yeah, I look forward to seeing some more.

Then the final paper was Analyzing the Adoption of QUIC from a Mobile Development Perspective.

This is a cool kind of work by some people out in Chile, I believe.

They were able to basically install some VPN software on some volunteers' handsets and inspect the traffic to distinguish whether it was TCP, TCP and TLS, or QUIC, what kind of QUIC versions they were, and then associate that with the different kinds of applications that were running on the handset.

So, you know, were they apps? Were they browsers?

Where were they going? And to look at the different kind of patterns.

And so, we had work from Facebook, which just reminded me, I forgot about Facebook, who I think presented on the Thursday.

Yes. Sorry, Facebook pals, who were talking about something different, but they've been using their mvfst implementation and running a large scale deployment of QUIC for a while, and they've got a lot of good, interesting data there.

And so, this was kind of a different perspective: beyond what Google and Facebook could do with their own measurements, is it possible for others to measure this stuff?

I think in the future, as these libraries stabilize, we might see more apps kind of bundling in their own QUIC libraries, or using platform features that are available to talk QUIC to their own servers and hosts.

So, there's a cool chart in the slides. I didn't manage to see if it was in the paper or not, but you can attribute all of the connections by protocol type and the client and kind of associate them to a server on the right-hand side and see where they will overlap and cross and see the breakdown of this stuff.

And that was very interesting. I'd be quite keen to see what that looks like in maybe a year or two's time.

But anyway, back to Marten Seemann and Jana Iyengar's interoperability testing.

We're bang on 30 minutes.

Let's take a few to talk about this. So, there's a website. It's this. It's the QUIC Interop Runner, which is at interop.seemann.io.

And what this does is just show some green and black and red stuff, which looks great, right?

There's a lot of information here.

If you've seen previous presentations by me or others, we've talked about QUIC interoperability matrices.

The one that you might be familiar with is this Google Sheet that we use a lot of the time during implementation, say, hackathons or whatever at the IETF.

What we have in rows are clients and columns are servers.

And so, you're testing a client talking to a specific server.

In this case, the first example, picoquic by Christian talking to quant, which is by Lars.

Yes, I'm on first name terms with all these people. Not really.

And there's a bunch of letters in there which relate to different kinds of tests. And so, maybe, you know, you're going to sit down virtually with somebody, or just point at a web server that's running on the Internet when you've just woken up and they're asleep.

And then try and do something. Talk to that client. Do a handshake.

Can you send some data? Can you do this thing? Yes. Okay. Check box. I'll write down that letter in this box now.

It's a very manual process. Some people have scripted this up.

So, they're able to kind of just run the test and then grab the logs that come out at the end and say, yes, I succeeded.

I'll generate all of these letters and then I can manually paste them into the spreadsheet once I'm done.

But it still requires some level of human involvement. Maybe they've got jobs. I don't know.

In our case, we kind of just did these manually when we could or we focused more on making sure that the server's running and if people are finding issues, responding to those and getting those fixed.

quiche can run as a client, but our main use case is the web server things.

So, yeah, people may be busy, or they feel that they don't have the time to coordinate for various reasons, like they're on holiday or whatever.

This grid, the interop matrix, can suffer from humans not filling it in.

So, we see different shades of green and things like that.

But it does depend on people. So, the Interop Runner takes a slightly different view on this, of basically creating a controlled environment.

So, it uses Docker to basically have a network where we have a Docker image for a client, a Docker image for a server, and then a network in the middle that can run a Docker image with what's called the ns-3 network simulator, which is able to provide traffic shaping so you can test unreliable networks as well as reliable networks.
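If you want to poke at it yourself, the runner can also be driven locally. A rough sketch, assuming Docker and Python are installed and that the flag and test names haven't changed since this aired:

    # Fetch the interop runner and run one client/server/test combination
    git clone https://github.com/marten-seemann/quic-interop-runner.git
    cd quic-interop-runner
    pip3 install -r requirements.txt
    python3 run.py --client quic-go --server quiche --test handshake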

When we're doing things on the Internet, you know, you'll get issues like somebody's brought the web server down for maintenance or, I don't know, there's been a BGP leak on the Internet and you can't talk to that person for some period of time, which is another factor of why that matrix sometimes doesn't get as populated as much.

So, the Interop Runner is basically infrastructure as code, and there's all this stuff I'll show you briefly, but it runs each night and then the results get logged here as letters.

So, green means the test succeeded, red means it failed, and gray, which I don't know if you can see there, shows that maybe this is a feature not implemented by the combination of client and server.

So, one or the other doesn't implement this thing, so it wasn't tested at all.

This one is talking about HTTP/3.

So, if you look at an example here, where the quic-go implementation, an implementation of QUIC in the Go language, is talking to ngtcp2, our favorite project, it succeeded.

And so, what's cool is you don't just see the result; you know, the interop matrix just has a result.

Sometimes, via a slip of a finger, a copy-paste can claim that a test succeeded when it didn't, and that can be confusing for people too.

So, for any of these tests, they have links to the test run that contains all of the client and server logs and sim logs. Ah, yes, and PCAPs, the results of the tests. So, you can go in, say, and look at this quic-go client and what it logged as qlog.

If you remember, qlog is a format by Robin Marx and co that a lot of us have implemented, have picked up as a format to log to, so that we can leverage his tools to analyze, say, the functionality or performance of the protocol.

And this is where it's really cool because you can actually hyperlink directly to these logs and basically you don't have to upload stuff, download and upload.

You can just, kind of, I don't know what the term is, hot link across the two.

So, what happens is somebody would maybe say in this case, like, I got red.

I don't know what's happening here. Like, this test is failing.

This is what my client said. Here's the Q log. Blah, blah, blah.

Oh, what happened at the end? There's a transport error with an error code.

376. I don't know what that means off the top of my head. And so, you know, from my perspective, what this allows is somebody else to do all the hard work or the mundane work of running the tests.

It gives me some confidence that they've done it in a controlled environment, so that if somebody using a client that I don't control says something went wrong, chances are it's likely to be a real thing, not a transient issue or some other weird annoying thing.

So, I can go in and look at the logs and maybe while I'm on a train somewhere, I could diagnose that issue without having to rerun it myself or try and coordinate on rerunning and asking what version of the code.

Now, what you'll see is we have these 11 implementations. Maybe there's an off-by-one error in my counting.

At least for the server and for the client, it's probably a similar amount.

And a number of tests for each of these.

So, you know, this is quickly going to be a scaling problem. And so, the runs take 20 hours or so, maybe more, to get all of the results here.

So, this is why they were kind of running as a nightly job.

And sometimes things would break or people would say, I've just updated my code to add a feature or fix what failed yesterday.

It's in a new Docker image. And then we'll say, well, we need to wait overnight to get the test results.

It's not ideal. And so, what was really cool during the workshop is kind of talking about this.

Nick Banks at Microsoft said, you know, I've played with parallelizing this in the past.

Let me have another go, see if I can work this into Azure pipelines.

That's not something many of us in the community are that familiar with.

But Nick pushed on with getting a proof of concept there and seeing if you could get things to work.

And then, as a collaboration, trying to understand some of the quirks of Docker and stuff, we were able to get, you know, an environment that runs these tests.

It probably needs some more polishing up.

But we can get everything done in about 20 minutes or maybe half an hour, something like that time range, which compared to 20 hours is a lot quicker.

It's going to enable much faster turnarounds, which I think is going to be really exciting in the future.

Not all of these things, sorry, not all of the implementations are on there yet.

You do need a bit of understanding of Docker to kind of get started.

But once you pass that barrier, then it becomes incremental.

And what you can do is, you know, this is Marten's Docker GitHub repo, which is kind of the basis of everything.

And you can add an implementation in here.

Just say, look, where is it? quiche. Yes, here's our link to our Docker image.

It provides the client and server roles. That's just an enumeration.

And then wherever we host that image, I think it's in the quiche repo itself, actually.

We have a script that integrates with kind of the bootstrapping of each test.
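The entry itself is tiny. Something along these lines goes into the implementations JSON file in that repo; the exact image name and schema here are from memory, so treat them as illustrative:

    "quiche": {
      "image": "cloudflare/quiche-qns:latest",
      "url": "https://github.com/cloudflare/quiche",
      "role": "both"
    }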

Yeah.

I'll find that later. And yeah, basically you can opt into each of those different tests in the Interop Runner, which is why we get green, red or gray.

You can start small and work your way up, basically.

That's the point I was trying to make.

So, yeah, this is a different angle to some of the stuff we've talked about in the past.

And like I said, qlog and PCAPs mean that you can use those tools to analyze stuff.

And these are public. So, if people are interested in researching, you could just grab all of these and, say, maybe do another characterization analysis of how QUIC works, or come up with some more simulator kind of environments.

There's a whole different set of measurement tests there too, beyond functionality testing.

So, looking at things like throughput, or goodput, and crosstalk.

So, comparing how QUIC behaves when there's other traffic on the network that's not QUIC.

Stuff like that is pretty interesting.

So, thanks for all the work of all the different people getting involved in that.

Right. That was the end of my recap of what I did at EPIQ last week, which was just sit and listen to other smart people talk.

Now I've got to do something.

So, we're going to go back to h2load, our favorite time. If you want to tune out now, do go.

This is a bit like the football results in the UK. If anyone is from the UK and watches the news, they'll tell you to please turn away now or turn over.

And then within 10 seconds, they tell you the results. So, unless you're quick and you find it out anyway, it's kind of annoying.

But what we wanted to do is run a tool called H2 Load against a web server.

My web server and load test it.

So, this website is just a really naughty application that basically waits for some period of time, 10 milliseconds in this case, until it responds with an answer.

And so, we can use h2load to try that out and get some results. You can see here that it used HTTP/2, and the goal I set out a few weeks ago was to do HTTP/3.

I wanted to test, like, draft 27, or 29 these days, say.
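Concretely, the kind of invocation I'm after is roughly this. The caveat is that the option spelling comes from the nghttp2 quic branch of the time and may have changed since, and the URL is just a stand-in for my test server:

    # Ask h2load to negotiate HTTP/3 draft 29 via the ALPN list:
    # 100 requests across 10 concurrent clients
    h2load -n 100 -c 10 --npn-list=h3-29 https://my-test-server.example/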

And the problem here is that it tries to create a TLS connection, like an actual TLS-over-TCP connection, with an ALPN ID of h3-29, which is completely invalid there.

The server doesn't recognize it, and so the handshake fails immediately.

And that's not great. People might want to load test things and get some basic performance.

So, how do we address this issue?

We want to build a version of nghttp2's h2load tool. And to do that, we need to build it from source, and we have to do all of these things.

Maybe somebody's got a Docker image or some cleverness.

Here's what I built earlier. Just reuse my binaries.

But I thought it would be fun to try and compile this from source on a cleanish image and see where I go wrong.

So, there's a few prerequisites, effectively, to get into building nghttp2.

And we've been working through them over the weeks, slowly but surely.

So, we started off with OpenSSL. Then we got nghttp3.

And then ngtcp2. So, where were we at? I got as far as ngtcp2 and, for the life of me, I can't remember how far we got into that process.
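To recap the prerequisite chain in one place, the overall shape is roughly this. Branch names and paths follow the pattern from Tatsuhiro's instructions at the time, so double-check them against the current READMEs before copying:

    # 1. Tatsuhiro's QUIC-enabled OpenSSL fork (the branch name is era-specific)
    git clone --depth 1 -b OpenSSL_1_1_1g-quic-draft-29 https://github.com/tatsuhiro-t/openssl
    cd openssl && ./config enable-tls1_3 --prefix=$PWD/build
    make -j$(nproc) && make install_sw && cd ..

    # 2. nghttp3 (library only), installed into a local prefix
    git clone --depth 1 https://github.com/ngtcp2/nghttp3
    cd nghttp3 && autoreconf -i && ./configure --prefix=$PWD/build --enable-lib-only
    make -j$(nproc) && make install && cd ..

    # 3. ngtcp2, pointed at the OpenSSL and nghttp3 builds above
    git clone --depth 1 https://github.com/ngtcp2/ngtcp2
    cd ngtcp2 && autoreconf -i && ./configure --prefix=$PWD/build \
      PKG_CONFIG_PATH=$PWD/../openssl/build/lib/pkgconfig:$PWD/../nghttp3/build/lib/pkgconfig
    make -j$(nproc) && make install && cd ..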

So, if we try to rerun the commands here, let's see what happens.

Let's just try it out.

I've got a feeling we did do this last week.

But you know, what's the harm in running things twice?

The harm could be that nothing happens and we just sit here for 15 minutes.

But that doesn't seem right.

Maybe.

Yeah, we can say we're not Mac users. We can ignore all of this stuff. Big, long configure, which sets up all the pkg-config.

Annoying stuff. And paste is continuing to fool me on this device.

So, it's very tedious. I do apologize. I think we ran this one last week.

Let's see what it says. At least we get some semblance of progress.

I've got to worry that the text is always too small and fuzzy for anyone to see.

So, let's just bump that up.

I don't know if it will help or not. It makes it basically useless for me to do anything here.

Of course, the danger here is if running configure invalidates the existing make executables or the outputs of make that we had, we'll end up compiling it again anyway.

I've got more time than I did last week. So, oh, no.

It did create a new make file. Make check.

Ah, yes.

I think the checking consumed some time. So, I'm going to avoid that. I'm going to go multiprocess this week just to see if this speeds up the build.

I wonder how my CPU is doing.

So, that's a concern here.

If I dedicate too much of the CPU, then the stream fails in some way.

I'm both streaming and recording it locally or it's being recorded somewhere and it's also being upstreamed in order for you to see this wonderful presentation right now.

You can read a blog post about how we do our streaming, in terms of how we take a contributed RTMP feed, packetize that up, and use the segmented adaptive streaming malarkey I was talking about earlier.

This is all good.

We're building a library which is written in C. And then we have some examples written in C++ that integrate that C library.

Is it being built out?

This isn't what we actually want. So, let's remember the instructions.

You know, if you believe these instructions, the job should be done once we're complete here.

But that is not the full story. What we need to do after this is to build nghttp2.

So, once this completes, yes, we can get on with that job.

So, what do we need to do? We need to check out a copy of nghttp2. We're going to cheat for this slightly.

I have some instructions elsewhere prepared which I'm typing out now.

But we don't want the full Git history. So, this should speed up the download.

And we also, last time I checked, need to check out a branch called quic; that's what quic means here.

So, as far as I recall, last time I checked, the support for HTTP/3, which needs QUIC, is still on an experimental branch in nghttp2.

It hasn't been mainlined yet. And so, through the course of these other, oops, don't want to do that, other steps you can see, with OpenSSL it's Tatsuhiro's special build, especially built for draft 29 of QUIC.

The other repos, like nghttp3 and ngtcp2, are on main, but nghttp2 is on this quic branch.

So, it's confusing.

I think that's the point I want to make. So, it's easy to mess up.
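For reference, the checkout step I'm doing amounts to something like this; a shallow clone, straight onto the experimental branch:

    # Shallow clone of nghttp2, directly on the quic branch
    git clone --depth 1 -b quic https://github.com/nghttp2/nghttp2.git
    cd nghttp2
    autoreconf -i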

And by me doing this, somebody can talk you through, but we're in here. Lo and behold, we want to run, oops, wonderful scrolling.

That's what you get when you make your screen, or the text on the screen bigger.

So, I'm going to run this and watch paint dry.

Checking my Twitters, if anyone's reached out.

Do remember I'm lonely here. If you just want to say hello, it would be appreciated.

We're done with autoreconf. And now we have another configure command, which I've got to type out by hand.

And it's pretty awful.

It's very similar to what we had before. So, let's try and cheat this.

And what could possibly go wrong? Everything. So, it's going to work. No, I'm not allowed to type and scroll this down at the same time.

So, I'm just going to make this slightly smaller, so I can see what the heck I'm doing.

Otherwise, it will fail.

That will do.

So, we want to enable applications here. I think, basically, what this means is that h2load is an application of the nghttp2 project.

And so, sometimes, I think by default, it should have built them, but some auto-detect things can fail.

It won't build the apps if you don't have a dependency on the system, and it can silently fail that and not tell you.

Whereas, if you pass this flag explicitly, it will fail explicitly, and you'll know, and you'll be able to fix it, rather than getting to the end, searching for your binary, and getting confused about what's happening.

So, we didn't prefix the previous builds.

So, that's going to screw everything up. So, let's go back in time.

What could possibly go wrong is we got the command wrong.

So, all of this pkg-config malarkey was absolutely fine, but we wanted to prefix it.

So, we have somewhere that we can install the outputs to, which nghttp2 will care about.

Without that, I think its pkg-config stuff will fail, which sucks, basically.

So, what we're going to do is run configure again, which will, oops, I need to pass the right flags, of course.

We'll set everything up to do this properly.

Without doing this, it might not work. If I try and do it manually, I will screw it up.

So, apologies for that. I'm hoping, again, we can benefit from multi -threaded building and make rapid progress towards the final step.

Yeah, this is, you know, that additional prefix wasn't provided in these instructions, because there is no make install step.

That's a typical gotcha for me, which is why I've written up some others elsewhere that I should probably share at some stage.

Oh, I've got a call from Glasgow, which I think is probably a telemarketer.

Let's just hang up on them. And somebody did at me on Twitter. Who was it?

D-PAC. Hello, D-PAC. Thank you for saying hello. Or it's a capital E as well. So, maybe that means something else.

But I'm going to presume it's hello, and I'll say hello back.

It is live TV. If that was a test, I hope to have passed your test. So, yes.

Let's just make without testing. Here we go.

D-PAC, I hope you're finding this really interesting watching, because you stayed till the end.

That's made me very happy. C++, bits and bobs.

After this, like I said, we'll be running make install. So, that's going to be too hard.

Of course, it's not too hard. It's just typing and hitting enter.

It won't be too time consuming, I hope. You can see that's going to our local prefix.

Warning. It's gone. Don't worry about it.

Wink, wink, nudge, nudge. Hopefully, it won't mean anything too awful.

nghttp2, here we go.

Now, we can do our special configure. So, in this case, we want to have enable-app.

Again, the prefix is going to be the same here. We need to modify the command from what was passed to build ngtcp2 and make it right for nghttp2.

So, we need to pass in the path to ngtcp2. Why are all the ngs so hard?

So, ngtcp2 slash build slash lib slash pkgconfig. And then we've got these stupid linker flags at the end.

So, let's see if this succeeds.

Again, I can't type. This is enable-app, with a hyphen. There we go.
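Assembled, the configure line I'm fighting with looks roughly like this. The paths assume the side-by-side checkouts from earlier, and the LDFLAGS rpath bit is the "stupid linker flags" I mentioned; as ever, double-check against the nghttp2 README:

    # Configure nghttp2's apps (h2load among them) against the local QUIC stack
    ./configure --enable-app --prefix=$PWD/build \
      PKG_CONFIG_PATH=$PWD/../openssl/build/lib/pkgconfig:$PWD/../nghttp3/build/lib/pkgconfig:$PWD/../ngtcp2/build/lib/pkgconfig \
      LDFLAGS="-Wl,-rpath,$PWD/../openssl/build/lib"
    # then: make -j$(nproc), as before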

So, while that's going, we're near the top of the hour. I think, as tradition has established over the last three episodes now, we won't complete compilation of this thing by the end of the show.

So, I like to leave it on a cliffhanger.

Next week, my colleague Junho is going to come on the show. I've mentioned Junho a couple of times.

He's written some blog posts about, say, congestion control support.

So, Junho is going to come on and talk about that. It's one thing reading it yourself.

We're all different kind of learners here. Some people might appreciate a visual presentation and somebody actually talking to you rather than having to read it yourself.

I'm not a congestion control expert myself, but it can help.

So, if you've got questions in that area, send them in ahead of time or tune in live and we'll be just having some fun.

We had a team call last week to keep the morale up, where we ordered in takeaway food and then ate at a video conference together.

Not just me and Junho, but the rest of the team. That was fun. Unfortunately, his pizza didn't arrive until like 50 minutes in.

So, I feel bad for him, but my pizza arrived on time.

So, I'm thinking maybe next week, if Junho is going to be taking the hot seat, I could do that again.

Whereas for him, he probably shouldn't eat and present.

So, yeah, let me run that past you, Junho, and see how you feel.

It should be a good one. I doubt I'll spend any time on building h2load, in case you find this utterly boring.

The good news is the configure seems to have passed.

So, I'm going to kick off the compilation now, and if I get turned off and replaced by Cassian, maybe, on the schedule next, well, I hope you have a good week.
