Cloudflare TV

⚡️ What Launched Today - Tuesday, June 20

Presented by Sam Marsh, William Woodhead, Jon Levine, Lucas Pardue
Originally aired on 

Welcome to Cloudflare Speed Week 2023!

Speed Week 2023 is a week-long series of new product announcements and events, from June 19 to 23, that are dedicated to demonstrating the performance and speed related impact of our products and how they enhance customer experience.

Tune in all week for more news, announcements, and thought-provoking discussions!

Read the blog post:

Visit the Speed Week Hub for every announcement and CFTV episode — check back all week for more!

Speed Week

Transcript (Beta)

Hello, everybody. Welcome to Cloudflare TV. Today is day two of Speed Week and I'm very happy to be joined by three of my colleagues today to discuss some of the announcements we've made.

In total today, we've launched seven blogs talking about our new home of application performance at Cloudflare called Observatory.

We're talking about a fast follow called Experiments and how you can experiment with Observatory and also our goal for UDP and also a deep dive on TTFB and why it may not be the metric to use for web performance measuring.

There are also posts on Timing Insights, HTTP/3 prioritization, and Interaction to Next Paint.

And I'm pleased to say I'm joined today by the authors of these posts to discuss their blog posts and announcements.

So without further ado, let's introduce Jon.

He's going to talk to us about Timing Insights. Before we kind of get into that, one of the posts I mentioned earlier is closely related to what Jon is going to talk about.

So I thought I would do a little preamble and set the stage, which is TTFB and the blog post around measuring what matters.

So the post essentially discusses in detail why TTFB, time to first byte, makes sense to use in certain situations, but in general, why it's not a great metric to focus on when looking at measuring web performance, particularly using it as a metric to gauge good user experience on a web page or a website.

I won't spoil the blog for you, but essentially the takeaway is it doesn't accurately measure the user's experience, and a good TTFB score does not mean a good user experience.

We've got data to show that.

Conversely, though, a poor TTFB score will almost certainly mean a poor user experience.

So it does still come into play. So to talk about that, Timing Insights and a little bit more, I'd like to welcome Jon.

Jon, welcome. Can you start by introducing yourself to the viewers and what your role is at Cloudflare and what it is you're going to be talking about today?

Yes. Thank you, Sam.

My name is Jon Levine. I go by JPL here at Cloudflare, and I'm a product manager here, and I work on our data products that includes all the logs you get and all the analytics you see in the dashboard.

And yeah, today I'm really excited about some new analytics, or I should say an improvement we're making to our HTTP analytics that's going to tell you a little bit more about time to first byte.

Perfect. So looking at that, why would someone A, want to understand the time to first byte, and B, break that down?

Yeah. So like Sam mentioned, well, it's funny.

So Achiel is one of the co-authors of the post that Sam mentioned about measuring what matters.

And we spent a lot of time talking about this topic, and we were having a friendly rivalry about it, like TTFB doesn't matter.

Like, no, you have to measure TTFB.

And of course, they're kind of both right for folks who have.

What we showed in the post that Sam mentioned is if you have a website, if you have a web application, what you really care about is the end experience of your user.

So that's all those amazing web vitals metrics. We're going to talk more about those.

There's a new web vitals metric. It's really exciting.

That's how you know are people having a good experience in your website. That's the stuff that correlates.

All those numbers you hear about, like, oh, you know, we made the site faster, and revenue improved, right?

TTFB is only a small piece of the puzzle.

But Sam made this great point. You said, you know, if TTFB is bad, it's hard for those other things to be good.

So when TTFB is slow, you have to understand why it's slow and what you can do about it and how you debug that.

So that's the first thing. The second thing is, you know, it's really awesome when we're in the world of, like, applications and end users.

You know, we love that stuff, right?

And optimizing the whole experience and, like, gives you so many, you know, chances to improve things.

But I talk to a lot of customers who, you know, they don't have that luxury, right?

They're often responsible for an API. Might even just be other people inside their own company using it.

Or maybe they offer an API.

Think of, like, a lot of SaaS companies. They offer an API to their customers.

They have no idea what the end user experience is going to be. All they see is, like, another, you know, developer on the other end of the line.

And they have to make that API run as fast as possible.

And so in that case, the only thing they actually have to optimize is TTFB.

And so, yeah, TTFB really does matter in those cases.

And so, like Sam said, you know, it's not the only thing, far from it.

But you do need to know if it's slow, why it's slow. And so, that's what we're here to help you do.

And so, that's the new thing that we're calling Timing Insights. So...

So, how does Timing Insights help you understand why it's slow? It's really interesting.

So, TTFB, as folks have mentioned, typically you think about it as measured from the client.

So, okay, I sent off this request. It's in the context of an HTTP request.

I sent this request off. How long did it take from the moment I sent that off to the moment I got the first byte of the response back?

What we're measuring, we're calling it TTFB or we're calling it Edge TTFB.

It's really TTFB as measured from our own servers.

So, it's really from when we received that first byte from the client to when we sent that first byte of the response back.

Now, Lucas and I have had some really interesting conversations about well, actually a little more complicated than that.

We won't get into all the weeds of, can you actually measure it that way with things like HTTP 2 and 3 and lots of other things.

But conceptually, that's the goal of what we're trying to do with TTFB, what we're trying to measure with Edge TTFB.

And the reason that's important is, again, a lot of our customers, maybe they don't control their client experience.

They don't have a way to measure it from their clients easily.

So, they just need to know from Cloudflare's perspective, how long did it take?

So, that's the first part of it.

Just tell me how long it was. And of course, you want to aggregate that. So, you want to know what was the average, the mean TTFB?

What was the median? What was the P95, or all these quantiles which we love?

So, that's one piece of it. But just knowing the P95 TTFB is one thing, great.

P95 TTFB was one and a half seconds. Well, that's really slow.
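As a minimal sketch of the aggregation described here, the snippet below computes the mean, median, and P95 of a list of TTFB samples. The sample values are hypothetical, and the nearest-rank method for P95 is an assumption made for illustration, not necessarily how Cloudflare's analytics compute quantiles.

```python
import statistics


def summarize_ttfb(samples_ms):
    """Return the mean, median and P95 of TTFB samples given in milliseconds."""
    ordered = sorted(samples_ms)
    # Nearest-rank P95: the smallest sample with at least 95% of samples <= it.
    rank = max(1, -(-95 * len(ordered) // 100))  # integer ceil(0.95 * n)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# Hypothetical samples: most requests are fast, a couple are very slow.
samples = [80] * 18 + [1500, 2000]
summary = summarize_ttfb(samples)
print(summary)
```

With these made-up numbers the median stays low while the P95 lands on one of the slow requests, which is why the tail quantiles are the ones worth breaking down further.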

Why was that? And so, what we want to do is actually break that down for you further.

And so, what we've offered in this first iteration is we have two other metrics which contribute to TTFB, which are worth explaining a little bit.

So, one of them is what we call the DNS lookup time. So, if you think about how our CDN works, right?

CDN is one of the things we offer. Request comes into our edge.

There are a lot of customers where we actually aren't responsible for knowing.

We don't know. We're not authoritative for the IP address of their origin server.

So, we actually have to do a DNS lookup to some third party, some third party DNS host to say, where actually do we go to reach the origin server?

And surprisingly, that can take a really long time. And that's one of those like hidden costs or hidden contributors to TTFB that unless you have this metric, there's really no way to know that.

So, I wanted to make sure we put that in there. I've seen a lot of performance investigations where you feel like you've tried everything and you're like, DNS, who was thinking about origin DNS lookup?

So, that's one thing.

And the second one is just simply how long did it take to fetch the data from your origin?

So, one of the biggest things we have at Cloudflare to improve TTFB is caching, right?

But of course, not everything can be cached. Even if things can be cached, it could be not in cache, could be expired.

There could be lots of reasons why we have to go to your origin server.

And so, for now, we have this sort of total time taken from that first server, the first data center, colo, that the eyeball, the end user hits until we get response from your origin and we come back.
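To make the breakdown concrete, here is a small sketch that splits one request's edge TTFB into the two contributors mentioned above plus a remainder. The field names are hypothetical, and treating the components as additive and non-overlapping is a simplifying assumption, not a statement about how the real metrics relate.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    edge_ttfb_ms: float        # first byte in at the edge -> first byte of response out
    origin_dns_ms: float       # time resolving the origin hostname (0 if no lookup)
    origin_response_ms: float  # time spent fetching from the origin (0 on a cache hit)


def breakdown(timing: RequestTiming) -> dict:
    """Split edge TTFB into origin DNS, origin fetch and everything else."""
    other = timing.edge_ttfb_ms - timing.origin_dns_ms - timing.origin_response_ms
    parts = {
        "origin_dns": timing.origin_dns_ms,
        "origin_response": timing.origin_response_ms,
        "other": max(other, 0.0),  # clamp in case the components overlap
    }
    largest = max(parts, key=parts.get)
    return {"parts": parts, "largest": largest}


# Hypothetical slow request where the origin DNS lookup dominates.
result = breakdown(RequestTiming(edge_ttfb_ms=1500.0, origin_dns_ms=900.0, origin_response_ms=500.0))
print(result["largest"], result["parts"])
```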

What we want to do, a little preview going forward, is we actually collect a lot more timing data.

We're going to break down about how long the origin request took.

So, actually, sometimes the request is taking time moving on our network.

Once it gets to the end of our network, you can think of it that way.

How long did it take to establish a connection? How long did the TCP connection take?

The TLS connection take? What was the RTT? How long did we wait for your application to respond?

And each of those metrics can tell you something a little bit different that you may have to tune on your origin.

And you don't want to point fingers here, but typically, when people have room to improve TTFB, the most common thing is there's something on the origin side, but knowing is it RTT?

Is it your application server? Those are very different things. There's different actions you might take in response to that.

And so, we want to give you all the information you have to make those improvements.

Yeah. So, it sounds like the type of traffic that will benefit most is, like you say, potentially non-cacheable dynamic traffic that's not coming from cache.

So, APIs, is that fair?

That's right. And I think one thing that's interesting about this metric too is...

So, there's another interesting... I'll share a conversation I had when we were building this feature out over the last few weeks. It sounds simple, like, oh, TTFB.

Yeah, just put that in our analytics API. We can talk more about how that works too, which is really cool.

It's using our GraphQL analytics API.

So, it's a very flexible way to query the data. The reason I'm mentioning this is because we had some debate about what is the definition of TTFB we should use.

And actually, one thought, Sam, is we should actually limit it to just this stuff that's not cached.

But as we were talking about it, we realized one of the great things about TTFB is if you're trying to quantify the improvement of something like caching, ideally, you would look at the TTFB of all your traffic and you would see that improve because you added caching.

You'd see this thing that previously took time go to zero.

And so, it's helpful to sort of, I think of it as averaging in all those zeros, to really see the full benefit of what you're doing.
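The "averaging in the zeros" point can be sketched with made-up numbers. The example assumes each request is a pair of (cached, origin time in ms) and that a cache hit spends 0 ms on the origin; real cached responses still have some TTFB, so this is only an illustration.

```python
def mean(values):
    return sum(values) / len(values)


def uncached_only_mean(requests):
    """Mean origin time over requests that actually went to the origin."""
    return mean([ms for cached, ms in requests if not cached])


def all_traffic_mean(requests):
    """Mean origin time over all requests, counting cache hits as 0 ms."""
    return mean([ms for _, ms in requests])


# Hypothetical traffic: (cached, origin_time_ms) per request.
before_caching = [(False, 400)] * 10
after_caching = [(True, 0)] * 8 + [(False, 400)] * 2

print(uncached_only_mean(before_caching), uncached_only_mean(after_caching))
print(all_traffic_mean(before_caching), all_traffic_mean(after_caching))
```

Restricting the metric to uncached requests shows no change (400 ms before and after), while the all-traffic mean drops from 400 ms to 80 ms, which is the benefit of caching that the broader definition makes visible.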

And I think we haven't had that ability through Cloudflare's own data that we exposed to customers before.

And so, it's cool that we now have that and customers can see the benefits of things like adding caching and seeing that improve.

But yes, I would say for the most part, when you're seeing long TTFB, the most common thing, I hope, I hope our cache and everything is already really fast.

And so, we're often seeing problems connecting to the origin, things like that.

Yeah. Yeah, definitely. And that should help a huge amount, I'd say, in debugging rather than having this black box metric, right?

That just says it's slow. It's just now it should give you that nuance underneath of here's where to go look.

Exactly. Exactly. Yeah. And so, right now, we've shipped these metrics through, I mentioned our GraphQL Analytics API.

We have a dataset called HTTP Requests Adaptive Groups. And it's all in the blog posts.

There's example queries you can use. It's super cool. But we, of course, are also planning and working on building out a dashboard, right?

Like a visual UI, because as great as GraphQL is, it can be tricky to type in all those GraphQL queries.

We want to just make it really easy to go to our portal and just see it all there visually, see all the quantiles, see the distribution, see what things are contributing more to that.

And also drill in at the level of like, hey, what are my slowest URLs or host names or origin IP addresses, right?

We have all that dimensionality, right?

Or you can even just see it like, hey, cached versus uncached, just compare the two things, right?

And really see those comparisons easily.

Nice. And so, you mentioned the GraphQL endpoints. How do customers get their hands on it?

Is it just through the GraphQL today? And what are the kind of future plans?

And you mentioned a dashboard. What are the plans there?

Yeah. So, currently, this data set I mentioned, the HTTP requests adaptive groups, bit of a mouthful.

That data set is available on our Pro, Business, and Enterprise plans. So, anyone, it's ready now.

Now, I'll say that we have, actually, we exposed like three months of data, but we've only been populating this for a few weeks.

So, if you query back, you'll notice at some point, the numbers go zero further back in time.

But yeah, you should start querying. Check it out right now. Like I said, example queries are in the blog post.

Our documentation has helpful guides on how to get started.

There are GraphQL clients you can use. That's probably the easiest way to use our GraphQL API, if you're not familiar with it.
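For readers who would rather script a query than use a GraphQL client, here is a hedged sketch that only builds the HTTP request and does not send it. The endpoint is Cloudflare's GraphQL Analytics endpoint. The quantile field names are recalled from memory and should be treated as assumptions to check against the blog post's example queries and the schema. The token, zone tag, and time range are placeholders.

```python
import json
import urllib.request

GRAPHQL_ENDPOINT = "https://api.cloudflare.com/client/v4/graphql"


def build_query(zone_tag: str, start: str, end: str) -> str:
    """Build a query against httpRequestsAdaptiveGroups for one zone and time range."""
    # json.dumps yields double-quoted, escaped literals that GraphQL also accepts.
    return f"""
query {{
  viewer {{
    zones(filter: {{ zoneTag: {json.dumps(zone_tag)} }}) {{
      httpRequestsAdaptiveGroups(
        limit: 1
        filter: {{ datetime_geq: {json.dumps(start)}, datetime_lt: {json.dumps(end)} }}
      ) {{
        quantiles {{
          # Assumed field names; verify against the published schema.
          edgeTimeToFirstByteMsP50
          edgeTimeToFirstByteMsP95
          edgeDnsResponseTimeMsP95
          originResponseDurationMsP95
        }}
      }}
    }}
  }}
}}
"""


def build_request(api_token: str, zone_tag: str, start: str, end: str) -> urllib.request.Request:
    """Wrap the query in a POST request with a bearer token."""
    body = json.dumps({"query": build_query(zone_tag, start, end)}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_ENDPOINT,
        data=body,
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials and zone; replace before sending.
request = build_request("HYPOTHETICAL_TOKEN", "hypothetical-zone-tag",
                        "2023-06-20T00:00:00Z", "2023-06-21T00:00:00Z")
# To actually run it:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```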

I mentioned the dashboard.

And then one other thing, I'll just give a little sneak preview of it because I'm excited.

We are hoping to expose a SQL API. The cool thing about that is if the same data is in a SQL API, you can easily hook it up to a data source like Grafana.

So, that's something that we do quite a lot internally. We actually have SQL interface and Grafana.

It's super easy to write queries and iterate on things.

And I hope we can share more about that in a future Innovation Week.

Nice. Perfect.

Thank you very much. Keeping an eye on our other guests and want to give them a chance to speak.

So, thanks very much, Jon, for sharing that. Lucas, your turn.

Welcome. Can you introduce yourself, your role at Cloudflare and what it is you are introducing today, announcing today?

Yeah, certainly. Thanks, Sam. I'm Lucas Pardue.

I'm an engineer on the protocols team based out of London. In addition to my work at Cloudflare, which I'll explain a little bit more next, I'm also the co-chair of the QUIC working group at the IETF, hence the funny hat.

So, as part of that work, we've been developing both the QUIC transport protocol and the HTTP mapping on the top, which is HTTP/3.

And we've been involved in that for a pretty long time.

So, the protocols team is responsible for terminating secure connections at Cloudflare's edge before we pass them on to the brain and the other parts of the Cloudflare pipeline of smart things that are happening.

And so, we're responsible for HTTP/2 and HTTP/3 and making sure that's super efficient, super performant.

In some ways, it kind of falls just below the OSI layering of layer 7. HTTP works the same as it always does, request and response, but the way that it gets mapped into going over TCP or these QUIC protocols varies a little bit.

And it's in those fine details, those fine margins, that you can do really well or really badly.

And so, that's kind of where the prioritization aspect comes in that the blog post touches on today.

In a nutshell, prioritization isn't that new. It's kind of, as long as there's been a web browser, it's had to load in multiple things for a web page and it's needed to decide which ones are most important.

When you load up a website, you're going to first request a document, some HTML, start parsing it from the top through to the bottom, from the head to the body.

You can start learning all about images and scripts and styles, all of these various things that kind of interrelate to how the browser is going to process them in order to paint something or put something up on the screen or do some other actions that affect the end user experience.

And so, pre-HTTP/2 and HTTP/3, even if you're using keep-alive connections and requesting multiple things, you might want to request some things in parallel, and you might want to spread them out or pace them so you're not overwhelming the pipe, your network connection to the different servers. But you're requesting the most important things first so that they probably come back to you first, and you do things in a natural order that the web browser would like.

Different browsers have different requirements, different user agents, things like curl, et cetera, might not care so much for these things, but actually, even the basics of having a big download competing with some other very small API request, it's a position where prioritization can matter, but prioritization only matters where you have a bottleneck, really, where there's some constraints or competition for resource.

So, typically, that's throughput or bandwidth where you're trying to deliver everything as fast as possible all at once, but physically, you can't.

So, you need to be able to pick what's the best thing I could send at this moment in time that's going to improve some objective metric.
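A toy model shows why ordering matters only when there is contention. It assumes a single bandwidth-limited link that sends one response at a time in the given order. The resource names, sizes, and bandwidth are made up, and real servers can also interleave responses.

```python
def completion_times(resources, bandwidth_bytes_per_ms):
    """Send resources back to back over one constrained link.

    Returns the time in ms at which each resource has been fully delivered.
    """
    finished = {}
    clock = 0.0
    for name, size_bytes in resources:
        clock += size_bytes / bandwidth_bytes_per_ms
        finished[name] = clock
    return finished


BANDWIDTH = 1000  # hypothetical: 1000 bytes per millisecond

download_first = completion_times(
    [("big_download", 1_000_000), ("hero_image", 100_000)], BANDWIDTH)
image_first = completion_times(
    [("hero_image", 100_000), ("big_download", 1_000_000)], BANDWIDTH)

print(download_first)
print(image_first)
```

In this model the total transfer time is the same either way (1100 ms), but the image the user is waiting for arrives at 100 ms instead of 1100 ms when it is sent first. That is the kind of improvement to an objective metric that prioritization is after.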

And one of the important ones that we use in the web world is something like the Largest Contentful Paint, the LCP.

So, this is looking at the biggest element on the page as you load it and kind of the thing that might have the most visual impact and make a happy user at the end.

And if that thing gets delayed behind, I don't know, something lower down on the page or something kind of invisible to the user, it's kind of just a bit annoying and frustrating.

So, you take that subjective opinion and try and turn it into a more of a quantitative metric, like if it takes this much time to deliver it, the object, and you can make that better and users might be happier.

And that might convert into other business level objectives or websites and so on.

So, that's a kind of broad thing, not looking at HTTP/3 in particular, but as we moved from traditional HTTP/1.1 to HTTP/2, we were able to do something called request and response multiplexing.

So, that was the ability to have on a single transport connection, so a single TCP connection with TLS on the top, effectively logical streams that would allow you to multiplex and interleave all of these requests and responses all at once.

And that's great. That's really efficient from the perspective of servers.

Web browsers don't need to open up multiple TCP connections.

It cuts down on some overheads. You get benefits of things like header compression because there's a lot of duplication that happens across requests, like cookies and stuff, and you can avoid sending them verbatim.

So, lots of benefits there. But what you're doing is effectively funneling everything down into a single pipe.

And therefore, there's going to be more explicit contention for that finite resource.

And so, during the HTTP/2 standardization process, this was realized and they came up with a super clever, ultimately like super programmable methodology for defining effectively a mirror image of what a web browser's view of the document object model would be.

So, you could build up this kind of tree from a logical root node in the connection and add like the HTML to it and from that link different things and images and branch off and assign weights.

And you can have this super customizable programmable tree that the client builds up and it sends some signals to a server and the server can look at this thing and then basically use that in its decision making process, assuming it's got access to all of those resources that it could send you, to pick the most important thing to send at that moment in time.

And in theory, this is brilliant and it works great.

In the lab, it works pretty well. But in reality, it's complicated.

It's hard to implement. And so, it was a bit of a mixed bag for HTTP/2. For people like Cloudflare or other CDNs, it was kind of okay.

We decided actually it's better to do something simpler.

What was missing was an ability for the server side to really influence HTTP/2's tree-based priorities.

Because of these dependencies and these logical stream IDs and all stuff like that, that doesn't really make sense for web developers.

You just want to have a basic fetch-based API, like in Workers, and say, okay, this is a request for a super important image or script for the way that I just built my website and I want that to be the most important thing.

I'm just going to add a response header.

And so, we built something along the lines of that design a few years ago with enhanced priorities for HTTP/2.

And around that same kind of time, the HTTP folks working in the QUIC working group were trying to port HTTP/2's prioritization tree into HTTP/3.

And for various reasons, I don't have the time to go into, it just couldn't work.

There's all these edge cases due to the head of line blocking avoidance features of QUIC, where things are not strictly ordered and guaranteed.

That's a benefit on the one hand, but while you're building up this directed graph of things, if you don't know the order that you receive stuff in, you just don't know.

This blog post, today's blog post links to another blog post where I explore some of the problems and the motivations to find a new solution.

And that was a few years ago. And in the meantime, in the IETF, we've been standardizing something called extensible priorities, which is a new solution that's simpler.

It's less kind of direct relationships, but more absolute weights.

So it empowers a client to do things more simply.

The server can still look at these signals and weigh things and make scheduling decisions.

And people can override them with their origin logic or the kind of server-side overrides.
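The signals described here can be sketched with the Priority header from RFC 9218 (Extensible Prioritization Scheme for HTTP), which carries an urgency `u` from 0 to 7 (default 3, lower is more urgent) and an incremental flag `i` (default false). The parser below is deliberately simplified and is not a full Structured Fields parser. The request list and scheduling function are hypothetical and say nothing about how Cloudflare's edge schedules responses.

```python
def parse_priority(value: str) -> dict:
    """Parse only the `u` and `i` parameters of a Priority header value."""
    params = {}
    for item in value.split(","):
        item = item.strip()
        if item.startswith("u="):
            digit = item[2:]
            if len(digit) == 1 and digit in "01234567":
                params["u"] = int(digit)
        elif item in ("i", "i=?1"):
            params["i"] = True
        elif item == "i=?0":
            params["i"] = False
    return params


def effective_priority(client_header: str = "", server_header: str = ""):
    """Merge signals: defaults, then the client's values, then server overrides."""
    merged = {"u": 3, "i": False}
    merged.update(parse_priority(client_header))
    merged.update(parse_priority(server_header))
    return merged["u"], merged["i"]


def schedule(requests):
    """Order requests by urgency; ties keep arrival order (sorted is stable)."""
    return sorted(
        requests,
        key=lambda r: effective_priority(r.get("client", ""), r.get("server", ""))[0],
    )


# Hypothetical page: the origin bumps the hero image with a response header.
pending = [
    {"name": "analytics.js", "client": "u=5"},
    {"name": "hero.jpg", "client": "u=3, i", "server": "u=1"},
    {"name": "style.css", "client": "u=2"},
]
order = [r["name"] for r in schedule(pending)]
print(order)
```

The server-side value only replaces the parameter it names, so the hero image keeps the client's incremental flag while its urgency moves from 3 to 1.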

And it works really well with other factors, other techniques, like Fetch Priority or Early Hints, because you still want to be able to influence the web browser to request things earlier or later, if you can.

And there's other segments that are going to be talking about early hints and things like that now.

But still, once the server receives requests and there are multiple things to serve on a constrained resource, it's still great to be able to tweak things and control stuff.

So it's very much hand in glove, a lot of these technologies layering on each other.

So, just as the example in today's blog post shows, just by tweaking the priority of one resource, we're able to drastically improve the LCP metric by reducing it.

Yeah. So yeah, on that point, A, how do people get their hands on this?

How do they start using this and benefiting from this? And do they need to do anything in advance?

Do they need to configure anything or use specific browser versions upwards to kind of start to use this?

Yeah. So all the major browsers have been sending these signals.

They're kind of hard to see, because again, they're done at this layer that you might not see.

It should be a header soon in all of the major browsers, but Chrome sends it as this frame.

But even then, you're not going to see it unless you're looking like really down in like Wireshark level PCAPs or stuff.

So that's a bit annoying, but just take my word for it that those signals are already being sent.

And over the next few weeks, we're going to be effectively just rolling out support for consuming those signals and acting on them so that we're prioritizing everything on every webpage as best that Chrome or Firefox or Safari can tell us.

And then there's opportunities to tweak that further and make it better.

I mentioned the enhanced prioritization. That's a feature that's currently for Pro and higher plans, I believe.

And so that can basically do some of the hard work for people just automatically.

But beyond that, as long as people have the QUIC and HTTP/3 feature enabled on their zone, then this will just become an automatic feature that's deployed for all.

Perfect. Perfect. Lovely.

Thank you very much for that. You answered all my questions. Cool. Moving on, last but by no means least, William, welcome.

Can you introduce yourself? Usual exercise, what's your role here at Cloudflare and what is it you're going to talk to us about today?

Yeah, absolutely. Hi there, Sam. Yeah, I'm William Woodhead. I'm an engineering manager at Cloudflare and I work on the speed team.

Yeah, I think the best way probably to describe the speed team is that we do a lot of different things, but we spend our time thinking about how Cloudflare interacts with the browser.

So we're thinking about how to understand and optimize the performance of Cloudflare sites for the browser primarily.

And yeah, for the last few months, I've mostly been occupied by the launch of our new product released today, Observatory, which is very exciting.

I think Matt Bullock is going to be talking about that later today.

So actually, I'm here to talk about the blog post that went live today around the Core Web Vitals and crucially, yeah, how they're being updated next year to include a brand new Core Web Vital, as Jon mentioned earlier, INP, or Interaction to Next Paint.

So what is, obvious question, right?

What is Interaction to Next Paint and why introduce a new one?

Yeah, good question.

So I think the first thing is to step back and ask what are these Core Web Vitals?

LCP was mentioned earlier by Lucas, which is one of them. But yeah, essentially, the Core Web Vitals are a selection of Web Vitals, which are these metrics that Google and the project have come up with to help us understand the performance of websites.

And the most critical subset of these Web Vitals are the Core Web Vitals.

And they're the ones that have been deemed the most important ones that developers should be concerned about.

And as such, they actually can affect Google's SEO.

So if you are concerned about your search ranking, you need to make sure your Core Web Vitals are healthy because they're part of that search ranking algorithm.

So yeah, the three Core Web Vitals. So Time to First Byte, which Jon mentioned earlier, that's a Web Vital, but it isn't a Core Web Vital.

The Core Web Vitals are LCP, Largest Contentful Paint, which is about loading speed.

Then you've got First Input Delay, FID, which is about interactivity.

And then you've got Cumulative Layout Shift, or CLS, which is about visual stability, how much is the page moving around.

And FID, this metric, the middle Core Web Vital that measures interactivity, it actually turns out to be quite limited.

It can only really, it's First Input Delay, it can only measure the first interaction that occurs on a web page.

So let's say you've got a web page with a text editor on it and someone's putting in a number of keystrokes.

There can be hundreds of interactions.

If some of those middle interactions, 50 to 70 are really slow, First Input Delay doesn't capture that because it only looks at that very first interaction of the web page.

So this is what they're trying to tackle with this new metric called Interaction to Next Paint.

So what Interaction to Next Paint does, it actually accounts for all of the interactions in the lifecycle of a page, and then it reports the longest.

And it does a little bit of logic here to make sure it's not a huge outlier.

But essentially, you can think of it as, let's look at all the interactions that happen within a page, and then we're going to report the longest one of those.
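As a rough sketch of "report the longest, with a bit of logic for outliers", the function below takes the worst interaction duration after skipping one of the slowest interactions for every 50 on the page. That skip rule follows the publicly documented INP definition as best recalled, so treat it as an assumption. The durations are invented to match the text-editor example, and the first-interaction value is only a loose stand-in for First Input Delay, which strictly measures input delay rather than the full duration.

```python
def interaction_to_next_paint(durations_ms):
    """Approximate INP: the worst interaction, ignoring one outlier per 50 interactions."""
    if not durations_ms:
        return None
    ranked = sorted(durations_ms, reverse=True)
    skip = len(ranked) // 50
    return ranked[min(skip, len(ranked) - 1)]


# Hypothetical editing session with 100 keystrokes: the first is fast,
# keystrokes 50 to 70 are slow, the rest are fine.
durations = [20] + [30] * 48 + [400] * 21 + [30] * 30

first_interaction = durations[0]                 # all a first-input metric can see
inp = interaction_to_next_paint(durations)       # looks at the whole page lifetime
one_hiccup = interaction_to_next_paint([500] + [30] * 99)

print(first_interaction, inp, one_hiccup)
```

The slow run in the middle of the session is invisible to a first-interaction metric but dominates INP, while a single isolated hiccup among many fast interactions is discarded.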

And actually, the name of it is very descriptive.

It sounds quite obscure when you first hear it, Interaction to Next Paint, but that's actually exactly the duration that it reports.

It's between when you interact with the website, and when the browser does the next paint or shows the next screen.

So it's a period of time between the interaction and the next paint.

And if your website isn't configured correctly, this can be quite a long duration between those two things.

Yeah, and that's a good point. So this goes into Core Web Vitals, what, about a year from now, right?

We kind of anticipate like May 2024.

Exactly. So when this goes live, what do you kind of expect that to do to kind of performance scores of people's websites?

Yeah, I think because INP gets deeper into the lifetime of a web page, there are ultimately fewer places to hide. It's going to expose more websites for having poor interactivity after the page has initially loaded.

So I think what we're going to see is web developers having to think a bit harder about how to optimize their sites for interactivity after page load.

And that, yeah, everyone's going to have this challenge at the same time.

So it's really in terms of when this Core Web Vital comes in.

So the web developers who think about it early start thinking about interactivity of the website, they're the ones who are going to benefit the most.

And how can we help there? How can Observatory help there? And the tools we're talking about this week, is there anything we can do to kind of help people get going on that?

Yeah, so for optimizing for INP, there's no silver bullet. It comes down to a lot of complex pieces: reducing the complexity of your pages, and making sure the main thread in the browser isn't clogged up.

But we have a few different ways we can help. I mean, the main thing to mention is this new product, Observatory, that we've launched today. That's the first step: you've got to be getting a benchmark for your INP.

And INP really can only be reported from a RUM provider, which actually gets data from real users and real browsers.

And that's what Observatory can offer.

So you've got to set that up, start monitoring those pages, start recording your INP.

And then Observatory can also give you recommendations for how to improve your performance.

The other big recommendation I'd give here would be Cloudflare Zaraz.

What Zaraz can do is offload all of your JavaScript, or rather your third-party JavaScript, from the browser to Cloudflare.

And that really frees up your websites to just focus on interactivity and not on all of these event listeners that have been pushed there by third parties.

So yeah, have a look at Cloudflare Zaraz.

Perfect, cool. I'm just keeping an eye on time.

Thanks. Thanks, everybody. Everything you've spoken about today is really interesting.

Even I learned a lot and I've been involved in pretty much every one of these posts.

Join us in about an hour and a half to, as William said, talk about Observatory, which is our kind of big new product launch today that will help you get insights on pretty much everything we've spoken about today.

I'll be joined by Matt Bullock to discuss that and experiments as well.

So thank you very much for joining and I will speak to you later.

Thanks, guys.
