Latest from Product and Engineering
Presented by: Jen Taylor, Usman Muzaffar, Alex Krivit, Aki Shugaeva
Originally aired on April 1, 2024 @ 1:00 AM - 1:30 AM EDT
Join Cloudflare's Head of Product, Jen Taylor and Head of Engineering, Usman Muzaffar, for a quick recap of everything that shipped in the last week. Covers both new features and enhancements on Cloudflare products and the technology under the hood.
English
Product
Engineering
Transcript (Beta)
Hi, I'm Jen Taylor, Chief Product Officer at Cloudflare and I'm joined by... Hi Jen, I'm Usman Muzaffar, Chief...
What am I? I'm not Chief of anything. I'm Head of Engineering at Cloudflare.
It's nice to see you. I keep... I keep... You always introduce yourself with such smooth rhythm.
I keep wanting to... You're just blown away by my rhythm.
That's right. It's nice to see you, Jen. We haven't done that in a while.
I love that I keep that mystery and surprise alive, Usman. It's all the time we spend talking to each other.
Welcome to Product and Engineering. What's latest from Product and Engineering.
Everyone, we are joined today by our esteemed guests, the Cache Team.
Can you guys introduce yourselves? Sure. I'm Aki Shugaeva.
Oops. That is me. That is Jen and Usman. So, Aki Shugaeva, the Engineering Manager and Alex.
And I'm Alex Krivit. I'm the Product Manager for the Cache Team.
Awesome. Okay, so I'm going to dive right in. When we say Cache, are we talking C-A-S-H or...
Is that in charge of the money? You're where the money is? Yes. There's so many puns all the time.
All the time. Endless source of t-shirt headlines. It's great.
Exactly. We kind of revel in the fact that there's endless puns here. That is one of the perks of the job, for sure.
Glad you took the role. Yes, definitely.
No, we're talking C-A-C-H-E. And it is making temporary copies of specific types of website assets all across the Internet so that...
In our data centers all across the Internet so that these assets can be delivered to end users quickly and securely and easily.
So when most people think of a Cache, they think of like an auxiliary data store, right?
So like on a single server, you've got RAM for high-speed retrieval.
But with a content delivery network, to make it faster, we're trying to put our content closer to the client requesting the content.
Just to rephrase a little what Alex is saying.
Well, and now that we're in over 250 cities and I don't even know how many data centers, that's a lot of copies.
That sounds kind of complicated.
How do you do it? That's a good question. As you're sort of alluding to here, you need to be able to put the right content in the right data center where it's going to be the most useful to the people requesting it.
You don't want your copy of a picture of your vacation in Chicago when most of the people that are wanting to see your vacation photos are in Australia, in Sydney or something somewhere.
And so you frequently have to move the assets around based off of where they're requested.
And where they're not requested, you don't need a copy there.
It's sort of a waste of space both for us and a waste of time for the customer.
But then, I mean, how does that work? I mean, so do customers then have to kind of go in and like manually be like, no, don't send my vacation photos to Chicago, like only send them to Dubuque?
Or like how do you decide what controls to make available to which customers?
That's also sort of a good question. The customers don't have to go in and set those.
That's sort of all behind the scenes magic.
But they do have a number of other controls that they can set to make sure that the content that they do want cached and do want to be fast can be fast.
And so specific knobs can be like, hey, how long should we cache things?
What should we cache?
We have knobs for, hey, you know, this shouldn't be in cache anymore.
Let's go back to the origin server and get a new copy of this. That's known as purging.
You have other sorts of knobs like, hey, I want, you know, there to be maybe fewer requests coming back to my origin.
I want to be talking to fewer pops, which is like a tiered cache button that we recently released, which is pretty exciting.
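The two knobs mentioned above, how long to cache something and purging it on demand, can be sketched in a few lines. This is only an illustration under assumptions: the `TTLCache` class, its method names, and the injected `fetch_from_origin` callable are hypothetical and are not Cloudflare's implementation or API.

```python
import time


class TTLCache:
    """Hypothetical cache with a time-to-live knob and a purge knob."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin  # callable(url) -> body, stands in for the origin
        self._ttl = ttl_seconds          # "how long should we cache things?"
        self._clock = clock              # injectable so the example is testable
        self._store = {}                 # url -> (body, expiry time)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]              # still fresh: serve from cache
        body = self._fetch(url)          # missing or expired: go back to the origin
        self._store[url] = (body, now + self._ttl)
        return body

    def purge(self, url):
        """Drop the cached copy so the next request fetches a new one."""
        self._store.pop(url, None)
```

With a 60-second TTL, repeated requests inside that window never reach the origin, while a call to `purge` forces the next request to fetch a fresh copy regardless of the remaining TTL.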
Yeah, let's talk about tiered cache for a second. But first, I want to ask Aki something.
So, like, here's a question that comes up all the time when I explain what cache is, especially when I'm talking to non-technical people.
One of the first questions they ask is, wait, so doesn't that mean you need unlimited hard drive space?
Like, how do you make sure that you, like, how can you possibly keep a copy of everything in every data center?
So, how does it work, Aki? How do we make sure that when somebody requests something from a website and Cloudflare is in the middle of this conversation, right?
So, the request comes to us first. We go, like Alex was just saying, we go back to the origin, we get the content, we hang on to it.
What is it that makes us then decide, actually, we don't need this copy anymore. We can override it or we can let go of it.
Like, what is the engineering behind that? And how does that actually decide so that we don't wind up having to need, you know, a bazillion hard drives in every data center?
So, you know, the cache is filled on when a client requests content.
So, it's not all 200 data centers that are going to be requesting this content.
You know, so we're only filling the content in the data center closest to the eyeballs.
So, then, you know, how do we decide what to keep and what not to keep?
So, it depends on the popularity. So, we have an actually pretty simple eviction algorithm.
We just get rid of the least recently used.
And, you know, we just, you know, fill up as much as we can in each of the data centers.
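As a rough illustration of least-recently-used eviction, here is a minimal sketch. It assumes a cache sized by item count rather than bytes, and the `LRUCache` name and its methods are made up for this example; a production cache would be considerably more involved.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache, sized by item count for simplicity."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a hit marks the entry as recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("/a.jpg", b"A")
cache.put("/b.jpg", b"B")
cache.get("/a.jpg")         # touching /a.jpg leaves /b.jpg as least recently used
cache.put("/c.jpg", b"C")   # over capacity, so /b.jpg is evicted
print(cache.get("/b.jpg"))  # None
```

The popular asset stays because every request refreshes it, while anything nobody has asked for in a while drifts to the front of the queue and is dropped first.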
So, Aki said the magic words, right? Least recently used, which is that the system is always, always caring about the thing that it needs right now.
And then if no one's asked for it in a while, it sort of goes away, which leads to an interesting problem, Alex, which I wanted to ask you about.
When I came to Cloudflare, one of the things we were very proud of was that we were expanding our network dramatically.
So, we're adding all these new data centers and all these far off locations.
And interestingly, every time we added a new data center, a lot of our customers were like, hey, you just overwhelmed our origin.
Like, all of a sudden, for everybody in that area, that new data center just went, like, excuse me, Mr. Customer, I need a full copy of everything because I see a lot of traffic coming in.
So, I need to go to your origin and ask for almost all the pages.
And the customers were kind of like, guys, the reason we put Cloudflare in here is so that you don't have to keep coming back to origin.
And yet, as you grow your network, you are overwhelming us.
So, what's going on there, Alex? And what was the big feature that we built that helped us get around that problem?
So, that's where a couple of things sort of come into play.
So, Aki is saying that we fill cache based off of where requests are coming from.
And so, if you have a really popular asset that's maybe popular globally, they're coming to every Cloudflare data center.
And so, then if you have, you know, 200, 300 data centers all around the world, then they're all reaching back to your single origin for that copy of vacation photos again or something.
Assuming that, you know, everybody all around the world wants to see your vacation.
My vacations aren't that popular.
My vacations are. Jen's aren't that popular. Nailed it. We built tiered cache just for Jen, just so we can make sure that we're protecting Jen's origin server.
Yeah. And so, instead of having all 300 of these data centers reaching out for vacation photos, instead you want to maybe organize these data centers into sort of a hierarchy so that maybe like certain data centers talk to other data centers that then can talk to an origin.
And so, you're just getting a subset of requests back to your origin from certain nodes that then can fan that information out and distribute it to other data centers.
And so, that relationship really helps to limit the number of data centers that have to ask the origin for that information.
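A toy simulation makes the effect on the origin concrete. Everything here is an assumption for illustration: the `Origin` and `CacheNode` classes, the single upper tier, and the choice of 300 data centers each requesting the same asset once.

```python
class Origin:
    """Stands in for the customer's origin server and counts requests."""

    def __init__(self):
        self.requests = 0

    def fetch(self, url):
        self.requests += 1
        return f"content of {url}"


class CacheNode:
    """A data center cache that asks its parent on a miss."""

    def __init__(self, parent):
        self.parent = parent  # another CacheNode or the Origin
        self.store = {}

    def fetch(self, url):
        if url not in self.store:
            self.store[url] = self.parent.fetch(url)
        return self.store[url]


def origin_requests(num_data_centers, tiered):
    """Count origin hits when every data center requests the same asset once."""
    origin = Origin()
    upstream = CacheNode(origin) if tiered else origin
    edges = [CacheNode(upstream) for _ in range(num_data_centers)]
    for edge in edges:
        edge.fetch("/vacation.jpg")
    return origin.requests


print(origin_requests(300, tiered=False))  # 300
print(origin_requests(300, tiered=True))   # 1
```

Without tiering, each data center fills its own cache from the origin. With one upper tier in front of the origin, only the first miss reaches the origin and the other data centers are served from the upper tier.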
So, it really reduces the number of requests back there. Well, that's one of the things I love about cache.
It's one of those amazing features that helps both end users and, you know, the sites and applications themselves, right?
You know, obviously we kind of hinted already that, you know, the cache, by caching stuff, it makes it faster to deliver it to the end user, like faster time to eyeball.
But it also enables like significant cost savings to the site and application owner because you don't have to go to the origin as much because every time you hit the origin, like it costs some money and it costs some time and it takes capacity and all that.
So, a really smart cache, like features like tiered cache, are like goodness in both directions, which is one of the great things about this product.
Everybody wins with cache. That's the kind of thing, right?
Actually, and so, this is great. So, like if we were just to humanize this for a second, this is literally the data center in Chicago going, shoot, I don't have that picture of Jen's surfboarding.
But before I go to, you know, jensvacationphotos.com to get that picture, let me check with San Francisco Pop because it might have the answer.
And then I don't need to bug anybody. I don't need to go to Jen's.
So, but that does mean that that Chicago data center needs to know to look there.
And that the question of which data center to go to might be different customer by customer.
So, I think, you know, Aki, we call this topologies.
What are some of the features we've built here to give customers controls over how and where any data center knows where to go?
Like how is it supposed to know if I don't have it before I go to origin, where should I look?
So, recently we introduced, you know, a smarter tiered cache topology where, you know, we'll actually calculate the latencies between all of our data centers and the origin and find out which one is the closest to the origin.
So, you know, we'll choose that as kind of like an upper tier.
So, Alex is kind of talking about, you know, different topologies.
We essentially just have two layers right now.
So, we've got, you know, all of our data centers. Then we have an extra layer that sits in front of our origin.
And that one can have, you know, we can have custom topologies.
You know, that one's going to be really our SEs. You know, it's fine-tuned to what customers need.
And then we have our smarter topology.
Right. And that gives us one upper tier, which would also shield the origin, you know, from all those 200 data connections, right?
Then we just have the one upper tier talking to that data center.
That's amazing because that means Cloudflare figured it out for you, right?
So, the smart topology, you didn't have to do anything.
So, it comes back to, again, how do we minimize the amount of work for the customer?
And yet, we know which upper tier we should ask without you telling us because we've got all the numbers and we're measuring which one's actually closer to your origin.
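The selection idea described here, picking the data center with the lowest measured latency to the origin as the upper tier, can be sketched as follows. The function name, the data center codes, and the latency numbers are all invented for illustration; how the measurements are actually gathered is not covered in this conversation.

```python
def pick_upper_tier(latency_to_origin_ms):
    """Return the data center with the lowest measured latency to the origin.

    latency_to_origin_ms maps a data center name to a latency in milliseconds.
    """
    if not latency_to_origin_ms:
        raise ValueError("need at least one latency measurement")
    return min(latency_to_origin_ms, key=latency_to_origin_ms.get)


# Made-up measurements for an origin hosted near San Francisco.
measurements = {"ORD": 48.5, "SFO": 12.0, "SYD": 160.2}
print(pick_upper_tier(measurements))  # SFO
```

The customer supplies nothing in this sketch: the choice falls out of the measurements alone, which is the point being made about the smart topology.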
It's really, really important. And I think it also touches on something that I like about Cache that kind of consistently comes up is that Cache really helps make everybody feel smart, right?
And that, in my mind, is a big part of what you guys just delivered with early hints.
Alex, can you talk a little bit about early hints and, like, what's the problem it's solving?
Yeah. Early hints is a sort of experimental status code.
And so, people sort of who may be looking into maybe their, you know, page inspector are pretty familiar with status codes.
You have things like 200s, which are like, okay, this was sent. You got the full response.
Everything worked well. You have other status codes like, you know, like maybe 301s, which are, hey, this has been redirected somewhere else.
And so, early hints is status code 103, and it's sort of fairly new, and it's just sort of gaining popularity.
And what happens with early hints is that a request comes in. We've talked a lot about requests here.
We'll continue on the theme of Jen's vacation photos.
Yes. And we know that maybe certain types of information on the web page for the vacation photos never really change.
And this information can be things like fonts for the page, maybe favicons, maybe a couple of images and things.
You can tag all of those different assets that say, hey, like, let's send these.
They don't frequently change. Let's send these early, while the origin server that we've also talked about is thinking about compiling the more full page body request.
And why that's sort of interesting is that there are a lot of pages around the Internet that take a long time to sort of fully compile the remainder of that page.
If you think about maybe shopping carts, you know, you're putting things into the shopping cart, you're taking things out.
All of that needs to be thought about and computed and compiled on the back end.
But nothing really happens on the browser until that full 200 response gets sent through.
What early hints promises is that it sends a lot of that non really changing information early so the browser can start working while the back end server is updating that shopping cart and doing all those things and sends that later.
And so I like to sort of think about it like multitasking for the Internet a little bit.
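On the wire, the idea looks roughly like this: an interim 103 response carrying `Link` preload headers goes out first, and the final 200 follows once the server has finished building the page. The helper names and asset paths below are hypothetical, and the sketch only formats HTTP/1.1 messages as bytes; it is not how any particular server or CDN implements the feature.

```python
def early_hints_response(preloads):
    """Build an interim '103 Early Hints' response.

    preloads is a list of (path, as_type) pairs, for example
    ("/fonts/site.woff2", "font").
    """
    lines = ["HTTP/1.1 103 Early Hints"]
    for path, as_type in preloads:
        lines.append(f"Link: <{path}>; rel=preload; as={as_type}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def final_response(body):
    """Build the final '200 OK' response that follows the hints."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# The hints can be sent immediately; the 200 is sent after the slow work is done.
hints = early_hints_response([("/style.css", "style"), ("/logo.png", "image")])
page = final_response(b"<html>cart contents</html>")
print(hints.decode("ascii"))
```

A browser that understands 103 can start fetching `/style.css` and `/logo.png` while the origin is still computing the shopping cart, which is the multitasking described above.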
So when all this works perfectly, Alex, what's the benefit? What actually happens?
In an ideal situation, all the hints came in, and we were like, load this early and hang on a second, I'll be right here, like, what's the win here?
What does this look like when it works perfectly?
The win is that the page for the end user, the person who's looking at the photos or updating their shopping cart or whatever, they see content quickly, much faster than if the browser had to wait for the full response to be sent all at once.
Just by reordering things and hinting things in the right direction, we can actually change how fast the user experiences the page, which is such an important thing to be watching.
And a metric that people who are into this watch really, really rigorously.
And so it's amazing that early hints has shown some really tangible, measurable results here.
It's fantastic. Early promise.
And the whole thing, too, that's also cool about this is that it's early. You talked about it being an experimental status code.
So I think one of the things that some folks may not realize is kind of how important the development and the evolution of standards are in things like error codes and stuff like that when it comes to cache because of the importance of those standards for consistency of behavior across browsers.
Can you guys talk a little bit about kind of, I know we participate in a lot of that kind of early thinking and conversation with browsers.
How do those conversations go? Where does that innovation come from, and stuff like that?
That's a good question. A lot of it is based off of these standard-setting bodies that frequently bring a lot of people together: people who work on browsers, who work on CDNs, who work with origin servers. And you really get the notion, when you're working with these standard-setting bodies, that the Internet is kind of a democratic process. People propose an idea, it gets debated, it gets tried out and experimented with, and if it shows promise and a lot of people get on board with it, then it becomes a standard. It becomes a thing where people are like, okay, if we send this sort of status code, this is the expected behavior and this is how you should deal with things. And so it's sort of a cool thing.
And for early hints and these other things, this is part of that process. It's, hey, this is a really good win for the Internet, it can load things so much faster, and, you know, you should support it. Like, this is a huge win for everybody, and this is the data behind it.
So great. Yeah, it's one of the things I love about Cloudflare, right? Our mission is to help build a better Internet, and a big part of what we do is, you know, we'll barrel headlong into early adoption of an emerging standard. Because, you know, with standards it's like, no, no, no, after you; no, really, after you. And it really takes somebody to make that first move for the rest of the ecosystem to come, and we can safely try things at scale, which is another amazing thing.
Okay, what did engineering have to do? What were some of the technical challenges we had to think through when we implemented early hints?
The biggest technical challenge was that, you know, this is a new status code, right? So, you know, not all the browsers are going to support it, not all the proxy servers are going to support it. You know, there are a ton of challenges, including, like, you know, just a simple Python request library that most developers use doesn't even know what a 103 is.
Not from up before I work. To make so. That's awesome. You literally get to be part of the team that implements new HTTP status codes. You know, it's such an honor, honestly, to work on some of these.
Very exciting. Cool. And, like, actual clock time saved, not theoretical, like wall clock time saved. Alex, do you have some stats on that? I'm going to put you on the spot for some numbers.
Yeah.
In our tests, we have seen improvements of around 30% to what is called LCP, which is Largest Contentful Paint in a browser. What that means, essentially, is that most of the content on a website has been loaded.
And the way to think of that, you know, is that this is a space that brilliant people have been working on for decades, and to squeeze 30% out, that is no joke.
I would have been impressed if you'd said 3%, and 30% is just extraordinary.
Yeah, it's, it's cool.
Yeah, I mean, it's a, it's a thing where, you know, frequently page load is measured in, you know, milliseconds or something and, you know, frequently we've seen, you know, hundreds of milliseconds being shaved with, you know, early hints which is incredible.
This is really incredible. Yeah, I mean we're all very very impatient right so I forget.
Right. Yeah, I'm all impressed with that 30% today, but, you know, by the day after tomorrow that'll be, okay, that runs to zero, what's next?
Yeah. So, another important project that has the word hints in it, but is quite different, is crawler hints. Alex, talk about this. I'll start with: what is crawler hints, why did we attack this, and what was the challenge?
Yeah, crawler hints is a program where we're working with a number of people who run really, really large Internet crawlers all around the globe.
People run these crawlers primarily because content on the Internet is frequently changing.
There's no real sort of like clearing house that tells these crawlers, hey, like you can expect that this content will be changed at this time and this time and this time.
And so, people who operate these large networks of crawlers have this sort of impossible task, where they have to go back and try to find this new content every so often, but they don't really know when it's changing. So they have to keep going back, sort of naively, and saying, like, oh, is this the same?
How about now? That's sort of where we step in, because Cloudflare sits in the middle, between people's origins that may be changing content and clients that may be interested in when this content has changed. We can send signals to people that operate these large search engines or these large crawler networks and tell them, hey, from our perspective, this content has changed, so you should go back and look at it now.
And so that's the idea behind the crawler hints is to build that crawler efficiency, make it a little bit better for everybody.
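The shift from crawlers polling to the network pushing a signal can be sketched as a simple publish and subscribe pattern. This is only a sketch of the general pattern: the `ChangeFeed` and `Crawler` classes are hypothetical and say nothing about how crawler hints actually delivers its signals.

```python
class ChangeFeed:
    """Hypothetical feed that tells subscribers when a URL's content changed."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish_change(self, url):
        # Called when the cache sees a signal that content changed,
        # for example a purge request for that URL.
        for callback in self._subscribers:
            callback(url)


class Crawler:
    """Crawler that re-fetches a page only when it is told the page changed."""

    def __init__(self):
        self.fetches = []

    def recrawl(self, url):
        self.fetches.append(url)


feed = ChangeFeed()
crawler = Crawler()
feed.subscribe(crawler.recrawl)

# One change produces one crawl, instead of repeated "is this the same?" polls.
feed.publish_change("https://example.com/pricing")
print(crawler.fetches)
```

The crawler makes exactly one request per reported change, so the origin is spared the repeated checks that return the same content.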
What does it mean for us to notice the content has changed? Why is that something we're even able to do?
Well, you know, through all the data that runs through our network, there's a bunch of different signals that we can use and things we have access to. For cache, we have cache-control headers and TTLs. You know, we also have our purge system, where, say, there's an e-commerce site that has a new story that just came out and they need to, you know, get all that content off our network, right? So they'll send a purge and they'll get rid of all that content.
You know, that's also a signal that you know a search engine can use as well to say hey we need to re index this page we need to break it.
So there's, so we're actually looking at multiple different signals of how content might have been changed.
Right.
Excellent. And we're also still trying to improve that. You know, as we're working with the different crawlers in the industry, we're trying to take feedback and see, like, how can we make sure that they're getting the freshest content, because content freshness is actually a very, very hard problem to solve.
You can ask any of the teams, you know, at Google, Bing, or any of those places.
Well, and this again is, like, you know, I talked a moment ago about one of the things I like about cache, which is that we're able to make things better for end users and for the people who own the sites and the applications and the infrastructure.
This is just another example of sort of sitting in this really valuable place and really kind of helping a broader ecosystem, like, you know, how do I make my search engine more efficient?
How do I reduce that? Because, you know, a search engine crawling your site is the same as an end user hitting you up, and so if they're constantly hitting your origin, it's going to cost you.
So, you know, this not only makes life better for the crawler, it's also going to save you money as the owner of a site or an application, just, you know, on the cost of managing and running your origin.
Right. I mean, for engineers and product managers out there, wherever there's polling, there's an opportunity, right? So the architecture of your system involves systems constantly asking the same question over and over again and getting the same answer 99.99% of the time.
And the reason they keep asking is because there's the off chance, and I want to be the first to notice when something changes.
There it is. There's your architectural opportunity: make the system smarter, have the information go in the other direction. It's not always easy, but it can be transformative when it happens because it reduces the load for everyone.
But speaking of not easy, and the ability to look at and interpret powerful signals, right, I mean, that's kind of the underlying sort of problem to solve with Vary support, yeah?
Yeah, absolutely. That's definitely one of the most difficult parts of supporting that response header. Again, going back to the standards set across the Internet, Vary is one of the more difficult ones if you're trying to think about caching things and then serving that content correctly to end users.
So let's motivate this a little bit. We're trying to visit Jen's vacation website, and sometimes I'm visiting it on my awesome desktop with my 96-inch monitor. There's no such thing, but let's suppose I have one.
And sometimes I'm just trying to hit it with my mobile phone with a tiny little screen.
So, Alex, what is the role of the Vary header? Why is this an interesting thing to have in the Internet spec?
Yeah, if you're Jen and you want to make sure that people are viewing your vacation photos in the best way possible,
you might want to have, you know, different variants of that image saved on your origin somewhere. So you might want to have an image that's optimized for your 90-inch, you know, high-resolution 4K screen, colors popping, and then you might want to have a little dinky photo that somebody can view on their mobile phone somewhere.
And you might want to have things optimized for the capabilities of different browsers that you may have on your desktop or on your mobile phone, or, you know, on your PlayStation or something.
So the Vary header comes from the origin, saying, I've got these different variants, you can pick from them.
And the client can know, oh, wait a minute, I'm a dinky client, I should get the small one, versus, I want the giant poster of Jen paragliding or whatever.
You do. I think I want that poster for real. Giving season is coming this month, it might be coming to a mailbox near you.
Yes, this is a great background photo for my zoom calls.
Okay, so Aki, how do we implement this? We've got a cache that sits in the middle. How's it supposed to know which version to pick up?
So, you know, we have, like, the dinky photo and your large 90-inch photo, right? We also don't want to keep going back to the origin server, right, requesting and downloading that content every time we get a different variant.
So in the cache, you know, we're caching all of those variants and kind of grouping them together, and that way we can serve the proper content for your dinky phone or the 96-inch monitor.
So, awesome. Yeah. And supporting this was something we'd been working on for a while. This was a tricky problem to get right, and to make sure that we gave customers all the ability to test it and see it.
Yes, they are. Why was it so hard. Management. Why is that so hard.
Well, yes, you know, um, I don't know the deal, but, you know, when we have content, we request it from origin and we have a cache key, right? So usually it's a one-to-one mapping: we've got one cache key that goes to, you know, the picture that Jen has from her vacation on origin. But, you know, introduce Vary, and then we have multiple different cache keys, and so we want to be able to store this in an efficient way, right, where we're not having to do a bunch of lookups all over the place.
So, one problem getting that grouping right. Yeah. Yeah, it's great.
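One way to picture the grouping is a cache keyed first by URL, with the variants stored together underneath and selected by the request headers that the origin named in `Vary`. This is a sketch under assumptions, not Cloudflare's design: the `VaryCache` class is hypothetical, and the `Accept` header is used only as an example of a varying request header.

```python
class VaryCache:
    """Groups every variant of a URL under one primary key."""

    def __init__(self):
        # url -> {"vary": tuple of header names, "variants": {secondary key: body}}
        self._groups = {}

    @staticmethod
    def _secondary_key(names, request_headers):
        normalized = {k.lower(): v.strip() for k, v in request_headers.items()}
        return tuple(normalized.get(name, "") for name in names)

    def store(self, url, request_headers, vary, body):
        """Save a response; vary is the origin's Vary header value."""
        names = tuple(sorted(h.strip().lower() for h in vary.split(",") if h.strip()))
        group = self._groups.setdefault(url, {"vary": names, "variants": {}})
        if group["vary"] != names:
            # The origin changed what it varies on, so older variants no longer apply.
            group["vary"] = names
            group["variants"] = {}
        group["variants"][self._secondary_key(names, request_headers)] = body

    def lookup(self, url, request_headers):
        """One lookup by URL, then one lookup among that URL's variants."""
        group = self._groups.get(url)
        if group is None:
            return None
        key = self._secondary_key(group["vary"], request_headers)
        return group["variants"].get(key)


cache = VaryCache()
cache.store("/surf.jpg", {"Accept": "image/webp"}, "Accept", b"small webp")
cache.store("/surf.jpg", {"Accept": "image/jpeg"}, "Accept", b"large jpeg")
print(cache.lookup("/surf.jpg", {"accept": "image/webp"}))
```

Because the variants live together under the URL, a request needs one lookup for the group and one for the variant, rather than probing a separate cache key for every possible combination.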
Um, excellent. So, I always get such a kick out of talking to this team: the performance, the optimization, the technology, everything else.
I think, amazingly, as always, the time just flew by. But I want to thank Alex and Aki. Thanks so much for coming on our podcast, our video podcast, what is this, it's our show, and talking about all the great stuff.
Thanks for having us.
And I will make sure that I check in with you guys before I post the photos for my next vacation so I can really optimize my caching settings.
We're here, we're here to make it easier for you.
I appreciate that. Awesome. See you guys soon.
Thanks everyone for watching. Bye. Bye. Bye.
Since they are trusting us with their personal information.
Believe it or not, we've actually had customers write in and tell us that they have gone into their browser and viewed the source code to the web page to find out what's happening with their personal information, twice in the last year that I can remember.
We came to work, and we couldn't work because Amazon was down. We couldn't log into our support panel.
We couldn't manage our shipments through our third party logistics provider, but our site was still working.
And being able to stay online through Amazon downtime has been amazing.
In fact, those were some of the highest sales days of the past year.
In terms of bandwidth savings, we have gotten amazing bandwidth savings from Cloudflare.
Over 95% of the bandwidth that we use is cached now.
Most of that is large static images, which are getting optimized through Mirage.
And so we know that they're just loading so quickly and the best that they possibly can.
Also, the web application firewall is really great because it allows us to make sure that people aren't compromising our system through any known attack vectors or browser vulnerabilities.
We're a really small engineering team.
We only have about one and a half technical people that write code on a day-to-day basis.
So anytime that we have the opportunity to use a service that reduces our need to write code, it really means a lot to us.
We've had zero security breaches the entire time that we've been online, and Cloudflare has been there with us every step of the way.