Latest from Product and Engineering
Presented by: Jen Taylor, Usman Muzaffar, Achiel van der Mandele
Originally aired on August 9, 2022 @ 12:00 PM - 12:30 PM EDT
Join Cloudflare's Head of Product, Jen Taylor and Head of Engineering, Usman Muzaffar, for a quick recap of everything that shipped in the last week. Covers both new features and enhancements on Cloudflare products and the technology under the hood.
Transcript (Beta)
Okay. Hi, welcome to another issue of the latest from product and engineering from Cloudflare's product and engineering teams.
My name is Usman Muzaffar. I'm Cloudflare's head of engineering. With me, Jen Taylor.
Say hi. Hi, I'm Jen Taylor, chief product officer at Cloudflare.
And we're really excited this week to have Achiel van der Mandele to join us.
Achiel is a product manager on Jen's team. Achiel, why don't you say hi and tell everyone what you're responsible for.
Sure.
Hi, thanks a lot for having me. I'm also psyched to chat with y'all. So I'm a product manager here at Cloudflare.
I've been here, I think a little over 18 months.
My focus area is very much on like the edge of Cloudflare. I oversee a whole bunch of things, but very much like the first time a byte or a connection or something hits our network.
That's kind of where I try to focus. Edge of the edge, right?
Edge of the edge. So that's like HTTP and advanced protocols, but also like other non-website protocols, like FTP, gaming, that type of stuff.
Yeah. So let's talk about that word protocols for a second.
Like what do we even mean by that? It keeps showing up all the time.
We keep saying protocols, protocols, and you go back to...
I think the first time I heard that word was when I was watching Star Wars as a little kid and C-3PO says, I'm a protocol droid.
It's this idea that when two parties are in contact, these are the rules.
That doesn't follow protocol. So why does the word protocol even show up all the time?
Why do we have a team called protocols at Cloudflare?
What's going on there, Achiel? The fun thing is we literally have a team called protocols that I happen to work with, but that's a great question.
So a lot of the time when we say we follow protocol, it's like, well, these are kind of like the steps that we are following to be able to achieve a task or do anything or interact with other people.
And it's very much the same on the engineering or Cloudflare side.
There are ways that your browser, your laptop, or whatever, needs to talk to a website.
And there are certain steps and certain contracts, if you will, or a protocol of how you interact with the website and retrieve that website from Cloudflare and have it show up in your browser.
It's really the order in which we say hello, right?
Literally. Hi, I'm a web browser and I would like some content.
And the server says, oh, that's nice to meet you, client.
Here's who I am. You can prove that I'm really this website. So yeah, that's really interesting.
And another team that you are responsible for that we're going to talk about is called Spectrum.
Tell us just a little bit of, again, by way of introduction, what is Spectrum?
Why do we call it that? Sure. So most people know Cloudflare as an operator of like HTTP services.
And when I say HTTP services, I mean like mainly websites, right?
Most people know like, hey, I can go to Cloudflare.
I can put my website up on there. You do security and you do CDN and workers, all of that stuff.
The funny thing is the Internet is like a lot more than just that, right?
We all know, like you play video games, that doesn't go through a browser.
There's no HTTP there. There's no website there, but you're still interacting with the Internet, right?
Or another thing is you might be transferring files through FTP or you're doing email.
All of those services want to get benefits on the security side, but also on the performance and reliability side.
That's where Spectrum comes in. Spectrum is essentially the way for you to put Cloudflare in front of those types of services.
And we offer stuff like DDoS protection, advanced firewall rules, but also allow you to like speed up those protocols by employing technologies like Argo smart routing.
Yeah.
Yeah. So it's interesting. Like the picture in my head is a stack, right? So like there's, you know, the lower layers of the network stack is literally physical wires connecting to each other.
Then one layer above that is, okay, two computers can talk to each other.
Then they can have IP addresses. So I've got a number on the Internet.
You have a number on it. And the top layers of the stack are those applications.
And what we built was a lot of stuff that's specific to websites. But then we have all this technology, like you said, about protecting websites and protecting against DDoS attacks, you know, protecting against malicious intrusions, protecting against bots.
And that stuff applies to anything on the Internet. So it's being able to apply all of those infrastructure products for security and reliability to more than just websites.
And so it's the whole spectrum, the entire spectrum of applications on the net.
So and Jen, I think this is probably a question that I was going to ask you.
Like one of the things that we were asked to work on recently was regional services.
And, you know, just from the product point of view, like why, what is regional?
What's regional about? After all, it's a global Internet, like we've got points of presence everywhere.
So, you know, where are some of the requirements for regional coming from?
And what is it that Achiel and the engineering team have to work on?
Well, it's interesting you ask that, right?
Because we go back to like Achiel being responsible for the edge of the edge, right?
And basically, like the doormat to the front door of Cloudflare, if you think about it.
You know, one of the things that we're starting to see is that different parts of the industry, different parts of the market, specifically different regions have different requirements for how their traffic and the data around their traffic should be processed.
And in particular, you're hearing in markets like in Europe, where they want to just ensure that all of that traffic is only processed in Europe.
I can understand why they would want to do that, want to do it for regulatory reasons, privacy reasons, security reasons, you know, they have some very specific regulatory reasons why they want to do it.
Now, if you step back, and you're like, okay, that makes sense. But remember, Cloudflare is an Anycast network, which means that typically what ends up happening with Cloudflare traffic is it comes in the doormat, the front door of Cloudflare, and it's processed in the colo in which it is taken care of.
And then we send a bunch of information back to the central brain that lives in Portland.
The challenge that we posed to Achiel was: Achiel, solve Europe's problem on our Anycast network.
And so Achiel, I pass the challenge off to you.
How did you tackle this? How did you frame the budget? How did you frame the problem?
So the problem, I think, originally was that we were increasingly seeing people asking about this, like, hey, where do you process?
Where is my data?
But the challenge here was that that seems like a very simple question, like, can you just process or do data stuff in this region?
But there's a lot of nuance into like, what does that mean?
Does that mean data can flow through us?
Does that mean we decrypt or apply this product or that product or store it on disk?
So from a personal point of view, I thought this was a really, really interesting challenge, not so much from the engineering point of view, but very much from the product point of view.
And I'll tell you why. These things mean very different things to different people.
And it's because a lot of this is up to like the interpreter, right?
We have certain laws and we have certain people who feel a little bit icky about these things.
But a lot of times they just say, like, make it local.
But then don't tell you exactly what that means. So we actually spent a lot of time.
Sounds like a job for a product manager. A lot of fuzziness. Big, important, poorly defined thing.
Call a product manager. And if you ask 10 different people how to solve this, you will get 10 different answers.
11 different answers.
Ultimately, what we just did is we're just putting up like straw men, like, hey, how would you feel if we approached it this way or that way?
And really trying to narrow down into what is processing to you and what does that mean to you?
How does that manifest in your daily life? And that's where the interpretation also came from, right?
A lot of people don't necessarily even care directly about GDPR, but they have, they've been forced to write stuff into their contracts that say certain things.
So a lot of this. Let's pause there for a second.
GDPR. All I know is that right around May of 2018, every single website I ever visited started giving me big warnings about cookies and big things that say accept cookies, accept cookies.
So, in two sentences for everyone to know: what's GDPR, and how did that come into this whole thing?
GDPR has a whole set of strict requirements surrounding, like, where is data allowed to flow?
And who can look at that data and where do you apply processing or products?
Obviously, with us being an Anycast network, that does raise certain questions surrounding how does that work?
How do you operate that? It's not as simple as saying, well, we only have one data center and one server that's handling your traffic.
In many ways, the whole point of the Internet and the whole point of Cloudflare is to make sure we process your request and we're, we will find a computer to process it.
Even if it means sending it, you know, far from where the eyeball is originally, that's literally the point.
And, and while we have, it's very interesting, right?
Because networks are highly aware of the other networks they talk to.
They have no clue about where the political boundaries are that they are crossing.
They know about autonomous systems and they know about LANs and they know about WANs, but they have no clue where they are.
Totally not bound by countries in any way. And I think like in the absence of any definition of country or whatever, when we thought about building our network, the thing that we optimized for was: process that information as quickly as possible.
So we optimized for speed and we're like, location is irrelevant. Speed is paramount.
But then we started hearing from customers that maybe we needed to add another piece into that equation.
Another axis to this whole puzzle.
It's three-dimensional. So what'd we do, Achiel? So in the end, what we discovered, talking to a lot of customers and kind of proposing things, is that most of them just ask for, can you do like a regional Anycast in a smaller area?
And it kind of, the issue with that is it's very antithetical to how Cloudflare operates.
And ultimately in my opinion, not exactly what customers really want, because you want that broad Anycast network, right?
Most of these people care a lot about DDoS protection.
The larger our network, the more we can mitigate. And that's extremely difficult in a smaller area rather than larger.
And what they end up really caring about is very much like, where is traffic decrypted?
Like the quote I always like to use, which is a verbatim quote from a customer.
It's like, we can slice and dice this in a million different ways.
As long as you can promise me that no machine outside of the EU will see a decrypted bank account from an HTTP request from one of my customers.
We're good. Everything else is just moot. I just want you to be able to make that promise.
So we've delivered literally on that.
We use our global Anycast network, but we make sure that we don't decrypt. All of that just flows through us and the Internet, back to a data center inside the region of the customer's choice.
We do EU and US, and we only decrypt and offer processing there.
That's great. Hold on a second. I want to double click on that, because you just said something that was really interesting to me, which was we wanted to preserve the power and the strength of what we do for Anycast, but also respect the kind of the regional processing decisions of our customers.
So how do we balance that?
When a customer has regional services turned on, what still happens in any colo?
And then at what point does the traffic get passed back? Why was regional Anycast such a bad idea?
So regional Anycast very much limits how much network capacity you have.
When you look at large-scale volumetric DDoS attacks, those can span many hundreds of gigabits.
You want as much network capacity to be able to disperse it across the globe and absorb it across the globe as possible.
Having a regional Anycast just very much limits what you can do there.
It's also a little bit trickier in that we have a round globe, so traffic naturally balances a little bit more nicely around there than if you have one region where all of the attacks potentially come from the outside.
But I guess part of it is too, right?
With DDoS, I guess you just don't have to decrypt. With DDoS, we're just seeing, we just look at the volume of the traffic and we're like, that is an unnecessarily large amount of traffic.
I'm going to handle and absorb in Singapore this huge glob of traffic and then pass the traffic that's destined for Europe back to Europe for the decryption.
Is that it? Exactly. The capacity aspect of it, another way of looking at it is from the OSI layer model, and I'll try to quickly recap.
The three OSI layers that we most often look at are three, four, and seven, with three and four being very much on the network and connection layer, and then layer seven being HTTP to a website, decryption type of stuff.
If you look at the types of attacks that are very difficult to scale up, but also block, those are the layer three, layer four stuff.
We don't need to decrypt to be able to block that.
It's still just data. It's opaque. Let the power of the network absorb that, but when it comes time to actually open the envelope and look inside, let's make sure we do that part in the data center that matches where the customer wants the regional processing done.
Exactly. Awesome. Best of both worlds solution.
It's still a heck of a lot of work, right? Because it meant making sure that those private keys and the data needed to decrypt, basically the letter opener, is only available for these customers in the places where they need to be, and that's another part of it.
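As a rough illustration of the flow described above (absorb layer 3/4 floods anywhere, but only open the envelope inside the customer's chosen region), here is a minimal Python sketch. The colo codes, the region table, and the `looks_like_flood` flag are all hypothetical stand-ins for illustration; this is not Cloudflare's implementation.

```python
from dataclasses import dataclass

# Hypothetical mapping of data centers ("colos") to regions, for illustration only.
COLO_REGION = {"SIN": "APAC", "FRA": "EU", "AMS": "EU", "IAD": "US"}


@dataclass
class Packet:
    customer_region: str    # region the customer selected, e.g. "EU" or "US"
    looks_like_flood: bool  # stand-in for a layer 3/4 volumetric check


def handle_at_ingress(colo: str, pkt: Packet) -> str:
    """Decide what the colo that received the traffic does with it."""
    # Layer 3/4 mitigation works on opaque bytes, so any colo can drop floods.
    if pkt.looks_like_flood:
        return "drop"
    # Decrypt locally only when this colo sits inside the customer's region.
    if COLO_REGION[colo] == pkt.customer_region:
        return "decrypt-here"
    # Otherwise pass the still-encrypted bytes to a colo inside the region.
    return "forward-encrypted"
```

For example, `handle_at_ingress("SIN", Packet("EU", False))` forwards the traffic encrypted, while the same packet arriving in `"FRA"` is decrypted locally.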
That's really great. That was one of the things we've done recently.
The other thing I wanted to ask you about, going back to that protocols part of your responsibility, is HTTP 3, which is pretty new.
HTTP 2 feels like it was relatively new, and now there's HTTP 3, and there was a little bit of a rename going on there, because at one point it was called QUIC, Q-U-I-C, which is a standard that Google and the IETF were working on.
What is Cloudflare's role with HTTP 3?
Hold on a second. I got a question. Before we even get there, if we have HTTP, why do we need a new version of it?
If we've got a protocol that works, why do we need new ones?
I'm going to make a better shot at answering that.
Why do we bother to upgrade protocols?
Why do we need 1, 2, 3? That feels like it's actually making things more complicated.
Is the complexity worth it? Why? That's a great question. I think when we go through the different iterations, no one exactly knows what you're really going to run into when you develop a protocol, or just for simplicity's sake, we're going to implement it in a certain way.
One of the examples, or a big example of HTTP 3, which is something we can improve versus HTTP 2, is head-of-line blocking.
What does that mean? Normally, you can only send one resource at a time.
Your browser is talking to a server, requesting a JPEG and an HTML file.
You're blocked. If one is slow for whatever reason, then everything breaks down.
With HTTP 3, because it's UDP-based, which is different than TCP, we can do them out of order.
If there's one thing that is blocked for whatever reason, the other resources can continue to send, which is vastly more efficient in terms of transferring large websites, which have a whole bunch of different files.
If you go to your browser right now and open the network tab, there's dozens, often hundreds, of all these different resources that transfer over the wire when you go to a website.
It's great to be able to not be blocked on one, but be able to get multiple at the same time.
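To make head-of-line blocking concrete, here is a toy Python model. It assumes each resource has a time at which its bytes have fully arrived, and compares strictly in-order delivery (one slow resource holds up everything queued behind it) with independent streams, as in HTTP/3 over QUIC. The numbers are made up and the model ignores bandwidth sharing and retransmission details.

```python
from itertools import accumulate


def in_order_delivery(ready_times):
    """In-order transport: a resource is usable only once it and every
    earlier resource have arrived, so a slow one blocks those behind it."""
    return list(accumulate(ready_times, max))


def independent_delivery(ready_times):
    """Independent streams: each resource is usable as soon as it arrives."""
    return list(ready_times)


# Hypothetical arrival times in milliseconds; the second resource is slow.
ready = [10, 500, 20, 30]
print(in_order_delivery(ready))     # [10, 500, 500, 500]
print(independent_delivery(ready))  # [10, 500, 20, 30]
```

In the in-order case the two small resources wait behind the slow one; with independent streams only the slow resource itself is late.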
That makes sense. If I think about the websites that I used to look at in the dawn of the Internet, where it was just lots of text, black text on a white background and very simple design, and look at the websites that exist today with all sorts of rich graphics and stuff like that, we've increased the complexity of the page in order to improve the design.
What I'm hearing you say is we have to make the protocol smarter to make sure that that experience stays fast.
Exactly. When we talk about protocols, it's just being the rules by which two parties communicate.
We came up with this fantastic way for a browser to talk to a server.
It was simple, and it was general, and it's part of the reason the web took off.
As the conversations we're having evolved, it became obvious how the language itself, how the protocol itself could evolve to become smarter.
Of course, it's got to be 100% backward compatible because there's still gazillions of devices and servers out there.
Really, it's almost you can think of HTTP3 and any protocol evolution as how do we tune the protocol for the way people are using it in the same way that human languages evolve to get smarter, and people develop jargon, people develop shorthands, and people develop more efficient ways of communicating using English.
This is protocols evolving so that they can be faster and more efficient at communicating.
That head-of-line blocking that Achiel was talking about is a great example.
You mentioned something there that I think is actually really interesting.
Achiel, I'd like to understand a little bit more.
Usman, you just used the word evolution a moment ago. Part of the way that protocols work is because I speak the same protocol that you speak, Usman.
Yes. If that's the case, how do we- Yes. Thank goodness. I'm so glad you speak Jen.
I sometimes speak Jen. Then how do we upgrade it? How do we decide that we're going to...
How do we actually get everybody to start speaking Usman? Who goes first?
How do you get everybody to start speaking Usman? Yes. Achiel, how does that work?
How do you solve that? How do you solve that problem? It seems like we have to be talking to the major players on the Internet who control a big chunk of these standards and implementations and work through that low level of the stack.
Yes.
It's even more interesting because now we're talking about speaking Jen between two humans, but here we're talking also about very different parties that are focused on very different things.
To make that concrete, when you talk about the implementation of a protocol like HTTP2, you need a client, which is generally a browser.
You need a server side component, which is a web server, maybe like NGINX, which is what we build our technology on.
That also maps to parties such as ourselves.
We operate a web server. Then there are parties that operate browsers, like Firefox and Google and Apple or Safari.
You need both to be able to do this.
Then it's an interesting question. It's a chicken-and-egg problem here. Could Cloudflare just invent HTTP4 and then hope that everyone follows it?
That would be interesting, but maybe those people aren't interested in supporting HTTP4.
There has to be a lot of collaboration.
That's where parties like the IETF, the Internet Engineering Task Force, come in. They set up these groups of folks, often with people from parties such as Cloudflare and Google and Apple.
They meet together and together they come to these standards.
They agree on the goals all these parties seek to achieve.
That gets you a good mix about things that web servers maybe care about, which might be efficiency, but also browsers, because browsers have vastly different opinions about how things should work.
They have a better feeling for, well, this user is on mobile, so he goes outside a lot, so he switches networks.
That gets you all sorts of new interesting problems to tackle.
All of those people bring all of those problems together, and that's when we start talking about new standards.
That's also the only way in which you can get people to literally agree on how to move forward.
I guess everybody pushes the button on the same day, right?
Where it's like, okay, on September 1st, we're going to push the button and I'll start speaking in response.
That's just it, right? It can't work that way.
It's got to be backward compatible, so it rolls out very slowly. Our servers have to be able to handle in all directions.
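The backward-compatible rollout described here can be pictured as each connection settling on the newest version both sides speak, and falling back otherwise. The Python sketch below is a deliberately simplified model with hypothetical function and variable names; real deployments negotiate HTTP/2 through TLS ALPN and advertise HTTP/3 through the `Alt-Svc` header rather than a single function call like this.

```python
# Protocol identifiers ordered from newest to oldest.
PREFERENCE = ["h3", "h2", "http/1.1"]


def negotiate(client_supports, server_supports):
    """Return the newest protocol both sides support, or None if none match."""
    for version in PREFERENCE:
        if version in client_supports and version in server_supports:
            return version
    return None


# An older client still works against a server that has already upgraded.
print(negotiate({"http/1.1"}, {"h3", "h2", "http/1.1"}))        # http/1.1
print(negotiate({"h3", "h2", "http/1.1"}, {"h2", "http/1.1"}))  # h2
```

Because the server keeps answering in older versions, nobody has to "push the button on the same day": each client picks up the new protocol whenever it is ready.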
The cool thing is, because Cloudflare sits in front of so many things, once we get it right, all of our customers can pick up those benefits almost automatically.
It's so great for Cloudflare engineers who get to be part of these IETF conversations and sit on the committees that are literally designing the future.
It's very exciting work. I'm very excited about that aspect of being able to help out here, because nothing's more important for browser implementers than to have a server that's everywhere, like Cloudflare.
Many, many different websites that they can test against.
With us enabling HTTP3 on our end, all of a sudden we have, I think it's 200,000 domains right now that have HTTP3 enabled today.
That's amazing. Google and Mozilla and Safari, or if you want to roll your own Usman browser, you want to support HTTP3, you can test against those 200,000 websites.
That's amazing. That's so great.
It's so great. It makes the collaboration a virtuous cycle, because the more feedback they get, the faster the protocol can evolve.
It's really great. Hey, listen, one thing I wanted to ask you about, since we're always talking about speeds.
The simple answer for my parents, in case they're watching, at the end of the day, it's all speed.
The primary value of everything we just talked about for the last five minutes is making things faster.
Of course, everyone, even a child playing a video game on the Internet wants to know, how fast is my connection?
One of the other things we released was just a public tool called speed test, speed.cloudflare.com.
Let's talk a little bit about that. Why do we do that? What's it actually measuring?
Great question. Yeah, we launched speed.cloudflare.com a few months ago, which is a new way of testing the speed of your Internet connection at home.
I think what we really wanted to do is we were looking at a number of these other speed tools, which are great, but they're often a little bit simple in that they give you one number.
If that's what you care about, that's totally fine. That's totally great.
We really wanted to give you a better, more exact insight into how your network is performing.
You'll see that when you compare our speed test to others, you get vastly more metrics, more graphs, literally showing you the different measurements, and you can download the measurements.
We also show you stuff like latency and jitter, which is how much your latency changes over time, which people care about a lot for gaming.
Our MO was very much to offer people more detailed metrics on their network performance.
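As an illustration of "how much your latency changes over time", one simple way to put a number on jitter is the average absolute difference between consecutive latency samples. The Python sketch below uses that definition; it is an assumption for illustration, not necessarily the formula the speed test itself uses.

```python
from statistics import mean


def jitter_ms(latencies_ms):
    """Average absolute change between consecutive latency samples (ms)."""
    if len(latencies_ms) < 2:
        return 0.0  # a single sample has no variation to measure
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return mean(diffs)


# Two hypothetical connections with the same best-case latency.
steady = [20, 21, 20, 21, 20]
spiky = [20, 60, 20, 75, 20]
print(jitter_ms(steady))  # small: latency barely moves
print(jitter_ms(spiky))   # large: latency swings between samples
```

Two connections can share the same minimum latency and still feel very different in a game, which is why the spread matters alongside the headline number.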
That's great. Part of our mission is to just give as much information as possible.
What are some of the things we learned as we built that?
This was really cool because for us, we just wanted to put this out there.
It gave us some metrics in terms of how fast different Internet providers were connecting.
On day one, when we launched, we noticed a lot of people saying, hey, this is not really great.
My upload speed isn't that high.
What's up with that? That was really great that it allowed us to look into it.
We noticed it for people that were on very, very fast connections. They would report vastly lower speeds.
We took that to engineering and asked, hey, what's up with this?
They looked into this and were like, that's interesting. If I go directly to the server, like circumventing Cloudflare, it's fast, but through Cloudflare, it's slow. That's not great.
We ultimately figured out that the default buffering, which is the rate at which NGINX, the web server that we build on, accepts data, was suboptimally tuned.
We were able to change that to dynamically scale up, which immediately sped up upload speeds to the speeds people were actually expecting.
It also allowed us to figure out this bug, which has been around forever, and apply the fix to all of our customers.
All of our customers have fast upload speeds now.
I really love being able to build stuff on top of Cloudflare, really dog food, like use your own software and really look at your own network and see where you can improve.
That makes us better, but also our customers happier.
There's a really famous quote that I've always loved. I remember hearing it when I was just starting out in this industry, which is, given enough eyeballs, all bugs are shallow.
That is exactly it. The more we embrace the community and be transparent and show everybody everything we're doing, the better it is.
The more information and signal we have, the more we can improve things and make things better.
It was great as well for us, because after we did this, we figured out how to fix this in NGINX, and we also happily give back to the community here.
We open-sourced a patch, and I believe it's under review with NGINX or F5.
That's really cool. It's part of helping to build a better Internet, learning at scale, leveraging the power of our own insights, and dog-fooding.
Those are all key tenets about how we think about and how we actually build product at Cloudflare, which is part of what makes being here so fun.
We only have a few minutes left, but I'm going to test your analogy generating facilities here, Achiel.
What's a Spectrum port range? It's a pretty esoteric thing.
It's down in the details, but it really mattered to a bunch of our customers.
We shipped it about five or six weeks ago. What are Spectrum port ranges, and why was support for Spectrum port ranges something we could get away with not having for the first year and a half of Spectrum?
We need to implement this, and we did.
What's up with this feature? Yeah, good question. Before I can answer that, I need to explain what a port is.
A port is something that's open on your web server or your server.
You can maybe look at it, if you're allowed to laugh about this analogy, like your house.
Normally, you have one door, but you might have multiple entrances.
Multiple doors. I love it. I was wondering what you'd go for.
I like it. I like it. It's working for me. It's working for me.
Behind every door, there's a different service. For instance, in web server, website land, we generally talk about port 80 and 443, which is how your browser talks to you.
Different protocols use different ports. A gaming server will use totally different ports than 80 or 443, or an FTP server.
Some might have heard of port 21, and then your mail server has ports, and everyone's talking about the challenges with configuring mail clients.
With Spectrum, you can put Cloudflare in front of these various services.
You can put Cloudflare in front of a gaming server or a mail server, and most of those just operate one or two ports.
That's fine. You just configure one, but what if you run a service that has dozens, hundreds, thousands of ports?
That becomes really cumbersome if you have to go to the UI and click add port 20,000, add port 20,001, 20,002.
The ballroom with 18 doors, 100 doors to the rest of the hotel. It's all the same service, but it has many different doors into it.
That's not great. We've had customers say, okay, we want to open a few hundred thousand ports across a bunch of IP ranges.
Then we had to unfortunately say, okay, that's a little bit clunky, but maybe you can build a script to do this.
They're like, oh, okay. They went back, and then two hours later, we get paged because our API limits are getting hit because they're like...
Because their script is killing the API. So we built port ranges.
Yes. What you can now do, instead of having to put them in one port at a time, you can just specify ports.
You can say, I want ports 20,000 to 30,000, and we'll proxy all of those, which is really great.
Isn't this as simple as somewhere in the UI where you actually had a field where you entered a number.
Instead of entering a number, you can enter in 200-500, as opposed to just a single number.
Yes, that's exactly it. That's awesome. Protocols like FTP, or gaming, or video streaming, those are often protocols that very much care about this.
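The idea of entering `200-500` instead of a single number can be sketched in a few lines of Python. The `parse_ports` helper below is hypothetical and only illustrates the concept; it is not the Spectrum API or its validation logic.

```python
def parse_ports(spec: str) -> range:
    """Parse '443' or '20000-30000' into an inclusive range of port numbers."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start  # a single port is a range of one
    if not (1 <= start <= end <= 65535):
        raise ValueError(f"invalid port spec: {spec!r}")
    return range(start, end + 1)


# One range entry stands in for thousands of individual port entries.
ports = parse_ports("20000-30000")
print(len(ports))      # 10001
print(25000 in ports)  # True
```

A single range replaces the per-port loop that would otherwise mean thousands of separate API calls, which is the rate-limit problem described earlier.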
That's right. That's cool. Achiel, we're at 29 minutes, I think. Jen, did you have a last- I was like, do you want to give one last teaser in the last minute you have of where are you going next, Achiel?
Great, Achiel. I'll try to keep this quick.
Yes, we're definitely always looking to support more protocols. One of the protocols that we've heard a lot of people asking for is gRPC.
If you are interested in this, drop me a line at achiel@cloudflare.com.
Achiel at Cloudflare. That's it.
That's the teaser. Thank you, Achiel. It's so great having you. Jen, always, always a pleasure talking to you on a Friday afternoon about all the amazing stuff our team builds that we take credit for.
Achiel, awesome work, and we will see you again next week on Latest from Product and Engineering.
Thank you, everybody. Thank you, everyone.