🔒 Security Week Product Discussion: Network Performance Update: Security Week
Presented by: David Tuber, Alex Kuck
Originally aired on October 8, 2022 @ 2:00 AM - 2:30 AM EDT
Join Cloudflare's Product Management team to learn more about the products announced today during Security Week.
Read the blog posts:
- Cloudflare Observability
- Securing Cloudflare Using Cloudflare
- Using Cloudflare One to Secure IoT Devices
- Application security: Cloudflare’s view
- Domain Scoped Roles - Early Access
- Commitment to Customer Security
Tune in daily for more Security Week at Cloudflare!
Transcript (Beta)
Hello and welcome to Cloudflare TV. My name's David Tuber, I go by Tubes.
And today we're going to be talking about the network performance update just in time for Security Week.
With me here is my good friend, Alex Kuck.
Alex, wanna introduce yourself?
Yeah, good to meet everyone.
I'm a solutions engineer here at Cloudflare and I work with Tubes on performance stuff.
So take it away, Tubes.
Yeah.
So basically the key focus of the performance team is exactly what you would expect.
Our job is to be as fast as possible, and that means finding places where we're slow and fixing them.
And that happens both on machines and in code, and it also happens on the network.
And the network is really where we're going to be spending a lot of focus on this blog and talking about the places on the network where we optimized ourselves to be faster than everyone else.
So Tubes, tell me a little bit about this blog post, the thesis of this blog post, and then we'll go into a little bit about how we took these measurements themselves.
So, what is the purpose of this blog post and why do we put it out?
That's a good question.
So part of the Cloudflare mission is to help build a better Internet, and a better Internet is a faster Internet, believe it or not.
There's a lot of data around how performance impacts application usage.
So Google found something super interesting: if you induce even a second of delay on search results, it decreases searches by 0.2%, which doesn't really sound like that much, but for Google, 0.2% is huge.
It's like millions and millions of users.
Amazon found that for every 100 milliseconds in latency, it corresponds to roughly 1% of their global revenue.
So Amazon's huge.
And this was back in '07 or, sorry, '09 when Amazon was still smaller.
But even in finance today, you have applications that need to be super-duper fast.
You need to be able to make trades literally almost even faster than real time. And there's been some studies done, and they say that applications that are even four milliseconds slower than their competition can sacrifice hundreds of millions of dollars in trades.
So latency matters and it corresponds to real money and it corresponds to real user experience.
I mean, like, you know, and everyone knows, when you use the internet, you want it to be real-time, you want it to respond with what you're thinking.
And that can't happen if applications are slow.
And applications are hosted in the cloud, so it's super important for the cloud to be as close to you as possible because at the end of the day, you can't fight physics.
The closer you are to a user, the faster your experience is going to be, assuming that everything on the machines is copacetic and about equal.
And so one of the things that we wanted to do, and that we started doing at Speed Week of last year, which was, I think, the end of August, was we went out and we said, "Hey, you know, we're the fastest network in the world.
Why don't we prove it? Why don't we build a system that allows us to measure ourselves and everybody else from the last mile, from the users, and get real user measurements on comparisons between us and our competitors." And that's what we did and that's what we built.
And that framework and that system is how we measure ourselves for these blog posts.
And our goal is to just get constantly, constantly faster. So we were fastest in 49% of networks around the world in Speed Week last year, and, you know, we're growing and we're getting faster.
Now we're up to 71% as of this blog post, which is great.
So, you know, we're not satisfied there.
We're going to continue iterating and making changes until we're faster in 100% of networks.
So why are we talking about our network during Security Week?
Maybe could you talk about the interplay between those two?
That's a really good question.
You know, Cloudflare's a global network. We like to say we run everything on every machine.
And so that means that, you know, these security products and these awesome security innovations that we've talked about in Security Week all this week, they extend out on the very edge of our network.
And so, you know, the performance of the network directly correlates to the performance of these security systems.
And, you know, you may not even think about that, right?
Like a lot of times with security, performance is kind of an afterthought. Like, I just need it to protect me against threat X.
I don't care if it does it well or not.
But you know, if you could have security and performance in the same package, would you take it?
And the answer is yes, every single time, because as we said before, you've got to have security and you also can't lose money.
So security and performance are both kind of intertwined in the amount and the ability for providers to deliver a good service to their customers.
Awesome.
Awesome. So you touched on this a little bit, but maybe if you could double click into our methodology for how we collect some of these metrics.
It's a really good question and let's talk about it because we bring it up in every blog post.
It's always a good idea to kind of do a reminder.
So the way this works is we have 27 million Internet properties that use Cloudflare today, and a number of them are free.
So basically what it means is it's free to host a domain on Cloudflare today.
You can go to the Cloudflare Dashboard, sign up.
It doesn't cost any money.
If you have a site, you can host it on Cloudflare absolutely free.
You get free service.
One of the things that we love to do is use our free sites to help provide early feedback into systems and products that can help make the Internet better.
And performance is a really good place we can do this.
And so one of the things that we do is we take all of our free sites and we say for every free site, every time we would serve a Cloudflare Error page, we host a series of 100 kilobyte images on that error page.
And so basically, a user will connect to a site that's hosted on Cloudflare and it'll screw up, right?
There will be an error with the site, and it's a very Cloudflare-specific error code.
So 1000 status codes or higher.
Only Cloudflare uses those and nobody else uses those.
If a user is served one of those, then as part of loading that error page, their browser reaches out to each one of our competitors and fetches that same 100-kilobyte file from each of them.
And then we use the Resource Timing API that's built into pretty much every browser, and we log how long the fetch takes for each of them, and then we upload that to our resource-timing endpoints and we log it, we aggregate it, and we group it by network and by country.
And based off of that, we get a pretty good understanding of what a user is experiencing at any given time.
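The aggregation step described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not Cloudflare's actual pipeline: the sample data, the provider names, the `fastest_provider_per_network` and `share_fastest` helpers, and the choice of the median as the comparison statistic are all hypothetical. The AS numbers come from the range reserved for documentation.

```python
from collections import defaultdict
from statistics import median

# Hypothetical samples: (network, provider, fetch time in ms).
# In a real system these would come from browser Resource Timing entries;
# the names and numbers here are made up for illustration.
samples = [
    ("AS64500", "cloudflare", 82.0),
    ("AS64500", "cloudflare", 90.0),
    ("AS64500", "provider_b", 120.0),
    ("AS64500", "provider_b", 110.0),
    ("AS64501", "cloudflare", 150.0),
    ("AS64501", "provider_b", 95.0),
    ("AS64502", "cloudflare", 40.0),
    ("AS64502", "provider_b", 41.0),
]

def fastest_provider_per_network(samples):
    """Group samples by (network, provider), take the median, pick the lowest."""
    grouped = defaultdict(list)
    for network, provider, ms in samples:
        grouped[(network, provider)].append(ms)

    medians = defaultdict(dict)
    for (network, provider), values in grouped.items():
        medians[network][provider] = median(values)

    return {
        network: min(by_provider, key=by_provider.get)
        for network, by_provider in medians.items()
    }

def share_fastest(winners, provider):
    """Fraction of networks in which `provider` has the lowest median."""
    if not winners:
        return 0.0
    return sum(1 for w in winners.values() if w == provider) / len(winners)

winners = fastest_provider_per_network(samples)
print(winners)
print(f"cloudflare fastest in {share_fastest(winners, 'cloudflare'):.0%} of networks")
```

Grouping by country would work the same way, with the country code in place of the network identifier as the grouping key.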
So we ran those tests recently and we're actually, we're always updating those tests.
And so in our blog post for this network update, maybe could you share some of the data that we found around kind of Cloudflare's performance compared to other providers?
Absolutely.
So one of the things that we found is, so as I mentioned before, initially we were faster in 49% of networks around the world.
As of this writing, we are faster in 71% of networks around the world.
So it's a big jump and we've been making jumps progressively.
And you can read back in each of our network performance update blogs and you can see kind of the progression that we've gone through.
From Speed Week of last year to Birthday Week to Full Stack Week and then now to Security Week.
We've definitely made a lot of progress and we're going to continue. Let's see, other stuff like that.
So basically, if you look back at our methodology, out of the 1000 networks that are reported the most, we are fastest in about 596 of those.
So that puts us around 71%.
So definitely up 17 or so, I think, from the 575 that we reported at the end of Full Stack Week.
So definitely making progress there and we're definitely continuing to iterate in finding all these things.
And you know, I think the hardest part about all of this is not finding these new problems, it's that the situation in the networks and the state of the Internet is dynamic, right?
Like we will make a change at time A and we'll be faster, and then at time B, you know, a network operator can make a change that will either undo that change or make something worse.
So it's all about constantly monitoring and looking at these graphs, as you said, to identify where we're slow and making sure that we fix the places where we're slow.
And a lot of that is really just about extending our peering out to as many networks as possible, because direct peering and interconnection with other networks really drives performance over those networks.
And there's a lot of really good examples over the past couple of network performance updates, and this is no exception, where we found another network that we were connecting to out-of-country, we peered with them in-country, and we jumped from fifth out of the five providers to first.
So the ability for us to peer really, really helps, and our open peering policy really, really helps us get as fast as possible.
And that's kind of, it's the not-so-secret sauce there that, you know, definitely, just trying to get as interconnected as we possibly can.
And that example you just referenced in our blog post, I believe that's keeping traffic in Australia.
Is that the use case you were talking about? Yeah, exactly.
So basically, we were reaching this network in Singapore, and Australia to Singapore is really far, especially if you're in a place like Sydney or Melbourne, where you're literally going all the way through the country, out across an ocean, and then to Singapore, and it added a bunch of latency.
But if you peer directly in Melbourne or Sydney, Melbourne and Sydney have almost instant latency.
And even the worst scenarios like Perth, which is on the other side of the country, is still significantly faster because you're traversing over land and they have cables that run over land.
So, a lot better network performance.
And that catapulted us to number one.
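To put rough numbers on why distance matters here, a back-of-the-envelope sketch of propagation delay follows. It assumes light in fiber covers about 200 km per millisecond (roughly two thirds of its speed in a vacuum) and uses approximate city coordinates with straight great-circle distances. Real cable routes are longer and add queuing and processing time, so these are lower bounds, not measurements. The `CITIES` table and both helper functions are made up for illustration.

```python
import math

# Rough rule of thumb: light in optical fiber travels at about two thirds of
# its speed in a vacuum, roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0

# Approximate city coordinates (latitude, longitude), for illustration only.
CITIES = {
    "Sydney": (-33.87, 151.21),
    "Melbourne": (-37.81, 144.96),
    "Perth": (-31.95, 115.86),
    "Singapore": (1.35, 103.82),
}

def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def min_rtt_ms(src, dst):
    """Lower bound on round-trip time: twice the one-way propagation delay."""
    return 2 * great_circle_km(CITIES[src], CITIES[dst]) / FIBER_KM_PER_MS

for src, dst in [("Sydney", "Singapore"), ("Perth", "Sydney"), ("Sydney", "Melbourne")]:
    print(f"{src} -> {dst}: at least {min_rtt_ms(src, dst):.1f} ms round trip")
```

Under these assumptions, the Sydney to Singapore round trip has a floor several times higher than Sydney to Melbourne, and even Perth to Sydney stays well below the Singapore figure.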
And the other example referenced by the Performance Update blog post was about some congestion that was happening in Canada.
Would you mind explaining that one a little bit?
Yeah, sure.
I think the way that you think about this is that when you connect to the Internet, there's a couple different ways you can do it.
You connect to the Internet through your ISP, but the ISP connects to the Internet in a couple of different ways.
It can connect to providers directly, and we call that a private network interconnect, or a PNI.
It can connect to a transit network, which is a dedicated network whose job it is to interconnect other networks for a fee.
Or it can connect at an Internet exchange, which is basically just kind of a central point where you can plug in and everyone else plugs in and they all share.
We all share peering information.
It's a really good way to interconnect with other networks pretty cheaply because you buy a port in the IX and everyone else buys a port in the IX and everyone exchanges peering details over that port.
Or you can peer with what's called a route server at the IX, which basically aggregates all of the peering sessions together.
And it's really easy and a cool way to do that, to get interconnected.
IXs drive a lot of our peering because we're present at almost every IX in the world and we get a lot of peering requests from networks that say, "Hey, you should interconnect with us because we just joined the IX in Toronto or we just joined the IX in London," or something like that.
And interconnecting with those networks means that we exchange traffic directly over that session, that BGP session that we spin up with them, and it becomes essentially a direct routing.
It essentially becomes a direct path for that network to Cloudflare or for Cloudflare to that network.
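As a small aside on why a route server makes peering at an IX easy, the arithmetic can be sketched like this. If every member set up a separate BGP session with every other member, the number of sessions would grow quadratically, whereas with a route server each member needs only one session. The function names are made up for illustration, and the member counts are arbitrary.

```python
def bilateral_sessions(members: int) -> int:
    """BGP sessions needed if every IX member peers directly with every other."""
    return members * (members - 1) // 2

def route_server_sessions(members: int) -> int:
    """Sessions needed if every member instead peers once with a route server."""
    return members

for n in (10, 100, 500):
    print(n, bilateral_sessions(n), route_server_sessions(n))
```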
And so what happened is, we were connected to it.
We were advertising routes for customers.
And we do that as part of Magic Transit.
And what happened was that customer was having trouble connecting back to Cloudflare.
And the reason that they were doing this actually had nothing to do with Cloudflare and had everything to do with the network provider that they were using to connect back to kind of the Internet at large.
And it basically was because they were congesting, they were using so much traffic through their network provider that their network provider was kind of punting them all over Canada.
And so traffic that was coming in, traffic that was supposed to come in in Vancouver, was going to Toronto.
It was just really, really bad for them.
So they asked us, is there anything you can do?
And so in, kind of in parallel, we're always looking to improve our peering situations and we're always looking to improve the amount of traffic that we can send over Internet Exchanges and with a whole bunch of different Internet providers.
And it turns out that we actually had work in flight to upgrade our interconnection with this customer's Internet provider.
So our infrastructure team reached out to them and said, hey, you know, this customer, they're seeing a lot of disconnects.
They're seeing a lot of packet loss.
Is there anything you can do there?
We worked with them.
We got a bunch of additional private network interconnects directly in Toronto and Montreal.
And what that did was basically create bigger pipes so that we could send this customer more traffic and this customer could send more traffic out to the Internet.
The provider didn't have to start sending everything over to Vancouver.
They didn't have to send stuff halfway across Canada to get service.
And they really improved their own Internet situation in doing so, and these port upgrades that we were already in the process of doing actually made us faster on these networks anyway.
So it's a really good example of these kind of customer-focused escalations, how they can benefit and kind of the vice versa.
You know, upgrading these ports and doing all of this work to improve our infrastructure can definitely help our customers get better experience as well.
Cool.
Cool. So I think that covers most of the content that was in the blog post. But I have a follow-up question actually on the customer who is looking for traffic improvements in Canada and this, you wrote, was a Magic Transit customer.
Are there any other recent updates around like Magic Transit performance products that we could talk about, like maybe like Argo for packets?
- That would be an interesting topic.
- Yeah, so we talked about Argo for packets at Speed Week and I think we also talked about Argo for packets at CIO week.
So definitely something, Argo for packets.
If you're a Magic Transit customer and you're seeing performance issues, definitely consider Argo for packets.
It can improve your performance by up to 10% or by at least 10%.
So definitely a really good, easy way to just kind of get 10% boost for almost no work on your side, which is really great, definitely being able to press that button.
And also, looking out for, Mike Conlow published a really great blog earlier today about how we're expanding our backbone and adding more locations.
This is really another great example, and we just want to advertise that blog if you haven't read it as well.
16 new cities, I think 16.
It might be 18, even better.
- We're always adding..
- 18. -It's 18?
-Yeah. Even better, 18 new cities.
There you go.
That's why you're the brains and I'm just the guy sitting here talking about stuff.
18 new cities, including additional backbone links.
And Nitin and the infrastructure team wrote heavily about our backbone and how it improves performance.
And so adding the backbone and adding new locations definitely is a really great way we can get more direct interconnection and more points of presence that lead to better latency and better performance.
So, you know, we say 250-plus PoPs, but that's growing all the time.
And you know, like we release that on a quarterly basis.
It's going to be even more next time we release it because of the work that we're doing here and because we're striving to be the most interconnected network we possibly can.
Could you tell me a little bit more about Cloudflare's backbone and maybe if you could give it like, explain it like I'm five?
Overview and how Cloudflare uses our backbone to help expedite some of our customers' traffic?
Sure.
It's actually, we got, we got 12 minutes to spare. All right.
So we talked before about there's different ways that your ISP can connect to the Internet.
And one of the ways is what's called a transit network. And the transit network doesn't actually deliver services to customers in general.
There are some transits that do, and they offer that as part of a different package of services.
So, for example, Lumen and Cogent are both great examples of transits who offer dedicated Internet services.
And you can buy those.
You can buy Cogent for, I believe, enterprises, and Lumen offers a consumer Internet service under the name of CenturyLink.
So these...
but in general, these transits, what they do is they have these huge pipes that just connect lots of different cities in the world.
And what you as an ISP can do is you can call up Lumen or Cogent and you can say, "Hey, I want to get access to the Internet and I'm peered in these local IXs, but I also want transit. I want you to advertise my routes for me and for my users so they can get out to the Internet if they need."
And these transit networks, they charge a fee for that because it costs money and they kind of carry, they basically kind of act as a provider and basically say, hey, I know the best way to get to user X.
So let's say that I'm on my ISP...
let's say my ISP is Comcast and Comcast is interconnected with Cogent and I'm talking to a site that's hosted in Google.
And so I connect to Google.
I tell my ISP, "Hey, I need to go to Google." And Comcast says, "Okay, well I don't have a direct connection to Google, but Cogent does, so I'm going to give this traffic to Cogent and Cogent's going to give it to Amazon or sorry, not Amazon, Google." And then it gets to Google and then Google says, "Okay, I have responded to this user.
How do I send it back?" And then Cogent says, "Well, I know how to get back to Comcast, so give me this packet and I will send it the right way." And this routing is called BGP Routing, Border Gateway Protocol.
It's basically kind of like the zip codes for the world, if you're using the Postal Service terminology.
So a transit network basically kind of serves as like the highways between the different zip codes and a private backbone operates much in the same way.
So Cloudflare's private backbone is basically just a series of interconnected links between all of our locations.
And instead of giving our traffic to Cogent when a user connects to Cloudflare and that traffic needs to go to Google, we could instead say, "Well, I am in Boston and I need to go to Google, and Google is in Ashburn, Virginia, and Cloudflare is connected to Google in Ashburn, Virginia.
So I should just send my traffic to Cloudflare in Ashburn, Virginia, and then Cloudflare in Ashburn, Virginia will give it to Google." And that uses something called the backbone.
And the way that that works is that the backbone basically says, "I can send traffic to certain locations through certain locations better than other ways." And this works with something called local preference in BGP.
So if I'm connecting in, let's say New York and New York has two paths to Google, one over Cogent and one over our backbone, we can say we'd much rather have our traffic over our backbone because we own the path, we can control what traffic goes on it, we can make sure that your traffic gets there faster.
And that's not a knock on Cogent, right?
Like, Cogent sends traffic very well.
But it's a lot easier to say, to make statements about your traffic when you have full control over it, so we can say, "Hey, we'd much rather send this traffic over our backbone and then it will go to Google in Ashburn and then come back over our backbone and then out to a user in New York." And we basically can tell our routers, if the router is saying, "Hey, I'm sending it to Google," instead of saying, "Hey, I want to use Cogent," say, "Hey, I'd rather use our backbone." And our backbone offers kind of dedicated private connectivity throughout the network.
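The local preference idea can be sketched in a few lines. This is a simplified illustration, not how any real router or Cloudflare's configuration works: real BGP best-path selection has many more tie-breaking steps, and the `Route` type, the `via` labels, the prefix, the AS numbers, and the preference values below are all hypothetical. It only shows that a higher local preference wins before AS path length is even considered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str        # destination prefix
    via: str           # hypothetical label for where the route was learned
    local_pref: int    # higher wins
    as_path: tuple     # shorter wins when local_pref ties

def best_route(routes):
    """Pick the preferred route: highest local preference, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))

# Two hypothetical paths from New York to the same destination prefix.
# 192.0.2.0/24 and the AS numbers are documentation-only values.
routes = [
    Route("192.0.2.0/24", via="transit", local_pref=100, as_path=(64500, 64510)),
    Route("192.0.2.0/24", via="backbone", local_pref=200, as_path=(64510,)),
]

print(best_route(routes).via)
```

Setting a higher local preference on routes learned over the backbone is what lets an operator say "I'd rather use our backbone" whenever both paths are available.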
And it doesn't really function like a transit network. Like, we don't provide Internet services, but we do send logs over it, so all of your log streams go over that.
Argo Smart Routing will leverage it because it's a faster path.
So Argo Smart Routing leverages the backbone.
Magic Transit and Magic WAN leverage the backbone, especially combined with Argo for packets, and our cache pulls, our origin fetches, that traffic will use it too, especially if it's using Argo, but even if not.
Our tiered cache population will use the backbone between upper tiers and lower tiers to make sure that the lower tiers have the right traffic.
Because we own the pipes, bandwidth is essentially...
we don't pay for it. So it means that we get more there.
So a user wants to use the backbone because we have better control over what goes over those links, which means that we can better guarantee that it's faster.
So if you want to use the backbone, if all of this sounds good to you, then we highly recommend you look at our Argo products, right?
Our Argo products are all about finding the fastest path between point A and point B and oftentimes that is the backbone.
Right?
Like at the end of the day, whether you use the backbone or not doesn't really matter because Argo products are encrypted at every level, at every level of the application stack.
So we're really good at making sure that your traffic is secure.
It's all about finding the faster path.
And if the backbone is preferred or if the backbone is faster than transit, then that's what we'll use.
And most of the time it is, not all the time. But we work really hard to make sure that that's the case.
And so by adding all of these additional backbone links, we can connect more of the world and have it use the backbone, which provides an alternative path for Argo to use to optimize your traffic.
Cool.
Cool. So one of my motivations for asking you these questions is that when I talk to customers, right, and whether it's just a firewall product or if they're just looking at specific security features, performance always comes up in conversation.
So thank you for walking through a little bit of how the work behind our edge firewall also has a lot of network work going into it, and that kind of one-two combination of providing security without a cost to performance for our customers.
So I've asked you a bunch of questions. Do you have any points that you want to bring up?
Yeah.
Well, I'd love to just mention, because it's super important and you brought this up, that like all of these security products are built on top of our edge.
So you won't ever have to worry about the question like, "Hey, this blog post was released.
Does this impact my security products? Does this impact my network services products?" The answer's yes.
We run everything everywhere.
And that allows these services to take full advantage of this.
If we put in points of presence in 18 new cities, we're going to have machines there.
And those machines will be better connected locally to you so that your traffic goes through the firewall faster.
And we know this because we know that reducing the amount of time spent on the last mile and being better able to control what happens on the Internet means that we can make things faster.
So we know that that happens and we know that the end-to-end time gets faster when we do this.
So your firewall gets farther and farther out on the Internet, basically towards right exactly where your users are.
And that's all, that's exactly what all of this network expansion and all of this network performance stuff is all about, right?
The network expansion is basically, "Hey, let's get into as many of these places as possible." And then the network performance side is like, "Hey, this is how that impacts how long it takes to connect to Cloudflare." Because if you connect to Cloudflare, you get access to all of these amazing security and network services right at your doorstep.
And with Cloudflare for Offices, you can even get these right in your office buildings if that's something that you so choose.
So we're all about kind of extending the last mile or shortening the last mile to be essentially you connecting to a box in your building, even as far as we can say that.
So, definitely a lot of interconnection.
Interconnection definitely helps us expand and makes us faster, and so does all of the work that we're doing, even around automating our peering and making sure that we're always peered correctly, and even towards, you know, expanding our network.
All of this is a lot of work and we publish a lot of literature on it and it's all about making your experience better and faster.
Awesome.
Okay, well, I think that brings us right to the end.
It was good talking with you Tubes this morning and I hope everyone watching has a good rest of the week.
So, - Thank you.
- Thank you so much. Have a great day everyone.
Bye.