This Week in Net: How we build stuff, privacy, security and “May the 4th”
Welcome to our weekly review of stories from our blog and other sources, covering a range of topics from product announcements, tools and features to disruptions on the Internet. João Tomé is joined by our CTO, John Graham-Cumming.
In this week's program, we go over blog posts related to how we built Network Analytics v2, which is a fundamental redesign of the backend systems that power Network Analytics. We give some “May the 4th” trends for the Star Wars fans, and go over how we are building proxy applications on top of Oxy, our modern proxy framework developed using the Rust programming language, that must be able to handle a huge amount of traffic. There are privacy highlights celebrating Australia’s Privacy Awareness Week 2023, speed and performance data in LATAM, a new DDoS amplification vector, Sudan Internet patterns during the ongoing conflict, and we welcome our new Chief Security Officer, Grant Bourzikas. Developer Week is almost here.
At the end, we have in our new “Around NET” short segment, Derek Chamorro (based in Austin, Texas — from the Hardware Keys team), after his attendance at the RSA Conference in San Francisco.
Related blog posts:
- Cloudflare is faster than Netskope and Zscaler across LATAM
- How we built Network Analytics v2
- Effects of the conflict in Sudan on Internet patterns
- Celebrating Australia’s Privacy Awareness Week 2023
- SLP: a new DDoS amplification vector in the wild
- Why I joined Cloudflare as Chief Security Officer
- Secure by default: recommendations from the CISA’s newest guide, and how Cloudflare follows these principles to keep you secure
- Oxy: Fish/Bumblebee/Splicer subsystems to improve reliability
Transcript (Beta)
Hello and welcome to This Week in Net, everyone. It's the May the 5th, 2023 edition.
Yesterday was May the 4th, for those that are Star Wars fans. I'm João Tomé, based in Lisbon, and with me I have, as usual, our CTO, John Graham-Cumming.
Hello, John, how are you?
Good afternoon. Good morning or good evening, depending on where you are.
It has been two weeks. We have a few blog posts to mention, some related to speed.
Speed is something that we discuss frequently. By the way, are you a Star Wars fan?
I would say I'm a Star Wars fan, yeah, definitely.
I mean, I'm old enough to have seen episode four, the original Star Wars, in the cinema when it came out.
So I do remember being very excited about seeing it.
Obviously, I was considerably younger than I am right now. But yes, yesterday was the 4th of May, which I guess some people love to say May the 4th be with you.
But you say it had an effect on traffic, or at least DNS traffic, for Star Wars-related domains, right?
It has, it had. I was checking some data from our quad one, 1.1.1.1, and we could see some effect on those websites, domains, Lucasfilm, things like that.
Here it is. So this is a daily queries chart. So it shows aggregation or granularity by day.
And there's a clear 50% increase compared with the previous week in terms of DNS traffic yesterday, May the 4th.
Some of those trends usually make sense, but it's always interesting to see how traffic is impacted in a way.
There you go. So people were clearly typing in Star Wars or Googling things yesterday, looking things up on Wikipedia.
This chart doesn't show, but there's two spikes in terms of hours of the day.
That's 1pm UTC time and 6pm UTC time.
So those were the two magic hours for domains related to Star Wars that had the major spikes in traffic.
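The comparison described here (daily aggregation, a week-over-week change, and the busiest hours of the day) can be sketched as follows. This is a minimal illustration only: the query counts are made-up numbers chosen to reproduce a 50% increase, and the function names are hypothetical, not part of any Cloudflare tool.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(hourly_counts):
    """Sum (timestamp, count) pairs into per-day totals."""
    totals = defaultdict(int)
    for ts, count in hourly_counts:
        totals[ts.date()] += count
    return dict(totals)


def percent_change(current, baseline):
    """Relative change of `current` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def peak_hours(hourly_counts, day, top=2):
    """Return the `top` busiest UTC hours on the given day, earliest first."""
    same_day = [(ts.hour, c) for ts, c in hourly_counts if ts.date() == day]
    busiest = sorted(same_day, key=lambda pair: pair[1], reverse=True)[:top]
    return sorted(hour for hour, _ in busiest)


# Hypothetical hourly query counts (UTC) for the previous Thursday and May the 4th.
previous = [(datetime(2023, 4, 27, h), 100) for h in range(24)]
may_4th = [(datetime(2023, 5, 4, h), 140) for h in range(24)]
may_4th[13] = (datetime(2023, 5, 4, 13), 270)  # assumed spike at 1pm UTC
may_4th[18] = (datetime(2023, 5, 4, 18), 250)  # assumed spike at 6pm UTC

totals = daily_totals(previous + may_4th)
change = percent_change(totals[datetime(2023, 5, 4).date()],
                        totals[datetime(2023, 4, 27).date()])
peaks = peak_hours(may_4th, datetime(2023, 5, 4).date())
print(f"week-over-week change: {change:.0f}%, peak hours (UTC): {peaks}")
```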
There you go. A fun thing. Actually, just before we move on to a proper blog post, about Star Wars: last year Matthew did a Birthday Week blog post related to Star Wars that I really enjoyed, with a lot of comparisons between the Internet and the different episodes, especially the first three that were launched.
That was a great comparison that those who love Star Wars, I advise you to search for.
I've forgotten that blog post.
We should, we should look at that one again. So sure we should.
Yeah. I can possibly show here. Here it is. Oh, it was the founder's letter.
Oh yes. Yes. Yes. Yes. Yes. I can do this now. And glimmers of hope. Yeah. It's all about the Internet, the Internet's current battleground and a new hope.
Those are the impersonation rack. I really advise those. All right. We should go back and read that again.
Last year's founder's letter. There you go. Founder's letter 2022.
Oh, well, we've had a few blog posts this past two weeks. Where should we start?
Can we start with the Oxy one? There was one about Oxy, which was in there, this one.
So this is part of an ongoing sequence of blog posts about a thing called Oxy, which is a proxy framework that we've built.
And it is underlying quite a lot of different bits of technology and things that you're using today.
If you're a Cloudflare customer or someone who's using a Cloudflare website or something like WARP or something like iCloud Private Relay, it's worth looking at the whole sequence.
If you're interested in how we rewrote this stuff: this particular one looks at how they split a big process up into many smaller, specialized services.
It starts talking about the architecture, but the entire series is really interesting because this Oxy framework has become a very big part of how we handle traffic.
Because you think right back to the beginning of Cloudflare, we basically handled DNS and we handled HTTP.
And honestly, we handled DNS as the authoritative.
So we were just providing DNS answers to authoritative queries and we were providing HTTP or HTTPS proxying.
Over time, things have got a lot more complicated.
We proxy arbitrary protocols over TCP and UDP.
We handle things like QUIC, we handle things like WARP, which is based on WireGuard.
We are one of the service providers for iCloud Private Relay, and there's all sorts of different sorts of traffic we have to handle.
And so it became necessary to build a rather flexible Swiss army knife type proxy framework.
And that's what Oxy is all about. And there are many more blog posts coming on this over time as well.
That's a good tag for you to search. There's a lot to discover and see in terms of deep dives here.
Yeah, absolutely. Tons and tons of stuff there about Oxy.
So that's a good, that's a good place for us to begin. And if you're into nerdy details, there are nerdy details to be had there.
So where should we go next?
Let's stay nerdy. Let's go to Network Analytics v2. How we built it, right.
So there's a previous blog post about announcing the product, which is our Network Analytics.
And we want to show up-to-date information to our customers about what our systems are doing with their traffic.
And this particular blog post talks about how we built it.
And actually, it's rather interesting because we use ClickHouse extensively for storage.
We use GraphQL as the query language or query mechanism to get the data out of it.
But I think what's interesting in this story mainly is that as our systems got more and more complicated and had more and more features, we needed to rethink how we keep track of what those features are doing.
And this blog post talks about the trade-offs in doing that and how we re-architected things.
Essentially using NetFlow, or sFlow. It's a similar sort of mechanism to what people use for traditional networking, and it works well with our environment.
And we then feed data into ClickHouse and make it available in the dashboard.
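As a rough illustration of how sampled flow data of the sFlow kind can be turned into traffic estimates, the sketch below scales each sampled packet by its sampling interval. It assumes simple 1-in-N packet sampling; the record fields, names, and addresses are hypothetical and are not taken from the Network Analytics v2 implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SampledPacket:
    dest_ip: str
    protocol: str
    size_bytes: int
    sample_interval: int  # 1-in-N sampling: this sample stands in for N packets


def estimate_totals(samples):
    """Estimate packets and bytes per (destination, protocol) from samples."""
    packets = Counter()
    octets = Counter()
    for s in samples:
        key = (s.dest_ip, s.protocol)
        # Each sampled packet represents roughly `sample_interval` real packets.
        packets[key] += s.sample_interval
        octets[key] += s.size_bytes * s.sample_interval
    return packets, octets


# Hypothetical samples, using documentation-only IP addresses.
samples = [
    SampledPacket("192.0.2.10", "udp", 100, 1000),
    SampledPacket("192.0.2.10", "udp", 300, 1000),
    SampledPacket("198.51.100.7", "tcp", 60, 1000),
]
packets, octets = estimate_totals(samples)
for key in packets:
    print(key, packets[key], "packets,", octets[key], "bytes (estimated)")
```

The estimates become more accurate as more samples arrive, which is why this kind of approach suits high-volume traffic.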
So this is a rewrite. Again, if you think about this blog post and the previous blog post, which was about rewriting, perhaps how we do some proxying mechanisms, one of the key things to make Cloudflare successful has been to recognize when we need to write a new thing and not stay with the past architecture.
And so over time, everything has gradually got rewritten. And actually, two of the things I worked on really very early on at Cloudflare, so I think with Railgun, which is a compression technology, that piece of technology is getting deprecated soon and replaced because we've got better stuff for it.
And also the WAF, the WAF has now, a whole team has taken over.
And my original WAF is slowly getting deprecated too.
And it's being rewritten; something more powerful is being created.
So it's all part of what we do is write something new when there's something better.
And coming up, not to sneak preview too much, but we've got Developer Week coming up.
And one of the interesting blog posts in Developer Week is about rewriting, or at least replacing one of the fundamental pieces of technology in Cloudflare, which is called Nginx FL, which is one of the last uses of Nginx at Cloudflare.
And we're replacing that with something based on Workers. So we're eating our own dog food, drinking our own champagne, replacing a fundamental component.
So you'll be able to read all about that. But if I had a piece of advice to give anyone who's interested in building something at Cloudflare scale, it's: don't be afraid of rewriting things.
That's interesting. Here actually is the Railgun in the real world, first blog post, I think, 2012.
It's the year where we wrote about this and it was important at that time, right?
Really important. But it was important because what we wanted to do was reduce the time taken between us and an origin server.
And this was one way of doing it, which was to use a delta compression technique, which I worked on.
Now, today we have all sorts of other techniques out there, right?
And one thing is our network is a lot bigger. We have Argo for routing things around.
We have all sorts of load balancing technologies.
And so this is now getting slowly replaced by other technologies in the Cloudflare suite.
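The general idea of delta compression mentioned above, sending only what changed between two versions of a response, can be sketched like this. It uses Python's `difflib` as a stand-in diff algorithm; Railgun's actual algorithm and wire format are not described in this episode, and the function names and page contents here are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Describe `new` as copies from `old` plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # receiver already has these bytes
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))    # only new bytes are sent
        # "delete" needs no operation: those old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops):
    """Rebuild the new version from the old version and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old_page = b"<html><body>Price: 10 USD</body></html>"
new_page = b"<html><body>Price: 12 USD</body></html>"

delta = make_delta(old_page, new_page)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
rebuilt = apply_delta(old_page, delta)
print(f"full page: {len(new_page)} bytes, literal bytes in delta: {literal_bytes}")
```

Both sides need to hold the same old version for this to work, which is why such a scheme sits between two cooperating endpoints.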
But yes, this was way back when we looked at it. Way back. More than 10 years.
Yeah. 11 years ago. Yeah. So in Internet terms or technology terms, let's say like that, it's a long time.
It's a long time ago. Yeah. Yeah. Just for reference, here is the Cloudflare's new network analytics dashboard blog post.
So this is the new dashboard we announced. This is all powered by...
It's one of those things where it's like, you know, being able to pull up a graph that you can drill down on or something is very, very powerful.
And behind it, you have to have a lot of machinery to get all of the data, to store it, to make it available, scalable, all that kind of stuff.
So it's connected to do this when we were mentioning how we build it.
Should we move on to another topic?
We talk about SLP, a new DDoS amplification vector. It's funny how these things come around.
So DDoS amplification, such a common technique, you send some traffic to a machine that will then respond with more traffic.
So that's why it's called amplification.
Typically what you do this with is something that's UDP-based because with UDP, you don't have to have a connection and you can spoof the victim's IP address.
So what you do is you send pretending to be the victim. So you fake the IP address of the person you want to cause a problem to.
You send a packet to something and that something will respond to who it thinks is the legitimate person who is your victim.
And if there's an amplification, often with more traffic. And so this SLP, yet another protocol, such an old protocol, 1997.
But it's recently been used, has been pointed out as another amplification vector.
So yet another thing, just honestly, just yet another reminder why DDoS services are so important, because there are so many ways to attack people.
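The "amplification" in the explanation above is usually expressed as a ratio of response size to request size. The short sketch below shows that arithmetic; the byte counts and bandwidth are hypothetical round numbers, not measurements of SLP.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Ratio of bytes a reflector sends to bytes it receives."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


def reflected_bandwidth_bps(sent_bps: float, factor: float) -> float:
    """Traffic arriving at the victim for a given rate of spoofed requests."""
    return sent_bps * factor


# Hypothetical sizes: a 50-byte request that triggers a 500-byte response.
factor = amplification_factor(request_bytes=50, response_bytes=500)
victim_bps = reflected_bandwidth_bps(sent_bps=1_000_000, factor=factor)
print(f"amplification factor: {factor:.0f}x, traffic at victim: {victim_bps:.0f} bps")
```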
You were mentioning this the other day, how easy it is to use DDoS for attackers.
So this is one of those examples in a sense.
And it was one Portuguese researcher actually that discovered it, which is also for me interesting.
But Cloudflare customers are protected.
So this is also relevant to just put it out there.
Yeah. I mean, if you're using Cloudflare's built-in DDoS protection, then you're already protected against this.
But if you're not, then blocking port 427 is one way to do it.
And so you should be able to block it pretty easily.
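As a sketch of what "blocking port 427" means as a filtering rule, the snippet below models a packet check that drops traffic addressed to the SLP port. The `Packet` type and `should_drop` function are hypothetical illustrations; a real deployment would express the same rule in a firewall or router configuration.

```python
from dataclasses import dataclass

SLP_PORT = 427  # the port mentioned in the episode for SLP


@dataclass(frozen=True)
class Packet:
    protocol: str   # "udp" or "tcp"
    src_port: int
    dst_port: int


def should_drop(packet: Packet) -> bool:
    """Drop Internet traffic addressed to the SLP port on our hosts."""
    return packet.protocol in ("udp", "tcp") and packet.dst_port == SLP_PORT


incoming = [
    Packet("udp", 40000, 427),   # SLP request from the Internet: dropped
    Packet("udp", 40001, 53),    # DNS query: allowed
    Packet("tcp", 40002, 443),   # HTTPS: allowed
]
allowed = [p for p in incoming if not should_drop(p)]
print(f"{len(incoming) - len(allowed)} dropped, {len(allowed)} allowed")
```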
Where should we go next? Oh, goodness. How about we talk about Grant, who's just joined as Chief Security Officer?
And in this case, you were with that team for a while. Yes, I was leading that team for about six months.
And luckily, Grant has joined us in April as chief security officer.
So Grant's got a big background, in particular in banking, HSBC, Silicon Valley Bank, but also places like McAfee.
And he's going to be leading innovation and in particular, how to keep Cloudflare itself safe.
So it's great to have a new executive on board and expect great things from that group.
So welcome, Grant.
Exactly. Welcome. And those who are interested can learn a little bit more about him in this blog.
About his background and why he decided to come join us.
Absolutely. More things. We didn't discuss this one about secure by default. Recommendations from the CISA's newest guide.
How Cloudflare follows these principles to keep you secure.
I think what's interesting about this idea is that, so this is CISA in the US.
It's really talking about things that should be secure by default.
For years and years and years, we've had hacks where an IoT device, for example, has a default password or has all the ports open and things like that.
And what this is encouraging is in the US, and of course, I expect this will have some impact worldwide, is that manufacturers have to think about security when they build a device and they have to, by default, make it secure.
And this is a guide about how to do this.
I think this is very, very sensible. If you think about the number of things we plug in at home, connect to our Wi-Fi, or the number of services we use, or the number of devices we have, making it that security is part of the process is really important.
And making it by default. It shouldn't be trivial to guess that, well, this thing's got a default password.
Therefore, you have to change it to be safe.
It should be, well, no, actually, the password has been set for you to something random, for example, and much more.
So yeah, that's what this is all about.
This is an initiative by CISA to really encourage people to be secure by default.
And I think it's very, very important. We all need the Internet to be secure.
And as easy to use by default as possible, for sure.
Absolutely. We have a Privacy Awareness Week in Australia. Very good. I think that my view is that individual consumer privacy is one of the big themes of the Internet now and over the next few years.
I mean, you see countries around the world legislating protections for their citizens and countries educating their citizens on their data being private.
And this just talks about there is this Privacy Week in Australia, and hopefully people will be aware of what it means to be private on the Internet.
And so we talk a little bit about some of the things we've done.
In particular, for example, we were just talking about at the beginning of this about the Star Wars DNS stuff.
Now, we have the public resolver 1.1.1.1, which is where you got that data from.
What we don't know is who made those queries, for example.
We can only see the aggregate trend. And the reason that happens is that to remove the identifying information, we actually anonymize the information at the edge on our servers.
So it never actually gets stored. So you can't actually see that information.
So I think this is a great example of protecting someone's privacy.
There's no like, oh, actually, we can secretly get it from this other database over here.
We just don't have it. Doing that minimization and stuff like that is really important so that services can be trustworthy.
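The data minimization idea described above, keeping only aggregate counts and never storing who asked, can be sketched as follows. This is an illustration of the principle only, not Cloudflare's pipeline; the record type, domains, and addresses are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RawQuery:
    client_ip: str   # identifying information, seen only at processing time
    domain: str
    day: date


def aggregate_without_identity(queries):
    """Count queries per (day, domain); the client IP is never copied."""
    counts = Counter()
    for q in queries:
        counts[(q.day, q.domain)] += 1
    return counts


# Hypothetical queries, using documentation-only IP addresses and domains.
queries = [
    RawQuery("192.0.2.1", "starwars.example", date(2023, 5, 4)),
    RawQuery("198.51.100.2", "starwars.example", date(2023, 5, 4)),
    RawQuery("203.0.113.3", "news.example", date(2023, 5, 4)),
]
stored = aggregate_without_identity(queries)
print(dict(stored))
```

Because the stored structure has no field for the client address, the aggregate trend remains visible while there is nothing to look up about any individual.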
Obviously, we talk about other services like DNS over HTTPS, which protects DNS queries so they can't be spied on, say by an ISP or in a coffee shop.
iCloud Private Relay, which we talked about in the context of Oxy, which actually means that ourselves, as a service provider, and Apple can't figure out who's going to which website, for example.
So one party can know the website, the other party can know the user, but they can't link it up together.
That's actually really, really important.
And that's actually part of a thing called Oblivious HTTP, which is related.
And the really interesting example is this thing, Flo Health. Flo Health is an app for women's health, in particular, for period tracking.
And they created an anonymous mode.
So this is a service, right? It's an app and it's a service online, where if you put it in anonymous mode by using Oblivious HTTP, they can't make a link between who's entering that information and the information that's being stored.
So that allows someone to put private health information in a system without there being any way to match it up.
And those are really relevant, I think, because those are the information you really want to be with you, not others, not other actors.
So privacy in some of these more sensitive areas is really important.
There's lots in here about, and in particular, about what Australia is doing to educate people, and it's well worth a read, especially if you're in Australia.
Exactly. And even in terms of general world, it has good examples, which are always interesting.
And also how we do privacy at Cloudflare in terms of mindset and all that.
You were discussing before the quad one, 1.1.1.1, that we have as a resolver.
I used to say, usually I say that our data is dumb.
Why are we seeing this? We don't know. Our data is a little bit dumb.
So it's dumb because it's private, because we don't have a good insight.
We just see DNS queries going up or down. Right. Yep. Exactly. We don't know who made those queries.
More things. Where should we go? Speed? We can go to speed.
Yes. Yeah. Cloudflare is faster than Netskope and Zscaler across LATAM. So this is Latin American regions.
I mean, we've written about this kind of stuff before about how we're faster than Netskope and Zscaler.
We did this update around these things and we've done other updates.
This just happened to be a deep dive across Latin America.
And the reason for doing it is it's very important that people realize the scale of Cloudflare's network and that we're all over the world.
And by zooming in on a region, so if you look, we have 47 data centers across Latin America and the Caribbean.
And so really no one else is in the same position as us, being all over the region.
And that is true all over the world. That allows us to have really great performance for our products.
Traditionally has been the case for the web and websites and APIs and stuff, but it's also true for Zero Trust.
And actually a great example of this is when we launched 1.1.1.1, we rolled it out across all of our data centers around the world.
And it instantaneously was the highest performance public DNS resolver out there.
And as much as the engineers did a really great job of engineering that thing to be fast, a large part of it was the scale of our network.
And you see that here with Zero Trust. I mean, it's just stunning how much faster we are than other service providers because we have the scale.
And so no matter where you have your business, if you're using our Zero Trust solutions, we're going to be fast.
Exactly. And it's all about technology, I think, because products are sometimes difficult to do, to be efficient and all that.
But when you build for a number of years, a good basis technology wise, in this case, data centers presence in everywhere in the world, in a sense, what you did before will have an impact to a new product.
It's something where you're shipping.
It's new, but in a sense, it uses a basis that is not new and it's very evolved and important.
So that will make definitely a difference. I mean, I think the truth is, it's part of this technology and part of it is just simply scale.
We have what, 295 cities where we have stuff. And so you pick a country here and we can show you what the performance looks like.
So if you're in Brazil or in Chile and Argentina or Paraguay or Uruguay, I mean, just you can go through it here.
And I think that makes a huge difference. And some of these differences are really big.
So it makes a difference for a user in terms of that speed and performance.
Well, the blog post actually has examples for all of those countries, so if you're living in one of these countries, you can check how we compare in your country in terms of speed here.
Yeah. And of course, I'm sure the same story is true across the world.
This particular one focuses on Latin America, but if you want to look to Europe or you want to look to Africa, or you want to look to Asia, you would see very, very good performance from Cloudflare because of the scale of our network.
Exactly. We have another blog post, which we mentioned before, that does that more generally speaking.
Where should we go next? Let's see what else there is on the week of blogs.
So we haven't talked about Sudan. So you wrote this one.
So you can perhaps talk about this one more than I can. But obviously, there's an armed conflict going on in Sudan between rival bits of the military government.
And perhaps unsurprisingly, this has an effect, right? We're seeing Internet outages, some of them probably power related, some of them probably actual shutdowns ordered by people.
Take us through it. Sure. In a sense, it all started on April the 15th in terms of armed conflict.
And on that day, exactly, we saw Internet traffic going down.
And first, it was related to one specific ISP, in this case, ASN, Autonomous System.
We saw a clear outage in that ASN initially, on that day, April 15 and 16.
But the Internet traffic in the country went down in general.
Exactly. And we saw this actually in Ukraine, when there's a big disruption in this case, an armed conflict, and we saw on the news that people were fleeing.
So this impacts people on the country. So that also impacts Internet traffic.
So that disruption sometimes is just because people are preoccupied and trying to leave the country or something like that.
So sometimes Internet traffic drops because of that, other times because there was a blockage.
And we saw this chart we're seeing here shows the April 16 clear outage in MTN, one of the main ISPs in the country.
But also, especially on April the 23rd and 24th, we saw a major outage in several ISPs, especially Sudatel, but also others.
The country was almost for two days, mostly without the Internet.
Not a complete outage, but a big difference.
We have a chart that shows here, traffic dropped more than 74% in one of those days.
That's a lot in a country. So we have a few examples here of that drop in traffic, and also of the mobile device traffic percentage in Sudan.
And Sudan is one of the countries that has the highest percentage of mobile devices.
They use more mobile devices there. It grew after the conflict started.
It grew a lot. Yeah, I thought that was interesting. I hadn't realized that Sudan was up there.
I knew India was often in the charts, but I thought this was interesting.
Your chart here, so the Zambia, very high use of mobile Internet access, Guinea-Bissau, Mauritania and Cuba, and Niger as well.
But Sudan, right there at the top, 88% of Internet access is by mobile devices.
That's true.
And it adds up, for example, in terms of research papers. I remember as a journalist reading some of those in terms of Africa, mobile devices are the main way people use the Internet in Africa.
And this, in a sense, also shows that.
It shows it. Yeah, it shows it. And obviously, it's been brought up there. Yeah.
And it, in this case, grew after the conflict started. Right. That's the perspective there.
So a few trends, just that, in terms of Internet traffic and changes.
And traffic continues to be lower than usual. We also saw this interesting pattern using our resolver.
We saw that people started moving from WhatsApp, as an app they use, to Signal.
So Signal domains had a bigger increase than WhatsApp dropped.
It seemed like a difference people started using. I wonder if that's a social reason for that, or if WhatsApp was blocked or throttled or something.
It'd be interesting to know the answer to that.
True. Again, we don't have a view on that, but the trend is there.
So it's interesting. Signal is known for their cryptography and all that.
But WhatsApp too, right? Because they adopted similar mechanisms.
True. But I think there's a difference in terms of public perception. There may be a difference in terms of public perception.
Yeah. Well, just a few trends related to Sudan.
And mostly, I think we covered all of the blog posts, in this case, we did in the last week or so.
So I think that's it. There we go. Great. Well, that's good to see you again.
And good to do this week in net. I guess we'll do it next week.
And we'll do it next week. And this weekend is King Charles III's coronation in the UK.
Very true. So maybe we'll have some trends related to that next week.
Yeah, it's tomorrow, right? It's on Saturday, I believe. It's on Saturday.
Yeah. Although there's events all through the weekend, I think. Right. Right.
You know more about it than I do. I probably should brush up on the details. But yes, it'll be interesting to see what the trends are there and how that shows up in the Internet use in the UK and beyond.
And beyond, yeah. We saw that actually last year on the Queen's Jubilee, an impact in different countries, not only in the UK.
So let's see it. Also, the death of Queen Elizabeth, that was a major event on the Internet too.
So yeah. It was. It was. We'll probably put some trends related to that too.
There. Great. It was great. Thank you, John. See you next week. Yeah.
See you. Bye. That's a wrap. Before we go, it's time for our Around NET short segment.
This week, we're going to travel to three different states in the US.
Yesterday, May the 4th, was our New York City Connect 2023 event.
It was full of keynotes and panels. And by the way, our London Connect event will be on Wednesday, May the 24th, and you can attend there or online.
Last week, we participated in the RSA Conference in San Francisco.
Derek Chamorro from our hardware keys team was there.
For those who don't know, hardware keys are these little things that make you much more secure online.
Let's hear from Derek. He's talking from where he lives, Austin, Texas, during a hike.
Hi, my name is Derek Chamorro, and I'm coming in today from Austin, Texas.
And this week, I presented at the RSA security conference on how Cloudflare is changing the way we distribute keys securely throughout the Internet.
I love working at Cloudflare because they encourage all the engineers to find modern solutions to problems that exist today, as well as finding solutions for problems that people don't even know about.
In my spare time, I love cooking.
I love spending time with my family, especially my five-year-old. And I'm an avid trail and ultra marathoner, as you can see behind me.