This Week in Net: Why DDoS attacks and BPF tails are important

Presented by: João Tomé, John Graham-Cumming

Originally aired on September 3, 2023 @ 11:30 PM - 12:00 AM EDT

Welcome to our weekly review of stories from our blog and elsewhere, from products, tools and announcements to disruptions on the Internet.

In this week’s program, João Tomé is joined by our CTO, John Graham-Cumming, with two topics at hand. We start with a deep programming dive from our blog, where we unveil a few secrets behind BPF tail calls on x86 and ARM.

Then, we discuss the importance of the “old” DDoS attacks that continue to grow and to create problems for companies (we published Cloudflare’s DDoS threat report 2022 Q3 this week).

Read the blog posts:

Assembly within! BPF tail calls on x86 and ARM

Cloudflare DDoS threat report 2022 Q3

Transcript (Beta)

Hello, and welcome to This Week in Net, our weekly review of stories we've been writing on the Cloudflare blog, but also things affecting the Internet. With me, as usual, I have our CTO, John Graham-Cumming. We're both in Lisbon, although we're connecting over the Internet. Hello, John. Hello. Hello. This week we didn't have a lot of blog posts, but we had two really big ones, big in different ways. One is the DDoS report for Q3 2022. The other is a deep dive into BPF tail calls on x86 and ARM. Where do you want to start? Let's start with the tail calls one, because I actually think this is a really, really fascinating topic. Do you want to bring it up on screen? I'm going to do it. Okay, here it is: "Assembly within! BPF tail calls on x86 and ARM." Actually, last week you mentioned this blog post without giving any details, just that it was coming and that it was a very good deep dive, and here it is. We've just recently done GA Week and Birthday Week, and those tend to be mostly focused on product announcements. There were things like the open sourcing of the Workers runtime during Birthday Week, which of course is fantastic and very technical, but one of the things I think characterizes the Cloudflare blog is very, very deep technical content, and it doesn't get much deeper than this one. Jakub, who actually works for me out in Warsaw, is one of a small group of people in the company who work with BPF. We have written a lot about BPF, the Berkeley Packet Filter, over the years. If you look on the Cloudflare blog, you will find it. We use it a lot. It is a simple language, or if you like, a virtual machine, that runs inside various things, but in our case we're very interested in it from a networking perspective, because we use it to manipulate how the Linux kernel handles the network, and we use it for DDoS mitigation and all sorts of other stuff. 
One of the things Jakub was super interested in, because we use BPF so extensively, is bringing some bits of classic computer science to BPF, or showing how they can be done, in this case tail call optimization. The short version of tail call optimization is this: typically, when a function inside a program calls another function, or in particular calls itself (that is, it's recursive), it uses a mechanism called the stack, and the stack is literally a pile of places to go back to. If you imagine a function calling itself, then an entry gets put on the stack saying, okay, when you finish doing this, you need to come back here. If it then calls itself again, the stack keeps getting higher and higher, and we call this the stack depth, because, typically, stacks actually grow downward for historical reasons. How deep the stack gets is a problem in computers. You have to avoid running out of stack, otherwise you can't continue; it limits how much recursion you can do. One of the techniques for dealing with this is tail call optimization, and if you go back up a little bit, you can see a couple of Fibonacci examples at the beginning. You'll note that what it shows is that, in the case of, if the compiler... Right at the beginning. Right. You see this thing here, the classic Fibonacci sequence, which starts 1, 1, 2, 3, 5, 8, 13, 21, et cetera, where each number in the sequence is the previous two added together. 
The classic way of doing it, which is this first function here, fib, is: if it's one of the first two numbers, you return one; otherwise, you return the sum of the previous two numbers, computed recursively, and that can end up using a very, very large amount of stack. If you scroll down slightly, you can see an alternative where, right at the end, there's this call here, the next one, which allows tail call optimization to happen. If you scroll just a little bit further, there you go: it calls this thing, fib_tail, which knows what to do, and it's a single call at the very end. What happens is the compiler says: well, I know I'm going to come back to this place and there's nothing else I need to remember, so I don't actually need to put anything on the stack, because I know I'm always going to keep going to the same place. So essentially it just loops around, using a jump, in fact. This is really cool because it saves stack, and if you have a limited amount of memory, you can do things you couldn't do before. BPF didn't allow you to do this, and Jakub's blog post is about how it is actually implemented, both on x86, because Cloudflare uses Intel and AMD machines, and on ARM64, because we have Ampere machines in production, so we keep our software stack working on both architectures. The reason I titled this one "Assembly within!" is right here. This is a great example: there is some 64-bit x86 and ARM code showing how the Fibonacci function actually gets implemented. In fact, if you're into assembly language, the thing you would notice here is that it uses jumps rather than calls. A call is the instruction that typically puts something on the stack, so this is tail call optimization done by a compiler. Wonderful. Now, if you keep scrolling down, he talks about how it works, explains the stack growth and all these kinds of things, shows how a jump gets generated instead of a call, and goes even further using Ghidra, which is a really cool tool that can turn machine code back into pseudocode. We want tail calls in BPF, we want this optimization to happen, because we can't have a very deep stack. And we use this extensively: we use it with XDP, the eXpress Data Path, which is part of our DDoS mitigation (there's a lovely talk about this and how we do it), because it is extremely fast, and we want tail calls within it. Jakub shows you exactly how BPF tail calls work there. If you want to learn something about really low-level stuff, particularly about how Cloudflare does DDoS mitigation, but also about general use of BPF, this is really, really detailed. It's definitely not for the faint of heart, in the sense that it assumes some familiarity with assembly language and a little familiarity with the idea of a tail call, although I think Jakub does a pretty good job of explaining it. For me this is a classic deep-dive Cloudflare blog post: it's got assembly language, it's got C code, it's got output of all sorts of stuff, and it's an interesting look at the super-efficient low-level stuff Cloudflare builds. Stuff like this is one of the reasons we can offer unmetered DDoS mitigation: we're able to do it really cheaply and efficiently. 
And I think it works in two ways. Like you said, first we explain how we build things, in this case for DDoS mitigation, which is important for us, and we're putting it out there; we're also helping others get into this way of doing things, which saves them time and helps them resolve problems. So in a sense there's an academic side to this, where you show what you do and how you do it so others can help themselves, but it also helps us. And maybe if you're into this stuff and you're looking for a job, you'll think of us as a place to come work. I think the other thing is that the other blog post this week is also about DDoS, but from a totally different direction. If we take a look at that one, it's all about what we saw in Q3 from a DDoS perspective. Here it is, the DDoS report. It's a regular blog post for us: every quarter we do the DDoS report, and in this case it's Omer with the Radar team. Exactly. Well, Omer, who works on DDoS, is a product manager, and he writes a report about what's happened in the threat landscape from a DDoS perspective. The thing that's really striking to me about DDoS is that it's just so commonplace. In particular, as he points out here, we mitigated a 2.5 terabits per second DDoS attack launched using the Mirai botnet, which has been around forever, and aimed at Minecraft. Minecraft is actually a surprisingly popular target of DDoS attacks; people love to go after other people's Minecraft servers. I think we have older blog posts about that, actually, which is interesting. 
It's really crazy, and we just see these things happening all the time. If we scroll down, we see some of the overall trends in DDoS: there's been a very large increase, I think it's 111% at the HTTP level and 97% at the network level. This is just incredible growth. Mirai is very important, and there are attacks in Asia, particularly Taiwan and Japan. What I think is striking is that, as we're recording this today, somebody on Hacker News has launched an application called Linear, and in the discussion about it they mention they had been fighting off a DDoS attack. Last week, I was talking to a customer who was fighting off a DDoS attack. What's striking is that pretty much anything can get DDoSed. It's so easy to do that you'd be surprised who gets attacked. Sure, sometimes it's for political reasons, sometimes somebody got angry at you for some reason, sometimes it's just for fun, and sometimes it's for business reasons: we'd like this service not to work. I think the scourge of DDoS attacks is a really big problem, and one of the reasons we give away DDoS mitigation is that you shouldn't be DDoSed offline. You can go through here; there are a couple of other things if you scroll down, where you see trends in what happens. Now, the report makes a distinction between the network layer, so layers three and four, which in TCP/IP terms means the IP layer and the TCP and UDP layer, and the application layer, which means going after HTTP or similar higher-level protocols. 
These are two different attack types, and we often see what we call multi-vector attacks, where people, especially in the large ones, throw everything at a customer. They will go after them with, to give the example here, UDP and TCP floods, or they will do network-layer and application-layer attacks at the same time. So we see these kinds of attacks happening all the time. The exact protocols being used vary a little. If you scroll down, there's a little discussion about BitTorrent. Here it is. Oh, yeah, so that's a big increase. That's a big increase, more than 1000%. One of the things that happens with attacks is that attackers find the victim has filtered out, for example, attacks over DNS or NTP, so they look for some other protocol they can use that isn't filtered. And it seems like this quarter, BitTorrent has become the favorite. Somebody's probably got themselves a little botnet they can do that with. And this is the one that uses spoofing. Anything where no connection is established allows you to do spoofing, and spoofing basically means pretending to be someone else, often the victim. There are a couple of ways you can do it. One is you say, I'm the victim, and send a message to someone else, who then replies to the victim and overwhelms them; that hides who you are. The other way is you spoof the source IP address, so you just say, hey, this is coming from over here. Sometimes you see that with Cloudflare. Sometimes people actually write to us and say, why is Cloudflare doing a DDoS attack against me? And it's not. It's that someone has spoofed our IP addresses and is basically sending packets in our name, causing somebody a problem. And then the next one here is Mirai. The Mirai botnet has been around for a long time. 
It goes after smart devices using default usernames and passwords. If a device still has its default username and password, the attackers log into it, change the firmware, and use it as part of a botnet. And this could be the Internet of Things, right? So your little device, your camera connected to the Internet in your home, could be carrying out attacks for other people, for attackers. We've absolutely seen this. Mirai particularly went after cameras and DVRs that were left on the Internet. But yes, anything that's connected to the Internet and is accessible, especially if it's got a default password. In the original Mirai, if I remember well, you could telnet into the device, using an ancient protocol, with a default username and password. So this is the problem. Change those passwords. People don't think about their smart toaster being part of a botnet, but that can happen. And the interesting thing there for me, learning all about these things, is that an attacker could be in Portugal, where we are right now, and use a botnet of cameras and other devices in the U.S. So the attack seems to be coming from the U.S. or other countries, and it's not; it's an attacker in another country who controls those devices. Yes, and that's often one of the things that skews information about where attacks come from. If a botnet is spread all over the world because it has broken into computers or DVRs or cameras, and if it were truly random, then it would tend to come from countries with large populations. So you would tend to get lots of attacks from China, Brazil, the U.S., right? It tends to look like, oh look, Brazil is attacking me. But actually it's not Brazil that's attacking you. It's a router from some ISP in Brazil that was easily hacked, or something like that. 
So it can be a little bit difficult to attribute. Really hard to spot, yeah. It's about who actually did it. It probably wasn't Brazil. It was probably someone else, right? Actually, I've been working on that, and what's very interesting is that the U.S. is usually number one, because it has more zones, but so is the local country. The local country is sometimes number one: when a zone in one country is being attacked, a large share of the attack traffic comes from that same country, or else from the U.S., which is usually the top source, although sometimes that changes. Yeah, it's one of those funny things. The other thing: in the original Mirai attacks, I remember there was a skew in traffic that seemed to come from certain countries. If I remember well, Hong Kong was one of the places, and that was because the devices that had been hacked were sold more there than elsewhere in the world. So what happened was they got hacked, and it's like, wow, we're getting all these attacks from Hong Kong. Yes, because a bunch of people in Hong Kong bought XYZ camera or whatever and installed it. So looking at the pattern of this stuff is fairly complicated. One of the things we actually want to do, and are working on, is this: if a packet arrives at one of our data centers with a particular source IP address, which may or may not be spoofed, we want to look at that IP and ask, should this packet have arrived here? Because if I spoof the IP address of, let's suppose, a big company in Portugal, and the packet arrives at a data center in, let's say, San Francisco, then something's wrong, because it should have gone to Cloudflare's Lisbon data center. 
But we can actually use that kind of map of the Internet to figure out, wait a minute, this is definitely nonsense, and then drop it. We haven't talked about it yet, but ransom DDoS attacks are also a thing; they've been increasing for at least a year. There's also a mention of those here. So this connects ransom demands to DDoS, and it's increasing too. Yes. Criminals like making money in criminal ways, and ransomware is one way to do it, right? And it's interesting, because you were saying DDoS is an old method, but it's not only still in use; because it's easy, it's one of the main methods attackers use. Sometimes they don't use only DDoS, but DDoS is usually there because of how easy it is. Sometimes they're actually trying to get in on one side, and they're using DDoS to distract a company on the other. Yes, sometimes that happens. I think that, as you say, DDoS is old and pretty easy to do. And it's vandalism, right? In the real world people vandalize things, and this is the same kind of behavior. So it's easy to do. The ransomware stuff is a little more interesting because there's a direct monetary motive tied to it. Whether it's the CryptoLocker type of thing, where everybody's machines get encrypted and then there's a ransom demand to decrypt them, or it's the DDoS style, where typically either a note arrives saying, hey, tomorrow your website is going to go down because we're going to DDoS you, or the website goes down and then the note arrives saying, hey, we'll do that again. If you think about it, this is no different from a protection racket in the real world: if you don't want your shop to catch fire tomorrow, you'll pay us some money. It's the same stuff, but online. Yeah, you're making a good point here on Radar, right? 
On Cloudflare Radar, which is radar.cloudflare.com, a really, really great site where we have a ton of real-time information about what's happening on the web, we have a section for this. Here's Radar: this is real-time stuff that's happening, and you can zoom in on your own network, a particular domain name, a particular country, or whatever, and see what's going on. One of the things we have is reports, and the blog post about DDoS that Omer wrote is backed by a very, very detailed report. You can grab those charts if you need to use them internally in a company, in a presentation, or something like that. You can get the data and use it yourself. So I definitely urge people to look at radar.cloudflare.com, because it's pretty interesting in terms of what we see, and because of the scope of our network, since we're sort of everywhere, we get a really good sense of the Internet weather globally and all the things that are happening. Exactly, and it's interactive, so you can look at a specific country in terms of the percentage of traffic we saw that was mitigated, in this case DDoS attacks that we blocked. I think it's kind of amazing in this day and age: we're seeing cyberattacks on the news, even in Portugal with the airline company, and all the time in most countries, in the US recently with airports, cyberattacks involving DDoS. DDoS is usually there, but what I found interesting is that if you have good protection, sometimes you won't even notice a big DDoS attack. So the difference is between not even noticing that you were protected and, hey, my site is down, I have problems with my data. 
This is a very good point, actually. A couple of weeks ago, I got involved in what we call an under-attack onboarding, which is when somebody who's not a customer is suffering from a DDoS attack or another type of attack and wants help right now. This was a service that had been going on and offline for about three days. The company used Amazon for hosting, that's where the website was, and they had been fighting with Amazon's WAF and Amazon's tools for those three days, trying to add rules and figure out how to mitigate the DDoS, and every time a rule was added, the attacker modified their attack. Finally, they came to us and said, we're going to put you in front of Amazon, and we'd like you to protect us. It took about 90 minutes to switch them over from what they had onto us, and within about the first hour after they came onto us, there was a very large attack against them. They had no idea. In fact, the only reason they knew anything about it was that they could look in the Cloudflare dashboard and see, oh, by the way, here's all this traffic we mitigated. You can set up alerts for this kind of stuff so you get told, but they just sat there. And I actually saw somebody else, the person who was on Hacker News, saying somewhere today that they were basically going to write a love letter to Cloudflare for being able to deal with this. So I think this is an area where we have real expertise, and if you use some other service like Amazon for your EC2 or whatever, that's great, but we're definitely the experts in this particular area. So it was a bit of a DDoS week this week in Net, right? 
It was, and it reminds me that a while ago, in July, we had a blog post marking the eighth anniversary of Project Galileo, where we explained very well how we, for example, onboarded Ukrainian zones, Ukrainian websites from government to news organizations, while they were being attacked. I think we also learned how to do it better that way, because those were really big attacks at the start of the war in late February. We explained that well in that blog post. Things are really happening in this area, and the attacks are still growing. Warnings are being issued even by governments, so it's a growing area for sure. It's just a constant problem, yeah. So this has been great. Thank you, John, for your insights, and let's talk again next week if you have the time. Yeah, good to see you again. See you as well. Bye-bye. Before we wrap up this segment, let me show you a short conversation I had with David Belson, our Head of Data Insights, about some of the things impacting the Internet this week: the war in Ukraine, in this case missile strikes that brought down energy infrastructure, and with it the Internet, in multiple regions of Ukraine, and also the protests in Iran, which continue, along with the Internet disruptions there. So here is a little taste of my conversation with David Belson in a new segment we now have called Cloudflare Radar Bulletin, which you can also find. With energy impact usually also comes Internet impact. In this case, on Monday, let me share the chart here, we saw a clear decrease, a 35% drop in traffic in Ukraine after 07:30 UTC. In terms of regions, here is Kharkiv. That's obviously a much more severe, immediate disruption. Sometimes we see disruptions where the traffic tails off, and sometimes we see a clear immediate drop. 
This clearly affected multiple cities and regions, with a big impact in Lviv as well. Kyiv was also impacted in a sense, and several ASNs were impacted too. We continued to monitor the trends that day, Monday, and did the same on Tuesday and yesterday. For example, on Tuesday the impact was still clear, not in Ukraine in general but in some cities. This is the view here. So some of the cities were still seeing that impact. In this case, Lviv was also showing a clear impact here, and so was Poltava Oblast. In some regions, it recovered. Yeah. Recovery times are often related to the underlying cause. With power outages, we'll often see recovery happen a little faster, because obviously power is used for everything else, so the crews there will work to get power restored as quickly as possible, whereas more significant infrastructure damage will often take longer to repair, meaning we'll see a longer road to recovery. And this morning we just checked, and it seems most of the cities are really coming back to normal, even those still impacted yesterday. So the recovery seems to be holding up, which is important in terms of having access to all sorts of things. So yesterday, there was also an impact in Iran, right? Again, Iran. This has been happening for a while now. Right. Since, I believe, September 21st or 22nd, somewhere around there, there have been regular Internet shutdowns, primarily at three key mobile providers: MCI, Irancell, and Rightel. For about two weeks after they started, there was basically an Internet curfew, if you will, where Internet connectivity on those networks was effectively shut down between about 4 PM and midnight local time. That happened for about two weeks. Then, I think early last week, that regular shutdown stopped, and they went a few days without one. 
And I think over the last week they've implemented shutdowns again, more unannounced or unexpected ones, twice I think: once over the weekend, and then once yesterday. Again, these are all related to the riots and protests spreading across Iran following the death of a young woman there. What a lot of these providers are doing, ultimately, I think, driven by the government, is shutting down connectivity at key times of the day, in large part to prevent information about what's going on there from getting out. There are two things here that I think are important. One is the curfews, Internet curfews, which happen at specific times of the day. We've seen this before in other countries and other situations, even in Kazakhstan, earlier this year.

This Week in NET

Tune in for weekly updates on the latest news at Cloudflare and across the Internet. Check back regularly for updates. Also available as an audio podcast!
