SECURITY SPOTLIGHT - State of 2020 DDoS Threat Landscape
Presented by: Vivek Ganti, Omer Yoachimik
Originally aired on September 26, 2021 @ 4:00 AM - 4:30 AM EDT
Learn about the evolving DDoS threat landscape from Cloudflare’s vantage point, as the product team shares 2020 DDoS trends and observations.
Transcript
Hello everyone, my name is Vivek and I'm with Product Marketing here at Cloudflare. Thank you for joining us today for this Cloudflare TV segment.
With me today I have Omer Yoachimik from the product team.
Omer, would you like to introduce yourself?
Yes, thank you Vivek. First of all, thank you for inviting me to the show.
So I'm Omer Yoachimik, the product manager for Cloudflare's DDoS protection service, and I'm based here in London.
Great, thanks Omer. So why don't we begin with your journey so far?
I know you built our badass DDoS mitigation solution here at Cloudflare, but where were you before this?
What's your journey been? What's your story?
Okay, well I'm originally from Israel and like most Israelis I too served in the military after high school.
I was a lieutenant in the Israeli military intelligence.
I served for a little over five years, and after my service I started studying computer science while working at several startups.
One of them was acquired by Microsoft, and I also worked at Check Point, Radware, Imperva, and some other smaller startups.
And then finally last year I moved to London to join the Cloudflare team.
Wow, you bring a breadth of experience with you.
That's great. So let's begin by talking about what a DDoS attack is and what the DDoS team at Cloudflare does.
Okay, sure thing. So first of all, DDoS is an abbreviation for distributed denial of service.
The classic definition of a DDoS attack is a malicious actor intentionally sending large amounts of traffic to your website or other Internet property with the intent to cause an outage or service disruption.
Now that's the classic definition. We can expand on it with a more modern one, which includes any unwanted traffic that has the potential to take your websites down.
This includes overeager good bots and even buggy client applications.
Because from the user's perspective, it doesn't matter whether your website is down because of a DDoS attack or because you mistakenly introduced a bug into your mobile app that then bombards your service: downtime is downtime, revenue is impacted, and users start tweeting.
And so the DDoS team is entrusted with protecting Cloudflare's infrastructure, and of course our customers, against attacks and all kinds of unwanted traffic across layer 3 to layer 7 of the OSI model.
Unmetered DDoS protection is included as part of every Cloudflare service.
So DDoS protection is included in the WAF (web application firewall), Workers, the DNS service, Spectrum for layer 4 applications over TCP and UDP, and even Magic Transit, our newer service for protecting entire network ranges.
Yeah, thanks.
And for some of our viewers who might not know, layer three to layer seven of the OSI model is networking lingo for how different elements of the Internet talk to each other.
The OSI model breaks that down into different layers. And Omer, you were just mentioning that Cloudflare provides DDoS protection for assets at the network layer, the transport layer, and the application layer.
Yes, correct. Across that entire spectrum. Great, thank you. So let's talk about trends.
I'm taking this call from home, and you're doing this session from home too.
Our lives have all changed in the last few months.
And because of COVID-19, the pandemic, we're all working from home, playing from home, and also trying to stay connected with each other through Zoom, just as we are now.
So we've seen a surge in Internet traffic. I think that's a known and understood phenomenon.
Are we also seeing a surge in cyber threats? Yes, definitely.
We are seeing an increasing number of online attacks. More and more people are becoming reliant on online services.
So the potential for destruction and creating chaos by attackers is becoming much more significant.
The potential to cause chaos is larger. However, there's also a new generation of attacks that we're seeing, which is cyber vandalism.
These are more amateur attackers, maybe even just bored students at home, who play around with free online tools just to experiment and see what kind of damage they can cause.
We're seeing a lot of those.
Free online tools, just to see what damage they can cause.
That's interesting because the damage they cause is very real. If it's a cyber vandal who's attacking a website of a multinational company, then that causes some real brand damage and real revenue loss, as you just mentioned.
So this almost seems like asymmetric warfare, where attacks can be launched easily and inexpensively, but they cause companies a lot of damage.
Yes, it really is. And companies that are not protected and don't have the right measures in place are often left dealing with the consequences of a damaged reputation, and even a direct impact on revenue.
For certain organizations, being down for even a minute could mean losing millions of dollars.
And how long do these attacks last?
Yeah, that's a good question. So over 80% of the attacks that we see on our network last under an hour.
But there's also a good 5% of attacks that are very persistent, which last over 24 hours.
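The duration split above can be illustrated with a small bucketing helper. This is a minimal sketch: the bucket boundaries follow the figures Omer quotes (under one hour, over 24 hours), while the function names and the sample durations are made up for illustration and are not Cloudflare data.

```python
from datetime import timedelta

BUCKETS = ("under 1 hour", "1 to 24 hours", "over 24 hours")


def duration_bucket(duration: timedelta) -> str:
    """Place one attack into a duration bucket based on the figures quoted in the segment."""
    if duration < timedelta(hours=1):
        return BUCKETS[0]
    if duration <= timedelta(hours=24):
        return BUCKETS[1]
    return BUCKETS[2]


def bucket_shares(durations: list[timedelta]) -> dict[str, float]:
    """Return the fraction of attacks that falls into each duration bucket."""
    if not durations:
        raise ValueError("need at least one attack duration")
    counts = dict.fromkeys(BUCKETS, 0)
    for d in durations:
        counts[duration_bucket(d)] += 1
    return {bucket: n / len(durations) for bucket, n in counts.items()}


# Made-up sample of ten attack durations, not real Cloudflare data.
sample = [timedelta(minutes=m) for m in (5, 12, 20, 30, 45, 50, 55, 58)]
sample += [timedelta(hours=3), timedelta(hours=30)]
print(bucket_shares(sample))
# {'under 1 hour': 0.8, '1 to 24 hours': 0.1, 'over 24 hours': 0.1}
```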
We've also seen sophisticated attacks where the attackers detect that a mitigation service is protecting the website and that they're facing DDoS mitigations. They constantly attack, see that their attack is blocked, and then pivot, changing and adapting their attack vectors to try to get through.
The attack vector, by the way, is another piece of industry lingo for the attack method.
So they keep changing and adapting their methods, trying to get through the protection layers that are in place in order to cause damage.
But often they don't know that we are protecting that property, and we try to stay multiple steps ahead of them.
And we also have a large network that allows us to learn from the attacks.
And we deploy those learnings globally and instantly. So we can detect a type of attack in one location, and our systems automatically spread those mitigation rules to all of our edge data centers, for instance.
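Detecting an attack in one location and pushing the resulting rule everywhere can be sketched as a simple broadcast. Everything here is hypothetical: `MitigationRule`, `EdgeDataCenter`, `propagate`, the data center codes, and the protocol and port in the example rule are illustrative names, not Cloudflare's internal design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MitigationRule:
    """Hypothetical rule: drop packets matching this protocol and destination port."""
    protocol: str
    dst_port: int


@dataclass
class EdgeDataCenter:
    name: str
    rules: set[MitigationRule] = field(default_factory=set)

    def should_drop(self, protocol: str, dst_port: int) -> bool:
        return MitigationRule(protocol, dst_port) in self.rules


def propagate(rule: MitigationRule, fleet: list[EdgeDataCenter]) -> None:
    """Install a rule learned in one location on every data center in the fleet."""
    for dc in fleet:
        dc.rules.add(rule)


fleet = [EdgeDataCenter(code) for code in ("LHR", "SIN", "SFO")]
learned = MitigationRule(protocol="udp", dst_port=9999)  # e.g. first detected at LHR
propagate(learned, fleet)
print([dc.should_drop("udp", 9999) for dc in fleet])  # [True, True, True]
```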
So what I'm hearing from you is that for companies to protect themselves, they would have to call Cloudflare.
Yes, call us. No, but in all seriousness, and I tell this to every customer I speak to, you need to do a thorough review of the security measures in your organization.
Enterprises, both big and small today, need to think of security as a fundamental building block of their network.
So true. Okay, so we're seeing a larger number of smaller attacks in 2020, with "small" by Cloudflare standards meaning less than 10 gigabits per second, which can still be enough to cripple many websites or network infrastructures.
Are we seeing large scale attacks too, larger than 10 gigabits per second?
Yes, definitely. The number of large attacks is increasing, and also their rate, how high they're reaching.
And why does this matter? The size of an attack equals its strength.
A stronger attack, a larger attack, can cause more damage.
So it comes down to this, if the attack is larger than what your systems can handle, your systems being your routers, your servers, or even your Internet link, then it has the potential to cause an outage.
So the number of attacks that we've seen over 100 gigabits per second, for instance, has significantly increased.
In fact, more than 88% of these attacks in the first half of 2020 were launched after shelter-in-place orders went into effect this year.
And we also saw two of the largest attacks on our network ever, in June and July of this year.
And so even though we're seeing an increase in the smaller attacks, the cyber vandalism, we're also seeing attacks at scales and rates that we've never seen before on our network.
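The point that even a "small" attack can take a site down comes down to comparing the attack's peak rate with the capacity of whatever sits in its path. A minimal sketch, assuming the 10 Gbps and 100 Gbps thresholds mentioned above; the label for the middle bucket and the example link size are illustrative.

```python
def size_class(peak_gbps: float) -> str:
    """Bucket an attack by peak bit rate, using the thresholds mentioned in the segment."""
    if peak_gbps < 10:
        return "small"  # small by Cloudflare standards
    if peak_gbps <= 100:
        return "large"
    return "very large"  # over 100 Gbps


def can_cause_outage(peak_gbps: float, capacity_gbps: float) -> bool:
    """An attack threatens an outage when it exceeds what routers, servers, or the link can handle."""
    return peak_gbps > capacity_gbps


# A hypothetical 5 Gbps attack against a site behind a 1 Gbps link.
print(size_class(5), can_cause_outage(5, capacity_gbps=1))  # small True
```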
Oh, so let's dig into that a little bit. You mentioned we saw some of the largest attacks on Cloudflare's network in recent times.
Can you tell us a little more about the attacks?
Yes, of course. So recently, we saw two very large attacks.
One was packet intensive with a high packet rate. And one was bit intensive with a high bit rate.
These are two different methods of launching attacks.
And they try to cause an outage in different ways. A bit-intensive attack, with a high number of gigabits per second, aims to saturate your Internet link.
So when you as an organization lease an Internet link from a service provider, you subscribe to a certain bandwidth: maybe 10 megabits per second, 100 megabits per second, or, if you're a larger company, maybe one gigabit per second, 10 gigabits per second, and so on.
And if the traffic arriving at your IP addresses exceeds that amount, the service provider will just throttle it, rate-limit it, or even block it.
And then legitimate users that try to connect to your website, for instance, will not be able to.
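How a bit-intensive attack crowds out legitimate users can be sketched with a simple proportional-loss model. This assumes, purely for illustration, that excess traffic is dropped without regard to whether it is legitimate; real throttling and blocking behavior varies by provider, and the link and traffic figures are hypothetical.

```python
def legit_delivered_mbps(capacity_mbps: float, legit_mbps: float, attack_mbps: float) -> float:
    """Legitimate traffic that still gets through a link of the given capacity.

    Illustrative assumption: when the link is oversubscribed, every bit has the
    same chance of being dropped, regardless of whether it is legitimate.
    """
    offered = legit_mbps + attack_mbps
    if offered <= capacity_mbps:
        return legit_mbps  # the link is not saturated, so nothing is dropped
    return legit_mbps * capacity_mbps / offered


# A hypothetical 100 Mbps link carrying 40 Mbps of real users, hit by a 1 Gbps flood.
print(round(legit_delivered_mbps(100, 40, 1000), 2))  # 3.85
```

Under this model, real users keep less than a tenth of their traffic even though nothing about their own requests changed.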
A packet-intensive attack, on the other hand, can generate far fewer bits per second.
That's because what this attack aims to do is overwhelm your routers, or any other appliances you have in line that have to process the packets.
So a router, for instance, will allocate a certain amount of CPU and memory for each connection for each packet it has to process.
And if you're able to generate a burst of enough packets, you can overwhelm the router.
And when a packet arrives from a legitimate user, a legitimate client, it won't be able to handle it and process it.
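Why a packet-intensive attack can carry far fewer bits follows from simple arithmetic: at a fixed bit rate, smaller packets mean many more packets per second, and each one costs the router CPU and memory. This sketch ignores Ethernet framing overhead (preamble and inter-frame gap) and assumes fixed 64-byte and 1,500-byte packets.

```python
def packets_per_second(bits_per_second: float, packet_bytes: int) -> float:
    """Packets per second needed to reach a bit rate at a fixed packet size."""
    return bits_per_second / (packet_bytes * 8)


TEN_GBPS = 10e9
small = packets_per_second(TEN_GBPS, 64)    # minimum-size packets
large = packets_per_second(TEN_GBPS, 1500)  # full-size packets
print(f"{small:,.0f} pps vs {large:,.0f} pps")  # 19,531,250 pps vs 833,333 pps
print(f"{small / large:.1f}x more packets to process")  # 23.4x more packets to process
```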
And so on June 21, we saw a packet intensive attack.
Our system, GateBot, automatically mitigated a highly volumetric and globally distributed attack.
It peaked at 754 million packets per second. This attack was part of a four day campaign.
It started on June 18 and ended on June 21. And just to give you an idea of how distributed it was, the attack traffic was sent from over 316,000 IP addresses towards a single Cloudflare IP address.
This IP address is used mostly by Cloudflare customers on our Free plan.
That means a customer on our Free plan may well have been targeted by a very large attack, and no downtime, service degradation, or charges occurred for that customer during this time.
And this is all part of our unmetered mitigation guarantee.
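Dividing the June 21 peak by the number of sources shows why such a distributed attack is hard to stop by blocking individual IPs. This assumes the load was spread evenly across sources, which real attacks rarely are, and the per-IP limit is a hypothetical value chosen for illustration.

```python
PEAK_PPS = 754_000_000  # peak packet rate of the June 21 attack
SOURCES = 316_000       # "over 316,000" source IPs, so the true average is a bit lower

per_source = PEAK_PPS / SOURCES  # assumes an even spread across sources
PER_IP_LIMIT_PPS = 10_000        # hypothetical per-source rate limit

print(f"~{per_source:,.0f} pps per source")  # ~2,386 pps per source
print(per_source > PER_IP_LIMIT_PPS)         # False: no single source stands out
```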
Yeah, that's so interesting. You talked about all these packets flooding the network links.
A lot of times, these routers and all your network infrastructure have to respond to these packets.
And I think that's very typical of SYN floods, right? Is that typical?
What kind of attack vectors are we seeing on our network in 2020? Yeah, so this quarter alone, we saw 39 different attack vectors at layers 3 and 4.
SYN floods were the most common attack vector. For viewers who are less familiar: in the TCP protocol, when a client establishes a connection, there's a handshake, the TCP handshake, where the client sends a packet with the synchronize flag set, SYN for short.
It's telling the server, hey, I want to connect, I have data to transfer, or some other form of connection.
Then the server responds with a packet carrying both the synchronize and acknowledgement flags, a SYN-ACK, meaning I acknowledge your synchronization request.
Then the client responds with an acknowledgement, an ACK, and the connection is established and data can begin to be transmitted.
For every SYN a router receives, that is, for every new connection, the router allocates memory and CPU.
So what happens is that the router responds with a SYN-ACK and waits for a response, and it waits, and it waits, and it waits.
And during this time, if you receive enough SYNs, you can overflow the memory buffer, the CPU usage, and basically crash the device or render it unable to handle legitimate TCP connections.
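The SYN flood mechanics described above can be sketched as a toy model of a device's half-open connection table. The class name, capacity, and addresses (from the documentation ranges 198.51.100.0/24 and 203.0.113.0/24) are illustrative; timeouts, retransmissions, and the SYN-ACK traffic itself are not modeled.

```python
class SynBacklog:
    """Toy model of a device's table of half-open TCP connections.

    Each SYN reserves a slot until the client's final ACK arrives. Spoofed SYNs
    never complete the handshake, so their slots stay occupied (timeouts are
    not modeled here).
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.half_open: dict[tuple[str, int], str] = {}

    def on_syn(self, src: str, port: int) -> bool:
        """Reserve state for a new connection; return False if the table is full."""
        if len(self.half_open) >= self.capacity:
            return False  # no room: this SYN is dropped, legitimate or not
        self.half_open[(src, port)] = "SYN_RECEIVED"  # the device would send a SYN-ACK here
        return True

    def on_ack(self, src: str, port: int) -> bool:
        """Complete the handshake and free the slot; return True if it was pending."""
        return self.half_open.pop((src, port), None) is not None


backlog = SynBacklog(capacity=128)
# A flood of spoofed SYNs that will never be followed by an ACK.
flood_accepted = sum(backlog.on_syn(f"203.0.113.{i % 256}", 1024 + i) for i in range(1000))
# A legitimate client now tries to connect and is turned away.
print(flood_accepted, backlog.on_syn("198.51.100.7", 50000))  # 128 False
```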
So there are different systems at Cloudflare that mitigate these kinds of attacks?
What system mitigates SYN floods at Cloudflare?
So we have three main systems that we use for our DDoS protection.
This is our three pronged approach.
So we have, first of all, GateBot. GateBot is our centralized DDoS mitigation system.
It's a piece of software that lives in our network's core. GateBot receives traffic samples from all of our edge data centers in over 200 locations around the world.
It analyzes them in real time and sends mitigation instructions to the edge once it detects an attack.
That's one of the systems.
GateBot is also synchronized with our customers' web servers, their origin web servers.
So if it detects that a web server is under stress, it will also trigger analysis and mitigation.
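GateBot's centralized approach, sampling at the edge and analyzing in the core, can be sketched as an aggregator that scales sampled counts back up to estimated global rates. This is a hypothetical illustration only: the `EdgeSample` structure, the threshold, and the rule that a stressed origin halves the threshold are assumptions, not GateBot's actual logic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EdgeSample:
    """Hypothetical one-second report from a single edge data center."""
    data_center: str
    sample_rate: int             # 1 in every `sample_rate` packets was sampled
    sampled_dst_counts: Counter  # sampled packets per destination IP


def detect_global_attacks(
    samples: list[EdgeSample],
    threshold_pps: float,
    stressed_targets: frozenset = frozenset(),
) -> dict[str, float]:
    """Estimate each destination's global packet rate and return those to mitigate.

    Destinations whose origin reports stress are flagged at half the normal
    threshold, an illustrative choice rather than a documented parameter.
    """
    estimated: Counter = Counter()
    for s in samples:
        for dst, count in s.sampled_dst_counts.items():
            estimated[dst] += count * s.sample_rate  # scale samples back to a full-rate estimate
    flagged = {}
    for dst, pps in estimated.items():
        limit = threshold_pps / 2 if dst in stressed_targets else threshold_pps
        if pps > limit:
            flagged[dst] = pps
    return flagged


samples = [
    EdgeSample("LHR", 1000, Counter({"192.0.2.10": 300, "192.0.2.20": 2})),
    EdgeSample("SIN", 1000, Counter({"192.0.2.10": 450})),
]
# Neither data center alone sees 500k pps for 192.0.2.10, but the global estimate does.
print(detect_global_attacks(samples, threshold_pps=500_000))  # {'192.0.2.10': 750000}
```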
The second system is called dosd, short for denial of service daemon.
As opposed to GateBot, which lives in our network's core and looks for globally volumetric attacks,
dosd is a piece of software that runs on every single server in every one of our edge data centers.
So it's localized, as opposed to being very central? Exactly. Yes, and dosd is able to autonomously analyze, detect, and mitigate attacks at very high frequencies and sampling rates, in order to detect attacks in the scope of one server or one data center.
And so it's much quicker, and it's able to find even the smaller DDoS attacks that try to stay below the radar.
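The local, per-server detection described above can be sketched as a sliding-window counter keyed on a packet fingerprint. This is a hypothetical illustration: the fingerprint fields, window length, and packet limit are assumptions, not the daemon's real parameters.

```python
from collections import defaultdict, deque


class LocalDetector:
    """Toy per-server detector: counts packets per fingerprint over a sliding window.

    The fingerprint (protocol, destination port, packet length), the window, and
    the packet limit are hypothetical stand-ins, not real daemon parameters.
    """

    def __init__(self, window_s: float, max_packets: int) -> None:
        self.window_s = window_s
        self.max_packets = max_packets
        self.seen = defaultdict(deque)  # fingerprint -> timestamps still inside the window
        self.blocked = set()

    def observe(self, ts: float, fingerprint: tuple) -> bool:
        """Record one packet seen at time `ts`; return True if it should be dropped."""
        if fingerprint in self.blocked:
            return True
        recent = self.seen[fingerprint]
        recent.append(ts)
        while recent and recent[0] <= ts - self.window_s:
            recent.popleft()  # forget packets older than the window
        if len(recent) > self.max_packets:
            self.blocked.add(fingerprint)  # mitigate locally, without asking a central system
            return True
        return False


detector = LocalDetector(window_s=0.1, max_packets=50)
burst = ("udp", 9999, 60)
drops = [detector.observe(i * 0.0001, burst) for i in range(100)]
print(drops.index(True), sum(drops))  # 50 50
print(detector.observe(0.02, ("tcp", 443, 1500)))  # False
```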
And our third system, a system that we just recently rolled out in the past few weeks, is called FlowTrackD, which is short for FlowTrack Daemon.
Essentially, it's a state machine for TCP connections that works on a unidirectional flow.
Remember the TCP handshake I described before.
In a bidirectional routing topology, such as with Cloudflare's WAF or the Spectrum service, we see both sides of the connection because it's symmetric.
But with Magic Transit, the routing topology is asymmetric, meaning we don't see the entire connection, and we couldn't use our existing capabilities to identify those very random and complex TCP-based DDoS attacks.
A quick pause there: by asymmetric, you mean that a client is trying to reach, let's say, Wikimedia's servers. It sends a request to Wikimedia, and Wikimedia's data center sees the request come in.
But the response from Wikimedia does not go over Cloudflare's network; it goes directly over the Internet, which is called direct server return.
So that's why we only see one part of the connection, just the ingress to the data center but not the egress, correct?
Exactly. So we see just half of the connection, because the origin server responds directly using DSR, direct server return.
And so this piece of software that we built, FlowTrackD, is able to classify TCP connections and identify their state, and then it's able to either block, challenge, or allow packets based on their association with a legitimate or non-legitimate TCP flow.
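A unidirectional TCP state machine like the one described for FlowTrackD can be sketched by tracking only what arrives on ingress: the client's SYN, then the client's ACK, since the server's SYN-ACK leaves via direct server return and is never seen. This is a hypothetical sketch: the state names, the flag handling, and the mapping of unknown flows to "challenge" are illustrative choices, not FlowTrackD's implementation.

```python
from enum import Enum


class FlowState(Enum):
    SYN_SEEN = "syn_seen"        # client SYN observed; the SYN-ACK leaves via DSR, unseen
    ESTABLISHED = "established"  # client ACK observed after its SYN


class IngressFlowTracker:
    """Hypothetical ingress-only TCP tracker keyed on (src_ip, src_port, dst_ip, dst_port)."""

    def __init__(self) -> None:
        self.flows: dict[tuple, FlowState] = {}

    def classify(self, flow: tuple, flags: set[str]) -> str:
        """Return 'allow', 'challenge', or 'drop' for one inbound packet."""
        if "SYN" in flags and "FIN" in flags:
            return "drop"  # contradictory flags never appear in a valid flow
        if "SYN" in flags and "ACK" not in flags:
            self.flows[flow] = FlowState.SYN_SEEN  # a new connection attempt
            return "allow"
        state = self.flows.get(flow)
        if state is None:
            return "challenge"  # mid-connection packet with no handshake on record
        if "RST" in flags or "FIN" in flags:
            del self.flows[flow]  # connection is closing; stop tracking it
            return "allow"
        if state is FlowState.SYN_SEEN and "ACK" in flags:
            self.flows[flow] = FlowState.ESTABLISHED  # handshake completed from our vantage point
        return "allow"


tracker = IngressFlowTracker()
client = ("198.51.100.7", 50000, "192.0.2.10", 443)
print([tracker.classify(client, f) for f in ({"SYN"}, {"ACK"}, {"ACK", "PSH"})])
# ['allow', 'allow', 'allow']
spoofed = ("203.0.113.9", 1234, "192.0.2.10", 443)
print(tracker.classify(spoofed, {"ACK"}))  # challenge
```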
Got it. If I were a customer of Cloudflare, it's great to hear from you about the trends we're seeing on Cloudflare's network: what's happening in the world, how we're seeing a surge in attacks, how there were 39 different attack vectors, how the number of smaller attacks is increasing, and how the scale of larger attacks is also increasing.
But if I'm a customer, where do I get to see all of this besides, say, the DDoS report that I would read from Cloudflare's blog?
Well, good question. Is there a more holistic view of analytics or something that I can see on my dashboard?
So, yes.
For layer 3 and 4 DDoS attacks, traffic analysis, and traffic patterns, we have the Network Analytics dashboard.
This dashboard shows all of the traffic that reaches the Cloudflare edge, including all of its packet-level attributes, and also all of the DDoS attacks that are mitigated automatically by GateBot and dosd, including all of the attack attributes: the attack start and end time, the attack vector, what action was taken, the max bit rate and packet rate, the total bits and packets, the geographical distribution, and so on.
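The attack attributes listed above map naturally onto a record type that a dashboard could aggregate. This is a hypothetical sketch: the field names and the `summarize` and `make` helpers are illustrative and are not the schema of Cloudflare's Network Analytics or its API, and the sample records are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AttackRecord:
    """Hypothetical record for one mitigated attack; not Cloudflare's actual schema."""
    start: datetime
    end: datetime
    vector: str
    action: str
    max_bps: float
    max_pps: float
    total_bits: float
    total_packets: int
    packets_by_country: dict[str, int]

    @property
    def duration_minutes(self) -> float:
        return (self.end - self.start).total_seconds() / 60


def summarize(records: list[AttackRecord]) -> dict:
    """Roll several attacks up into a dashboard-style overview."""
    countries: Counter = Counter()
    for r in records:
        countries.update(r.packets_by_country)
    return {
        "attacks": len(records),
        "top_vector": Counter(r.vector for r in records).most_common(1)[0][0],
        "peak_pps": max(r.max_pps for r in records),
        "top_countries": [c for c, _ in countries.most_common(2)],
    }


def make(vector: str, max_pps: float, by_country: dict[str, int], minutes: int) -> AttackRecord:
    """Build a made-up record; the bit-rate and bit-total fields are placeholders."""
    start = datetime(2020, 6, 1, 12, 0)
    return AttackRecord(
        start=start, end=start + timedelta(minutes=minutes), vector=vector, action="drop",
        max_bps=0.0, max_pps=max_pps, total_bits=0.0,
        total_packets=sum(by_country.values()), packets_by_country=by_country,
    )


records = [
    make("SYN flood", 2e6, {"US": 500, "BR": 300}, 15),
    make("SYN flood", 5e6, {"BR": 400, "DE": 100}, 40),
    make("UDP flood", 1e6, {"DE": 50}, 90),
]
print(summarize(records))
print(records[0].duration_minutes)  # 15.0
```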
That's great.
Would I be able to see this while the attack is happening? I guess my question is, how long does it take for Cloudflare to mitigate these different kinds of attacks?
So, most attacks are mitigated under 10 seconds. I'd say that even around three seconds in the majority of the cases, but under 10 seconds for sure.
This is for the automatic systems that analyze traffic patterns in real time and generate rules.
But for predefined, static mitigation rules, such as firewall rules, whenever a rule matches a packet, that packet is dropped immediately.
So, essentially, in those situations, the time to mitigate is zero seconds.
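The difference between dynamic and static rules can be put in numbers: what matters is how many attack packets arrive before a rule starts matching. A minimal sketch, assuming a hypothetical 1 Mpps flood and using the roughly 3 second and 10 second figures quoted above for dynamic rules.

```python
def packets_before_mitigation(attack_pps: float, time_to_mitigate_s: float) -> float:
    """Attack packets that arrive before a mitigation rule starts matching."""
    return attack_pps * time_to_mitigate_s


ATTACK_PPS = 1_000_000  # hypothetical 1 Mpps flood

cases = [
    ("static firewall rule", 0.0),       # matches the very first packet
    ("dynamic rule, typical", 3.0),      # roughly three seconds in most cases
    ("dynamic rule, upper bound", 10.0),
]
for label, ttm in cases:
    print(f"{label}: {packets_before_mitigation(ATTACK_PPS, ttm):,.0f} packets")
```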
It's immediate. And by predefined rules, you mean rules that are typically defined jointly by the customer and Cloudflare's SOC team, and that you would define on your dashboard?
Exactly. Whenever we onboard a customer, there is an assigned account team with a solution engineer.
This is for Enterprise and Magic Transit customers, of course: a solution engineer and a customer support manager who are assigned to that account.
And during onboarding, and whenever needed, they're constantly refining and optimizing the security settings, the configurations, the firewall rules as needed to make sure that our customers are protected.
One of the things we always hear about in the industry is about two different deployment modes available for customers, right?
One is always-on and the other is on-demand. Always-on is where your DDoS mitigation solution sits between the customer's network and the outside world at all times, and on-demand is where it's turned on only when the customer activates it in case of an attack.
What impact does that have on time to mitigate, or on anything else? How should a customer choose between always-on and on-demand?
Okay, well, first of all, we recommend always on, but it depends.
Generally, some customers want to retain full control of their network and turn on DDoS mitigation only in the event of an attack, and that's fine.
But the time to mitigate is longer, since the customer still has to turn it on, and it takes a few minutes for the BGP advertisements to propagate and for the rerouting to take place.
However, I think that on demand is a solution to a technical limitation that derives from the small network infrastructures of legacy DDoS providers.
So we have a global network of data centers at Cloudflare.
We're one of the most interconnected networks in the world, with over 200 locations around the world, so our customers don't need to sacrifice performance for security.
We always recommend always on to customers because that way we can always mitigate any attack instantly, like I said, with no performance penalty on their network at all.
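The time-to-mitigate gap between the two deployment modes can be sketched as a sum of delays. The on-demand figures below (five minutes to notice and activate, three minutes for BGP propagation) are assumptions chosen to match "a few minutes", and the 3 second detection time follows the typical figure quoted earlier; none of them are measurements.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    notice_and_activate_s: float  # customer notices the attack and switches mitigation on
    route_propagation_s: float    # BGP advertisements propagate and traffic is rerouted
    detection_s: float            # automatic detection and mitigation once traffic arrives

    def time_to_mitigate_s(self) -> float:
        return self.notice_and_activate_s + self.route_propagation_s + self.detection_s


always_on = Deployment("always-on", 0, 0, 3)      # traffic already flows through the mitigation network
on_demand = Deployment("on-demand", 300, 180, 3)  # assumed: 5 min to react, 3 min for BGP
for d in (always_on, on_demand):
    print(f"{d.name}: {d.time_to_mitigate_s():.0f} s")
```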
Yeah. Speaking of legacy DDoS mitigation vendors, why is Cloudflare better?
I know you touched on our global network that spans over 200 cities. What other factors make Cloudflare a better DDoS mitigation solution than other legacy vendors in the market?
Well, we've been doing DDoS mitigation since our inception 10 years back, and I mean, of course, I'm biased.
I'm the product manager, right? Don't listen to me. But I would say that the first reason is our global network's capacity and coverage.
We have a global physical presence in over 200 cities in over 100 countries around the world.
Our mitigation capacity is over 37 terabits per second.
That mitigation capacity is higher than the next top four competitors combined, so that's a lot.
And we're able to do this because we don't rely on a small set of scrubbing centers, like legacy providers do.
We do it from every data center, so every one of our data centers provides or delivers all of our services, the WAF, DNS, DDoS, and so on.
So let's pause on that point a little bit. Is it fair to say that our network is almost like a fractal, where every service, whether it's DDoS mitigation, the WAF, or the CDN that people know us for, all of the services across the stack run on every single server in every single data center across all of the 200 cities where we have a presence?
Yes, exactly.
You mentioned that, unlike legacy vendors, we don't have a limited subset of scrubbing centers.
So is every one of our data centers a quote-unquote scrubbing center?
Yes, exactly. So if I'm the customer, any attack, from any source, is mitigated by Cloudflare in the cloud, very close to the source of the attack, and it's mitigated fast, and that's what leads to fast performance for traffic.
Yeah, exactly. So it kind of works twofold.
For attacks, we're able to mitigate the attack traffic close to the source, and for the legitimate clients, we're able to serve them close to them, close to the eyeballs.
We also use anycast in our network, meaning that we're able to traffic-engineer and spread the attack traffic across our fleet of data centers, and we have automatic mechanisms that do that as well.
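How anycast spreads an attack across data centers can be sketched by routing each source region's traffic to the site it reaches. This is a simplification: real anycast routing follows BGP path selection rather than a fixed region table, and the region-to-data-center mapping and the split of the 754 Mpps peak are made up for illustration.

```python
from collections import Counter

# Hypothetical, simplified routing table: with anycast, every data center
# announces the same IP prefix and each source lands at the site its network prefers.
ROUTES_TO = {"EU": "LHR", "APAC": "SIN", "NA": "SFO", "SA": "GRU"}


def load_per_data_center(pps_by_region: dict[str, float]) -> Counter:
    """Sum the attack packet rate arriving at each data center."""
    load: Counter = Counter()
    for region, pps in pps_by_region.items():
        load[ROUTES_TO[region]] += pps
    return load


# A made-up regional split that adds up to the 754 Mpps peak quoted earlier.
attack = {"EU": 200e6, "APAC": 350e6, "NA": 150e6, "SA": 54e6}
load = load_per_data_center(attack)
for dc, pps in load.most_common():
    print(f"{dc}: {pps / 1e6:.0f} Mpps")
```

No single site has to absorb the full peak; each one handles only the share of sources that route to it.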
So our network is really smart, and it learns from every one of those attacks as well.
That's great. It's almost like our network gets smarter with every bit that flows over it.
Is that right? Yes, and the other reason we're better is performance.
So it's not just about security. Performance and security need to go together.
We focus on not just best-in-class security, but also making sure that the connection time is quick, there's no latency.
Here again, our network helps us a lot. Our customers are very sensitive to metrics such as TTFB, time to first byte, and so on.
We're building systems to ensure our customers' network traffic is not just secure but also super fast, and no other vendor is as well positioned to do that in the market.
That's great. We're catering to the fundamental needs of any application, really.
If you think about it, it's security, fast performance, so speed, and then reliability.
That sounds like a winner, Omer.
But what about costs? Because now more than ever, our customers are very sensitive to costs.
How do we compare to box providers? So compared to box providers, or hardware appliance providers, you have zero capex, capital expenses, with Cloudflare.
We offer unmetered DDoS protection and charge a flat fee for our subscription-based services, so we're definitely cost-effective.
And we will never charge for DDoS traffic. With other vendors, you are charged for it, and then you have to submit a ticket requesting a credit and provide proof.
No, we just don't charge for it. Great. All right, Omer, we're at the top of the hour.
I know you're the product manager for DDoS, but you have me sold.
Thank you so much for talking to me today. It was an absolute pleasure.
Stay safe in London, and I hope to get to see you sometime soon. The pleasure is all mine.
Thank you, Vivek.