HTTP/2 Rapid Reset vulnerability results
Presented by: Michiel Appelman
Originally aired on March 11 @ 8:00 AM - 8:30 AM EDT
Transcript (Beta)
Okay, let's get started. Good morning everyone, and thank you for joining us today for this Dutch-language session on the HTTP/2 Rapid Reset vulnerability that Cloudflare disclosed last week together with other industry partners, now known as CVE-2023-44487.
Today we'll cover what we saw, what happened, how we mitigated it, and what you can do to protect yourself against the HTTP/2 Rapid Reset vulnerability.
Feel free to ask questions in the chat or in the Q&A. I will pause during the session to see if there are any questions.
In recent weeks, Cloudflare has been working on mitigating an attack on our network.
Based on those mitigations and what we learned from them, we discovered, together with Google and AWS, a consequence of the HTTP/2 protocol's design that was not anticipated at the time, and which leaves a large share of HTTP/2 implementations vulnerable.
This meant that, as of late August, almost every DDoS mitigation service, content delivery service, and web server with HTTP/2 support was vulnerable.
Many patches have of course been released since, but there is still a long tail of web services that remain vulnerable.
As a result, Cloudflare has been hit by a large number of DDoS attacks, the largest of which peaked at 201 million requests per second, nearly three times the previous record of 71 million requests per second set in February 2023.
In the past few months, we have seen 184 attacks larger than those 71 million requests per second, and 107 attacks larger than 100 million requests per second.
That is a significant impact, and a clear consequence of this vulnerability in the HTTP/2 protocol.
What did the timeline look like?
At the end of August 2023, we discovered the vulnerability.
At the same time, we contacted other vendors and, together with Amazon and Google, took the lead in disclosing the vulnerability responsibly.
We reached out to other vendors and to developers of web applications and web servers so that they could release patches before October 10, or at least be prepared for that date.
We also shared real-time statistics with them to show the impact we were seeing, all with the aim of making sure the Internet as a whole can withstand these kinds of attacks, and thus making the Internet better.
To take a step back: what is HTTP/2, really?
It may seem like a very basic concept, and by now it is more than a decade old.
About two-thirds of Internet traffic runs over HTTP/2.
At least, that is what Cloudflare sees. It brings many improvements over HTTP/1.1 that can make the Internet faster.
But one of those improvements, the extensive parallelization it introduced, is exactly where the vulnerability comes from.
HTTP/1.1 lacks that kind of parallelization, so it is not vulnerable.
HTTP/3 might be, but we have not yet seen any active exploits against it.
It really depends on the implementation in this case:
not only on the protocol itself, HTTP/2 or HTTP/3, but on how it is implemented.
How does an implementation handle the rapid resets a client sends to cancel streams?
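One way an implementation can answer that question is to cap how many streams a single connection may reset within a short window, and to close the connection once the cap is exceeded. The sketch below assumes a sliding-window limit with made-up thresholds; `ResetBudget` and its parameters are hypothetical names, not taken from any particular server. `ENHANCE_YOUR_CALM` (0xb) is the real HTTP/2 error code a server can put in its GOAWAY frame.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable, Optional

# Real HTTP/2 error code (RFC 9113, section 7) a server can send in GOAWAY.
ENHANCE_YOUR_CALM = 0xB


class ResetBudget:
    """Hypothetical per-connection guard against rapid stream resets.

    Keeps timestamps of recent client RST_STREAM frames in a sliding window;
    once more than max_resets fall inside window_s seconds, it tells the
    caller to close the connection with GOAWAY(ENHANCE_YOUR_CALM).
    """

    def __init__(self, max_resets: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_resets = max_resets
        self.window_s = window_s
        self.clock = clock
        self.reset_times: deque[float] = deque()

    def on_rst_stream(self) -> Optional[int]:
        """Record one client reset; return a GOAWAY error code if over budget."""
        now = self.clock()
        self.reset_times.append(now)
        # Forget resets older than the sliding window (the newest entry always stays).
        while now - self.reset_times[0] > self.window_s:
            self.reset_times.popleft()
        if len(self.reset_times) > self.max_resets:
            return ENHANCE_YOUR_CALM  # caller sends GOAWAY and closes the connection
        return None
```

The injectable `clock` makes the guard easy to test; a real server would also have to decide what to do with streams already in flight when it sends GOAWAY.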
How does it work?
As I said, HTTP/1.1 has little parallelization. A request is sent, a response comes back from the server, and then the cycle starts all over again.
A new request, a new response, a new request, a new response.
With HTTP/2, there is much more multiplexing and parallelization going on.
Multiple requests can be sent over the same connection,
and multiple responses can come back over it. The result, and of course the goal, is that data moves faster from one end to the other.
But it also means that many requests can be sent at the same time, and cancelled at the same time.
That is exactly what the Rapid Reset technique exploits.
Request, cancel, request, cancel, over and over. As a result, the server is constantly processing and setting up new streams that are immediately cancelled again.
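A detail worth making explicit: an HTTP/2 server can cap concurrent streams with `SETTINGS_MAX_CONCURRENT_STREAMS`, but a stream the client has reset is closed and no longer counts toward that cap. The toy model below (`StreamTracker` is a hypothetical name, and the limit of 100 is an arbitrary assumption) shows why a request/cancel loop is not bounded by the limit, while a client that never cancels is.

```python
from __future__ import annotations


class StreamTracker:
    """Toy model of SETTINGS_MAX_CONCURRENT_STREAMS enforcement on one connection."""

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self.open_streams: set[int] = set()
        self.total_started = 0

    def on_headers(self, stream_id: int) -> bool:
        """Client opens a stream; refuse it if the concurrency limit is reached."""
        if len(self.open_streams) >= self.max_concurrent:
            # A real server answers with a stream error (REFUSED_STREAM or PROTOCOL_ERROR).
            return False
        self.open_streams.add(stream_id)
        self.total_started += 1
        return True

    def on_rst_stream(self, stream_id: int) -> None:
        """Client resets the stream: it is closed and stops counting toward the limit."""
        self.open_streams.discard(stream_id)


if __name__ == "__main__":
    tracker = StreamTracker(max_concurrent=100)
    for i in range(10_000):
        sid = 2 * i + 1              # client-initiated stream IDs are odd
        tracker.on_headers(sid)      # request...
        tracker.on_rst_stream(sid)   # ...and immediate cancel
    print(tracker.total_started, len(tracker.open_streams))  # 10000 0
```

With HTTP/1.1 the only way to cancel a request is to close the connection, so there is no equivalent cheap loop.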
In some cases, this is not so bad,
in the sense that the server can handle it by saying: okay, I see a request, and it was cancelled while I was working on it.
But in other cases, the server sits at the front of a long chain of other servers behind it.
That means the server receiving a request for a new stream may set up a session to a backend server, where the actual response has to come from.
Memory has to be allocated, a new backend session is set up, and then the cancellation arrives.
That makes the attack very cheap for the attacker.
They don't have to do much: send lots of requests, then lots of cancellations.
They don't listen to anything the server sends back. The server, on the other hand, is very busy handling all those requests, setting up backend sessions, and then processing cancellations that it also has to pass on to the backend.
That makes this not just an exploit but a real amplification attack, because the server's resources can be exhausted very quickly.
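The amplification comes from work the front-end server starts before the cancellation arrives. The asyncio toy below is a sketch under loud assumptions, not any real proxy: the hypothetical `ToyProxy` starts a simulated backend call for each new stream and cancels it on RST_STREAM, and the 50 ms sleep merely stands in for backend latency. It counts backend calls opened versus completed.

```python
from __future__ import annotations

import asyncio


class ToyProxy:
    """Hypothetical front-end proxy that forwards every HTTP/2 stream to a backend."""

    def __init__(self, backend_latency_s: float = 0.05) -> None:
        self.backend_latency_s = backend_latency_s
        self.backend_started = 0    # backend requests the proxy had to set up
        self.backend_completed = 0  # backend requests that produced a response
        self.tasks: dict[int, asyncio.Task] = {}

    async def _fetch_from_backend(self) -> None:
        self.backend_started += 1                    # session/memory allocated here
        await asyncio.sleep(self.backend_latency_s)  # stand-in for backend work
        self.backend_completed += 1

    def on_headers(self, stream_id: int) -> None:
        """New request: start the backend call right away."""
        self.tasks[stream_id] = asyncio.create_task(self._fetch_from_backend())

    def on_rst_stream(self, stream_id: int) -> None:
        """Client cancelled: abort the backend call; its setup cost is already spent."""
        task = self.tasks.pop(stream_id, None)
        if task is not None:
            task.cancel()


async def rapid_reset(proxy: ToyProxy, pairs: int) -> None:
    """Simulate a client that cancels every request right after sending it."""
    for i in range(pairs):
        stream_id = 2 * i + 1       # client-initiated stream IDs are odd
        proxy.on_headers(stream_id)
        await asyncio.sleep(0)      # one event-loop turn: the backend call starts
        proxy.on_rst_stream(stream_id)


if __name__ == "__main__":
    proxy = ToyProxy()
    asyncio.run(rapid_reset(proxy, 1000))
    print(proxy.backend_started, proxy.backend_completed)  # 1000 0
```

Every request costs the proxy a backend setup while the client never waits for a single response.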
An extra trick in HTTP/2 is that many of those requests can be sent in a single IP packet.
In a 1,500-byte IP packet, 47 of these request/cancel pairs fit:
a GET that opens a new stream, immediately followed by a CANCEL (an RST_STREAM frame). So a single packet can already set up 47 requests.
Do that often enough, and you can set up an enormous number of backend sessions in a very short time.
On the left of the slide you see an excerpt from a PCAP in which several streams are set up within the same IP packet: here stream ID 3 gets a GET, immediately followed by a CANCEL, and the next pair follows right below it.
Those are just small chunks of 18 bytes and then 30 bytes, and that's it.
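To make the PCAP description concrete, the sketch below serializes the two frames involved, a HEADERS frame that opens a stream and an RST_STREAM frame carrying the CANCEL error code, using the 9-byte frame header from RFC 9113. The four-byte HPACK header block is an assumption: it indexes `:method GET`, `:scheme https` and `:path /` from the HPACK static table and refers to an `:authority` value assumed to sit at dynamic-table index 62 from an earlier request. Real captures will differ in size depending on what the client sends.

```python
from __future__ import annotations

import struct

# Frame types, flags and error codes as defined in RFC 9113.
FRAME_HEADERS = 0x1
FRAME_RST_STREAM = 0x3
FLAG_END_STREAM = 0x1
FLAG_END_HEADERS = 0x4
ERROR_CANCEL = 0x8

# Assumed minimal HPACK header block (RFC 7541 indexed representations):
# 0x82 = :method GET, 0x87 = :scheme https, 0x84 = :path / (static table),
# 0xBE = dynamic-table entry 62, assumed to hold :authority from an earlier request.
HEADER_BLOCK = bytes([0x82, 0x87, 0x84, 0xBE])


def encode_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Serialize a frame: 24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID, payload."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,  # 24-bit length split into 8 + 16 bits
        frame_type, flags,
        stream_id & 0x7FFFFFFF,         # reserved high bit left as 0
    )
    return header + payload


def headers_frame(stream_id: int) -> bytes:
    """A GET without a body: HEADERS with END_STREAM and END_HEADERS set."""
    return encode_frame(FRAME_HEADERS, FLAG_END_STREAM | FLAG_END_HEADERS,
                        stream_id, HEADER_BLOCK)


def rst_stream_frame(stream_id: int) -> bytes:
    """Cancel the stream: RST_STREAM with a 32-bit CANCEL error code."""
    return encode_frame(FRAME_RST_STREAM, 0, stream_id, struct.pack(">I", ERROR_CANCEL))


def pairs_per_budget(budget_bytes: int) -> int:
    """How many request/cancel pairs fit in budget_bytes of frame data."""
    pair = len(headers_frame(1)) + len(rst_stream_frame(1))
    return budget_bytes // pair


if __name__ == "__main__":
    print(len(headers_frame(3)), len(rst_stream_frame(3)))  # 13 13
    print(pairs_per_budget(1500))                            # 57
```

Under these assumptions a pair is 26 bytes. The 47 pairs per 1,500-byte packet quoted above correspond to roughly 32 bytes per pair; the gap would presumably come from a larger header block and from TCP/IP and TLS overhead, which this sketch ignores.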
So it goes very fast. It also means there is hardly any ramp-up before this kind of DDoS attack hits full strength.
There is no ramp-up where the attacker first probes how much they can send and then goes beyond that.
No, the client starts sending these requests at full throttle right away.
It is important to know that it was not so much our customers who were attacked, but Cloudflare itself.
For the attackers, the incentive was to bring the weapon they had built to market: the botnet already existed, but this new exploit of the vulnerability was what they wanted to sell.
And the way to bring it to market is to show that you can take on the biggest player.
In this case, that is Cloudflare.
Cloudflare was attacked at the end of August, and initially we assumed it was some script kiddie with a very large botnet who wanted to show off what they could do and then disappear again.
But that was not the case.
From the end of August onwards, Cloudflare was locked in a constant cat-and-mouse game with these attackers.
Something else was really going on. For the attackers, it became a matter of bragging rights: the fact that Cloudflare was working on this was a way to hype up, or advertise, their product.
So we kept it under the radar, with the message: we're working on it, the impact is minimal, and we'll share the details later.
Cloudflare continuously updated its mitigation rules.
Each time, those rules brought the attack line on the graph back down, but every now and then there were peaks where the attackers had made another adjustment:
changing their tactics to attack in a different way, or switching wholesale to a completely different IP address range, things like that.
That IP address range is an interesting one.
In this chart the sources are colored by cohort: the blue baseline at the bottom is where the attackers started.
On top of that, they added pieces of new botnets.
So you see more and more new IP ranges being added, which creates a huge peak.
After that peak, the attack ends fairly quickly.
For Cloudflare, this means we also want to keep the mitigation rules we apply as narrow as possible.
Every time we add a mitigation rule for an IP address, for example (and we've been doing this for a long time), it can have a negative impact on those who use that IP address legitimately.
So we aim for a balance between mitigating specific attacks, which I'll show later, and the negative impact on legitimate users.
The main sources of these attacks were proxy servers and various endpoints, mostly running in cloud environments.
From those IP addresses, attackers could easily reach the Internet and quickly send large amounts of HTTPS traffic.
And right after that, it kept going.
Let me pause for a moment to see if there are any questions.
Indeed, I see your comment, Sander: someone has to be the first to be hit.
And then you can indeed take measures.
And that is exactly the point of the previous slide: if you can take down Cloudflare as an attacker, that is of course worth a lot.
It also means that new attacks like this one are tried first against companies like Cloudflare, to see whether they work.
And if they do, they can be brought to market immediately.
Doing this on a smaller scale has much less effect,
and it is worth a lot less. So in this case it is both a blessing and a curse that Cloudflare is so large in the DDoS mitigation market.
Where does the traffic come from?
As I said, largely proxy servers and various endpoints that are part of different botnets, mostly in the US,
a lot from India, and some from Germany.
But overall it is quite widely distributed: even though a lot came from the US, much of it was proxied through that kind of IP address.
So the sources were quite distributed. How Cloudflare solved it, we'll come back to later.
But first: what can be done now to make sure our customers' environments are not vulnerable?
First of all, it is important to block layer 7 attack traffic before it reaches the application, whether that runs on-premises, in the cloud, or anywhere else.
In practice, that means a web application firewall or DDoS mitigation is best deployed outside that environment.
That way, the impact stays contained within the cloud environment where the mitigation is applied,
and any large performance impact that might occur happens there rather than at your origin.
It also gives a lot of insight into the traffic, as Cloudflare has seen: what is happening, which mitigation strategies we can apply, and how quickly we can apply them.
We'll come back to that later. In addition, if you use multiple cloud providers, it is important to have a consistent DDoS mitigation control plane, and ideally also a data plane, so that every application, on whichever cloud provider, is protected against DDoS attacks.
Of course, if there are still systems that must be publicly reachable from your own environment, make sure they have the right patches.
But even with those patches, it is important to analyze what happens when too much traffic is sent to those servers.
After all, HTTP/2 Rapid Reset is an amplification attack in the DDoS space.
This means a much smaller botnet is needed to take a given web server offline.
At the same time, in most cases such an advanced botnet is not even needed to take a typical web server offline, since a typical web server can also be brought down with less advanced attacks.
As a last resort, specifically against Rapid Reset, you can disable HTTP/2 and HTTP/3, but that really should be a last resort.
Interestingly, the Cloudflare blog goes into a lot of detail about the mitigation strategies Cloudflare applied.
One of them is in fact disabling HTTP/2. Cloudflare's DDoS mitigation includes the ability to put clients in an IP jail.
An IP jail means that I take a certain IP address that I see doing a lot of harmful things at the moment,
and put it in jail, so that it can no longer send traffic to a certain domain.
That worked up to a point in this attack as well, but because the target was not a particular customer or domain but Cloudflare as a whole, the attackers quickly moved the attack to another domain.
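A minimal sketch of a jail scoped to one hostname, as described above, shows why it was easy to sidestep: the jail key includes the hostname, so the same IP address is free to hit any other domain. `HostScopedJail`, its TTL, and the addresses (taken from documentation ranges) are all hypothetical, not Cloudflare's implementation.

```python
from __future__ import annotations


class HostScopedJail:
    """Hypothetical IP jail keyed by (client IP, hostname), each entry expiring after ttl_s."""

    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self.expiry: dict[tuple[str, str], float] = {}

    def jail(self, ip: str, host: str, now: float) -> None:
        """Block this IP for this hostname only."""
        self.expiry[(ip, host)] = now + self.ttl_s

    def is_blocked(self, ip: str, host: str, now: float) -> bool:
        until = self.expiry.get((ip, host))
        return until is not None and now < until


if __name__ == "__main__":
    jail = HostScopedJail(ttl_s=60)
    jail.jail("203.0.113.7", "a.example", now=0)
    print(jail.is_blocked("203.0.113.7", "a.example", now=10))  # True
    print(jail.is_blocked("203.0.113.7", "b.example", now=10))  # False: attacker switches host
```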
So Cloudflare extended the IP jail principle.
In other words, if a user, or rather an IP address, again sends a lot of traffic with the HTTP/2 Rapid Reset signature, we prohibit that IP address, for a short time, from connecting to Cloudflare over HTTP/2 or HTTP/3.
That means it falls back to HTTP/1.1.
From that moment on, the Rapid Reset attack is effectively mitigated.
At the same time, any legitimate users behind that IP address who still need a connection to Cloudflare can continue.
They are on HTTP/1.1, so performance is somewhat affected.
That's why we carefully balance how long an IP address stays in jail, so that once it is no longer part of the botnet it can use HTTP/2 and HTTP/3 again.
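The extended jail can be modelled as a protocol downgrade rather than a block, keyed on the IP address alone so that switching hostnames no longer helps. In the sketch below, `ProtocolJail`, the TTL, and the use of ALPN-style identifiers are assumptions for illustration; how traffic is matched to the Rapid Reset signature is deliberately not modelled, and this is not Cloudflare's implementation.

```python
from __future__ import annotations

# ALPN-style protocol identifiers. In practice "h3" is advertised separately
# (Alt-Svc, QUIC) rather than in the TCP/TLS handshake; this toy glosses over that.
ALL_PROTOCOLS = ("h3", "h2", "http/1.1")
HTTP11_ONLY = ("http/1.1",)


class ProtocolJail:
    """Hypothetical network-wide jail: a flagged IP may only speak HTTP/1.1 for ttl_s seconds."""

    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self.expiry: dict[str, float] = {}

    def flag(self, ip: str, now: float) -> None:
        """Called when this IP's traffic matches the Rapid Reset signature (detection not modelled)."""
        self.expiry[ip] = now + self.ttl_s

    def allowed_protocols(self, ip: str, now: float) -> tuple[str, ...]:
        """Protocols to offer this IP, regardless of hostname."""
        until = self.expiry.get(ip)
        if until is not None and now < until:
            return HTTP11_ONLY     # still jailed: downgrade instead of blocking
        self.expiry.pop(ip, None)  # expired or never jailed: clean up
        return ALL_PROTOCOLS
```

The TTL is the knob behind the trade-off described above: longer keeps attackers on HTTP/1.1 longer, shorter gives legitimate users behind the same address their HTTP/2 and HTTP/3 performance back sooner.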
Long story short: if your applications are behind Cloudflare, they are protected against Rapid Reset. "Behind Cloudflare" here means proxied by Cloudflare: not just using Cloudflare DNS, but with the little orange cloud shown next to the DNS record in the Cloudflare dashboard, where it says "Proxied".
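For those managing DNS records programmatically, the orange-cloud toggle corresponds to the `proxied` field of a DNS record in Cloudflare's v4 API. The sketch below only builds the PATCH request with Python's standard library and does not send it; the zone ID, record ID and API token are placeholders you would need to supply, and the token needs permission to edit DNS records for that zone.

```python
from __future__ import annotations

import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_proxy_request(zone_id: str, record_id: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH that turns on proxying for one DNS record."""
    body = json.dumps({"proxied": True}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)`; check the response's `success` field before assuming the record is proxied.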
The same applies to all kinds of other DDoS attacks, of course. If you need more information, Cloudflare has a dedicated page on its website.
That's cloudflare.com/h2. There you'll find links to the standard webinar these slides come from, and also a very interesting overview blog post from our Chief Security Officer.
There is also the deep-dive blog post, which goes into detail on how this attack works,
including, for example, the IP jail functionality we added.
There is also a link to the CVE report, which in turn links to advisories from other vendors of hardware, appliances, and cloud solutions for DDoS and WAF functionality.
If you want to check whether you are affected, start with the CVE report to see whether your vendor is listed and what they recommend.
And of course, if you need support, you can request it via cloudflare.com/h2 as well. For any other questions, don't hesitate to contact me by email or on LinkedIn.
My email address is on this slide. The QR code links to my LinkedIn profile, but you can also find me there by searching for Michiel Appelman.
That's the update I wanted to give. I haven't received any questions in the Q&A or the chat, but I'm happy to take them here as well.
I'm not sure whether you can unmute yourselves, but if you can, feel free. I'd like to thank everyone for their time.
Again, this webinar has been recorded.
We will share the link with everyone who signed up, and also via social media and, where needed, through the account managers.
Thank you all.
Oh, I still see a question from Niels: how do you know the attack was targeted at Cloudflare and not at its customers?
That's because the attackers switched so easily between domains; it didn't really matter to them which domain behind Cloudflare they attacked or which one they could take offline.
It was purely about Cloudflare. Thanks for the question, Niels.
Thank you all for listening, and if there are any more questions, we'd love to hear them.