Tales from an SE: Under Attack Onboardings
Presented by: Weston Eakman, Ankur Aggarwal, Kabir Sikand
Originally aired on April 9, 2021 @ 11:30 AM - 12:00 PM EDT
Tune in to hear Cloudflare Solutions Engineers discuss their experiences onboarding a customer that was Under Attack from malicious online threats.
English
Security
DDoS
Transcript (Beta)
Thank you, everybody, for tuning in to Tales from an SE: Under Attack. My name is Weston Eakman.
I'm a solutions engineer here at Cloudflare. I've been here for two and a half years.
Fun fact about myself: during quarantine, I started growing this beard.
What do you think? Is that believable, Ankur?
Yeah, totally. My name is Ankur Aggarwal. I'm also a solutions engineer at Cloudflare.
And today we wanted to go through some of what we see as customers come in when they're under attack.
So we actually invited Kabir here with us.
He's also an SE at Cloudflare, and he'll take us through what we see when customers come in like that.
So Kabir, if you want to introduce yourself. Yeah.
Hi guys, I'm Kabir Sikand. I'm a solutions engineer here on the Cloudflare team.
I've been here for just north of one year now, and in that time I've definitely been through a large number of under attack scenarios.
So we'll definitely be talking a little bit about some of the more interesting ones that we've had over the past year.
Yeah. Awesome. Thank you, Kabir. We really appreciate you joining us and telling us your story.
But before we really jump into that, I want to say a little bit about the purpose of this segment.
At Cloudflare, we're here to help. Our mission is to help build a better Internet.
And as part of that, one of the things we help with is under attacks.
So to define that a little bit, Ankur, do you mind explaining what an under attack is?
Sure. So an under attack is basically any time a customer's infrastructure, your infrastructure, is getting hammered by something malicious, or sometimes even something not malicious: say you released something new and your infrastructure just isn't able to handle it, or it can't filter out the bad DDoS traffic that's hitting your edge.
So we make the under attack process available to really anyone: just go ahead and click "I'm under attack" on our homepage.
And from there, we'll take it away internally.
Yeah. And talking about that internal side: who is typically involved at Cloudflare? Who's supporting these under attacks when they come in?
Yeah. So once those under attacks come in, we assign an account team.
We work with both the account manager and an SE.
The SE's responsibility in those scenarios is to talk to the customer and figure out what is actually occurring.
We work with the customer to identify the problem based on whatever logs, documentation, or information they can provide, eventually get to a solution, and then help implement that solution.
And the key thing with an under attack is that we do all of this on a pretty compressed timeline, right when that customer comes in.
Great explanation. And with that, the SEs come in and help bridge the gap between our out-of-the-box enterprise solution and the customer's infrastructure.
How does an SE get picked up?
Is it round robin for the SEs to come in and help? What does that look like?
So on the SE side, it's actually kind of amazing: it's basically by volunteer.
In a typical under attack scenario, one or two SEs will raise their hands pretty quickly to join.
But as time goes on and other people check the messages, additional SEs will want to shadow, add help, or take over if the first SE becomes busy.
So supporting an under attack is a bit of a badge of honor on the SE team.
Yeah, awesome. Cool. So, to actually start jumping in after this introduction, I want to give a little more background for Kabir, because the under attack he experienced is kind of legendary within Cloudflare: this one took a very, very long time.
It's pretty much what we call the under attack that went around the world.
I think it lasted something like 36 to 48 hours, Kabir, right?
Yeah, yeah, it was somewhere in that time frame.
It was a few months ago, so the exact amount of time definitely gets blurry past hour 36.
But I certainly wasn't the only person helping onboard this customer and get them through the attack they were experiencing.
We actually got to hand this off between folks around the world.
See, here I thought this was kind of a superhuman feat on your side.
I kind of want to keep that image of you. Yeah, I don't know, maybe.
With enough sleep deprivation, maybe I could have done everything, but I'd need more caffeine pills and Red Bull to do that.
For sure. Well, cool. Let's get into it a little bit.
Can you describe the industry of the customer, as well as a rough location?
I want to make sure we're keeping privacy and everything, so just something simple.
Yeah, without obviously giving away too much information about the customer in question.
This was an attack scenario occurring in the EMEA region, and it came in roughly in the afternoon, Pacific time.
So it was pretty late in the day for the folks in EMEA to be working on it.
And as you can imagine, by the time the under attack scenario came into our pipeline,
the customer had likely already been up trying to deal with this on their own for a long period of time.
So we'll definitely jump into what that might look like and what that relationship looks like.
But for us, it started at around 3 or 4 PM Pacific.
I handled it until some folks in Singapore woke up and got to their desks, then it carried on overnight, and I got it back in my morning from some folks in EMEA.
Amazing. It basically just followed the sun.
With that, when the customer came in, did they readily know what issues they were facing, or did you have to work through some material to get there?
Yeah. Oh.
Kabir, I believe we lost your audio. Are you still there, Kabir? I wasn't sure if that was just on my end.
I am here. Can you hear me now? There we go. Yeah, there we go.
Okay, perfect. Sorry, a little blip. So in any case, any time these scenarios come up...
Oh, I think we lost you again, Kabir. We lost you again.
The joys of working from home, everybody. Exactly. Awesome.
Awesome. All right. Can you hear me now? Yes. Great. I switched Wi-Fi, so we should be good now.
So in any case, whenever we get these coming in, I generally see a series of symptoms that come into play and help us understand exactly what the attack is and how we might be able to help.
It's kind of like calling an advice nurse line: you read what the symptoms are and what types of attacks they might line up with.
And on the flip side, not just on the technical side, we need to really empathize with the customer.
They've likely been dealing with this.
They're frustrated. They're hard at work. They're reaching out to a third party to come in and help them implement a solution.
And so that really helps guide the conversation and also helps guide the pace of things, too.
Yeah, I think that's... sorry.
I'll just jump in real quick. I think that's very important to note, because, as you mentioned, while we have a follow-the-sun model where we can hand things off, the customer's teams aren't always very big.
So before they even come to Cloudflare,
they've been up trying to understand the problem, diagnose it, and then fix it.
So they've probably been up for maybe a day or two trying to fix this before coming to you.
And this one was 36 to 48 hours, right? That's a lot of time to be up. That's the superhuman feat we were joking about with you.
But I mean, the team that came to us
had to do that too. Yeah, exactly. And that's one of the key pieces to making sure the relationship is good.
And so, as we get into an attack scenario like this, it's important to realize we need to get as much done as possible to get them to a point where that person can go home, feed their family, get some sleep, and then come back to the problem and start to move on from there.
Absolutely. So when they got to you late at night,
you were able to key off some information or signals that it was a certain type of attack.
Did you establish some success criteria or a game plan from the get-go for how you were going to mitigate it?
Basically, could you walk us through some of that actual solution architecting? Yeah, absolutely.
So one of the big problems in this particular scenario is that they saw a sudden jump in traffic.
They didn't expect that jump based on any marketing or releases; they weren't running any promotions.
As a result, they hadn't capacity-planned for that increase in connections to their servers, so connection limits were being hit.
They weren't able to serve all the traffic coming in.
So the immediate goals were: (a) is it attack traffic? That was very quickly identified as a yes.
(b) Since it was, can we mitigate a lot of it and free up some of the connections available on their origins, so that they can actually serve their sites, even at limited capacity, and minimize the effect on their bottom line?
And again, goal number one was to get this person able to go home, go to sleep, and come back to it with a fresh pair of eyes in the morning.
So when you first started, what was that first step to get set up so you could relieve that person,
so they could come back and pick it up with one of our other teams?
Yeah. So one of the out-of-the-box services Cloudflare provides once you enable our proxy is DDoS attack mitigation.
So we were able to provide DDoS mitigation at layer 7, and all the customer really had to do was switch over to our nameservers and enable the proxy on the records where he was experiencing the attack. He was then able to use our DDoS mitigation out of the box, and we enabled the web application firewall as well, really just at the click of a button. That also gives you access to our IP reputation database, so you can take out malicious threat actors that are already known out there.
So that really helped solve some of the immediate problems,
and we saw an increase in free connections on his origin servers as a result.
Awesome. And was this just one website, or several websites?
What did this look like? Yeah, so the attack itself was occurring across a number of his domains.
The company managed multiple domains, and some subset of them were being... Oh, I think you're breaking up again, Kabir. Sorry about that.
No worries. So some subset of those websites were being attacked.
Gotcha. And again, I keep bringing it back to this, but it's crazy that it took so long.
And so did that...
Yeah, go ahead. I was going to say, so they were able to get the proxy up and start proxying traffic through us for the specific DNS records that were impacted.
Did that solve all the issues for the customer? Was that it?
Were we able to end it there, or did it take the full 36 hours? Yeah, so in an ideal world, that's all we need to do: just turn on the proxy. But we were in the real world here, and sometimes these attacks happen because threat actors out there are just port scanning the Internet or looking at DNS names.
And they attack those, looking for places where they can do things like credential stuffing. I'm sure you're all aware there are many places on the dark web where you can find leaked username and password combinations, take those lists, and test them again
across the web to see if they're reused. And so sometimes when we see an attack,
the customer isn't really the target of a specific, targeted attack.
They're just being hit because they don't have the mitigations in place that they could have. In this case, a few hours later, we saw that the attack started again.
This time, not on our edge. What had happened in the second step was that the client previously had another DNS service, and that DNS service did not mask his origin IP address.
There was no proxy. Those DNS records were cached, and the attacker was able to look at those previously cached DNS records and attack the IP addresses directly.
Yeah. And what that does is bypass Cloudflare's proxy. So in that case, what needs to be done?
How does a customer, or how do we, resolve that? Yeah, so what we want to do is really lock down traffic to force it to go through our proxy.
There are a few ways we can do that. We could tell the customer to lock down their origin server so that only Cloudflare IPs can make requests to it.
We could install daemons or agents on their machines as well.
The other thing we often do, especially when you onboard during an active attack, is have you go to your ISP or service provider and say, "Hey, can I rotate those IP addresses out and host my service on a different set of IPs?" On our side, Cloudflare can just change the IP addresses we're pointing to, and the old IPs can get black-holed by your ISP or whatever provider you're using, so that the attack doesn't affect the rest of the network over there.
And then any attack traffic that was previously coming into those IPs is no longer going to a valid server.
So you're now able to force all outside traffic to go through Cloudflare, apply the mitigations we have in place, and send that traffic back to origin cleanly.
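Locking the origin down so that only Cloudflare IPs can reach it usually lives in a firewall or security group, but the underlying check is plain CIDR membership. The Python sketch below assumes a hard-coded, illustrative subset of Cloudflare's published IPv4 ranges (the real, current list is published by Cloudflare and should be fetched rather than copied from here); `is_cloudflare_ip` is a hypothetical helper name, not part of any Cloudflare tool.

```python
import ipaddress

# Illustrative SUBSET of Cloudflare's published IPv4 ranges.
# Assumption: in production, load the full, current list from Cloudflare
# instead of hard-coding it, and enforce the policy at the firewall.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20",
        "104.16.0.0/13",
        "172.64.0.0/13",
        "162.158.0.0/15",
    )
]


def is_cloudflare_ip(addr: str) -> bool:
    """Return True if addr falls inside one of the allowlisted proxy ranges.

    IPv6 addresses simply never match the IPv4 networks above, so they
    are rejected unless IPv6 ranges are added to the list.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CLOUDFLARE_RANGES)
```

A connection from an address outside these ranges, such as an attacker who found the old origin IP in cached DNS records, would be refused, which is why rotating the origin IPs and allowlisting the proxy work well together.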
So that was part of the process, and part of why this attack kept coming back in different forms.
And that was really step two.
It wasn't necessarily the end of the attack. There are definitely more attack patterns we can see. So with this one specifically that you were working on,
it sounds like it was changing, evolving, and attempting to adapt as it got onto Cloudflare. Was that the case?
Am I seeing that clearly, or did they just happen to be lucky?
Yeah, no, I think what was happening is, and obviously I can only speculate as to what the threat actor was looking to do and what their motivations were, but based on the attack traffic we saw, it looked to me like someone was specifically targeting these domains and these sites.
And they were using more and more complex methods to attack the sites. They started with really basic DoS attacks.
They moved on to direct-to-IP attacks. They started issuing credential stuffing attacks and... You're breaking up there for a moment.
Started morphing... Sorry.
Yeah. Can you guys hear me again? Yeah. Okay, great. So I think you may have lost me somewhere in the attack, so just to wrap that up:
The last attack pattern we saw was a really bot-like credential stuffing attack. That's a method of trying lists of username and password combinations, in some form of attack pattern, to identify which combinations are valid.
And then, if a combination that was used on another site also works on this site, it's very likely you can go to a higher target site,
a higher-value target like a bank or a social media account, log in with the same credentials, and some subset of those will work.
So you're whittling the list down, and obviously those higher-value targets have stronger mitigations in place.
So as a threat actor, you generally want to do that kind of recon on smaller sites that maybe don't have those mitigations and maybe only have one person on the IT side instead of a full security team.
So essentially, you got this customer proxying traffic, they rotated their ISP IPs, and then they were experiencing these credential stuffing attacks.
Did we roll out any additional products to help mitigate these? Did we roll them out during that 36-hour under attack scenario, or did we come back later to address them? Yeah, so this third part of the attack,
we considered part of the under attack scenario, because it was very clearly coming from a similar threat actor.
What we did was add on...
So maybe, just to dig into what our previous mitigations were: they were against things like distributed denial-of-service attacks, which are very clearly volumetric attack patterns
that can knock a site over, and against specific threats that are known on the Internet,
looking for vulnerabilities you might find in a common vulnerabilities database
and preventing those types of vulnerabilities from allowing a threat actor to get access to your infrastructure or take it down.
But a credential stuffing attack often looks like a normal user coming in.
And so you need much more advanced mitigations in place.
Sometimes it takes the form of brute force, where there's lots of traffic from single IPs or specific sets of IPs, and often that's a threat model where you can just play
a cat-and-mouse game: you plug the hole as it comes in.
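For the brute-force case, where lots of login attempts come from single IPs, "plugging the hole" can be as simple as a per-IP rate limit on the login endpoint. The sketch below is a generic sliding-window limiter, a swapped-in illustration rather than how any Cloudflare product implements rate limiting; `SlidingWindowLimiter` and its limit and window values are hypothetical, and the caller passes `now` explicitly so the behavior is deterministic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per client IP in any `window`-second span.

    Hypothetical helper for illustration only.
    """

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        # ip -> timestamps of attempts that were allowed through
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: block, and do not record
        hits.append(now)
        return True
```

For example, with `limit=3, window=60`, a fourth attempt from the same IP within a minute is refused while other IPs are unaffected. That is also its weakness for this customer: once attempts are spread across many IPs, each source stays under its own limit.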
In this case, it was distributed. It looked like a fairly advanced script was being used to mask the traffic and make it look human. Over the past few years, Cloudflare has been developing bot mitigation solutions.
What those are designed to do is learn from the collective intelligence of the traffic already flowing through our network and identify what is automated traffic and what is not.
So in this case, we applied that bot mitigation solution for the customer, and that traffic
was able to be mitigated. We saw a much lower rate of folks logging in,
and it definitely relieved a lot of the connection pool limits he was hitting.
So traffic went back to normal patterns afterwards.
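Cloudflare's bot management, as described here, learns from traffic across the whole network, and nothing below reproduces it. As a much simpler, swapped-in illustration of why distributed credential stuffing needs a different signal than per-IP limits, this sketch watches the login failure ratio across all clients in a time window. `LoginFailureMonitor` and its thresholds are hypothetical, chosen only for the example.

```python
from collections import deque


class LoginFailureMonitor:
    """Track login outcomes across ALL clients and flag a suspicious failure ratio.

    Hypothetical helper for illustration; not a bot-detection product.
    """

    def __init__(self, window: float, min_attempts: int, max_failure_ratio: float):
        self.window = window
        self.min_attempts = min_attempts
        self.max_failure_ratio = max_failure_ratio
        self._events = deque()  # (timestamp, succeeded) in arrival order
        self._failures = 0

    def record(self, now: float, succeeded: bool) -> None:
        self._events.append((now, succeeded))
        if not succeeded:
            self._failures += 1
        self._expire(now)

    def _expire(self, now: float) -> None:
        # Remove events older than the window, keeping the failure count in sync.
        while self._events and now - self._events[0][0] >= self.window:
            _, ok = self._events.popleft()
            if not ok:
                self._failures -= 1

    def under_attack(self, now: float) -> bool:
        self._expire(now)
        total = len(self._events)
        if total < self.min_attempts:
            return False  # too little data to judge
        return self._failures / total > self.max_failure_ratio
```

Because the ratio is computed endpoint-wide, a burst of failures spread thinly over thousands of IPs still shows up, even though no single IP would trip a per-IP limit.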
Awesome. So eventually, we were able to not just put boards over these holes in the ship as they came in, but go from reactive to proactive for the future.
Yeah, exactly. That gets to a really good point: in many attack scenarios, as I mentioned early on, you want to empathize with the person on the other side, get them to go to sleep, and have them come back the next morning with a fresh mind.
And then figure out the plan to prevent this from happening in the future.
This was only one type of attack pattern.
In this case, that planning phase of what the future looks like had to be accelerated, because the threat actor kept morphing the attack and coming back again and again. So we certainly accelerated it and implemented our bot management solution, which will help in the long term and has been helping them for almost a year now. This is a pattern we see often: folks come in, we mitigate the initial attack, and then we apply solutions and strategies to prevent future attacks.
Awesome. And at the end of this, it sounds like we were able to solve the problem, and they've been on Cloudflare for, like you said, almost a year now.
So after this, did you see the customer coming back often for tuning or anything like that, and was that handled by you or someone else on the team?
How did that work out.
Yeah, good question. So going back to the model we described early on, for the period of the attack itself,
we do have a volunteer model for a lot of these customers coming in, because attacks can happen in off hours, when the region that generally serves the customer
is offline, say in the middle of the night. In the case of the particular attack I'm talking about,
it came in during the afternoon in the Pacific region, which means it was the middle of the night over in EMEA, where the customer was. So after we mitigated the attack and it subsided,
this went over to a solutions team in the EMEA region so that we could better serve them and, down the line, hold meetings in the right time zone for both parties.
And in passing it from region to region, did you do syncs at all, catching up on notes during the under attack process and eventually settling on that single account team? Was it simply pinging another colleague with notes, or was it hopping on a Zoom?
Yeah, that's a good question. I think for most of the handoffs we did,
and we did a lot for this particular case,
while others may only have one handoff, I prefer to get on the call with the customer and make sure all the context is there.
If the customer's asleep and we're still helping behind the scenes, maybe I'll just sync up with the solutions engineer in the other region at a time that works for us.
But in general, it's really useful to have the customer on the line.
That way any gaps, or anything I might misinterpret, get caught, and we're not playing telephone
and transforming that information into something else.
We want to make sure everything's clear and everyone's in sync on what's going on.
Awesome. Awesome.
And something you mentioned: I want to put in another plug for someone else on Cloudflare TV.
We kept talking about bot management, and I'm sure that piqued a few people's interest.
If you want to learn more about that,
I believe there are going to be three sessions.
The only one I have off the top of my head is Calvin Shirley, who's the bot management SME, the subject matter expert.
He's going to be giving a session at 6 PM on Wednesday, where you can learn more about what that bot product looks like.
I know it's been mentioned a lot. It's great at helping us identify and mitigate these intelligent attacks, and it definitely came in handy
when Kabir was helping out with this specific instance.
Well, awesome. Great. I'm not sure if you already answered this question.
I was a bit curious, and I apologize if you said this before, but in terms of the team members on the customer side,
was it just the security team that was involved?
Who was there? Yeah, so when we have under attacks come in,
the companies that come in needing help are of all shapes and sizes.
And so you might get a consultant who has worked with a small organization before and is helping out. On the flip side, you might get a whole security team that has built out solutions, some of the brightest minds in security,
on the phone with you, trying to figure out how to mitigate this attack and what solutions are in place.
And so the conversations are very different between each of those.
In this case, it was just a single person,
one of the only IT folks on that team.
And as a result, he was definitely a little bit tired. But we were able to solve that.
So in any case, I think we're coming towards the top of the hour here.
Yes. Oh, yeah. So thank you. I'm going to go ahead and announce the next one.
Kabir, thank you so much for telling your story and describing the process, the nitty-gritty of passing around the under attack that went around the world.
Thanks to everybody for tuning in. Your hosts here were myself, Weston Eakman, and Ankur. Next, please stay and listen to Candice as she speaks on the Spotlight on Latino Excellence.
She's great. I always love listening to her speak; she's very insightful and very intelligent.
All right, everybody. Have a great day. Awesome.