SECURITY SPOTLIGHT - Institutionalizing Incident Response
Presented by: Joe Sullivan, Arun Singh
Originally aired on August 17, 2020 @ 12:30 PM - 1:00 PM EDT
The three keys to incident response are risk reduction, crisis preparation, and communication. Watch Joe Sullivan, CSO of Cloudflare, share his thoughts on how security leaders can implement incident response in their organizations.
Transcript (Beta)
Hello, everyone. Thank you for joining us today. I'm Arun Singh, Security Product Marketing Lead here at Cloudflare.
And our topic for today is how to institutionalize incident response for your organization.
And I'm very glad to be joined today by the Chief Security Officer of Cloudflare, Joe Sullivan.
Joe, thank you for joining us.
Thank you for inviting me here. Joe, so before we get started with our discussion today and go into the depth of how to institutionalize incident response, we need to acknowledge that we live in a world where security incidents are a reality of everyday life.
And when we take a step back and look at the broader picture, in broad strokes, what are the important aspects that security leaders should consider when they think about security incident response plans?
Sure. I think that as a security leader, there are three components to incident response that I need to remember every day.
Number one, it's my job to get my company ready to respond to security incidents and to actually help the company respond to them.
Number two, I always try and remember as part of my planning with my team, that we don't just wait for an incident to happen, we prepare to respond to incidents.
And number three, I prioritize communication.
I believe that most teams and companies are judged on how they respond to an incident by how well they communicate it.
Definitely, communication is so key, especially during a crisis, now more than ever.
But let's start the discussion by talking about the first two things that you mentioned, which are identifying risks and mitigating them.
And the second part, more importantly, preparedness.
So how do security leaders go about thinking through how to identify risks and how to prepare for them?
Sure. The danger in our job is that we get so busy quantifying risk and buying products and implementing solutions to reduce risk that we forget to prepare for crisis, that we forget to build the institutional strength around incident response.
And I think it's easy to do that because, well, there aren't a lot of products and solutions that you can buy to help you respond well to an incident.
If you want to quantify risk, there are lots of tools out there.
If you want to reduce risk, there are lots of security products out there.
But if you want to build an institution that is strong in responding to incidents, it's just going to take hard work on your team and on your part.
From the perspective of being prepared: we have identified risks, and we have tools to mitigate them, but now there is the other aspect, which is preparing for the crisis.
How do we start that onset of that thought process?
Yeah, for me, it's funny. I remember the first time I was managing a team in a security organization, and we were responsible for responding to incidents and doing investigations.
And it felt like we were always in reactive mode.
We were never doing planning. It was just like every day another incident showed up, and we weren't doing a great job of triaging or prioritizing, and so we just kind of treated everyone the same.
And we put a lot of effort into each one, and we kind of fell further and further behind.
And so I was talking to one of the executives at my company, and I relayed this feeling of always being on our back foot and responding over and over again reactively.
And he looked at me and he said, it sounds like you need to build a fire station.
And I thought about it, and I actually had had an experience of going to a fire station not too long before that as a chaperone for a school trip.
And if any of you have ever been to a fire station, you know that even if one of the engines is out responding to a fire, there'll be other engines in the station.
There'll be other firefighters who are asleep because they're on a different cycle.
There'll be other firefighters who are cleaning and polishing the equipment and getting it ready for the next fire.
The fire station is built for resilience. It's built to handle different types of fires, different volumes of fires, any time of day, and any type of situation.
And that's what we need to have as our mentality when we build out a crisis response, incident response team inside a security organization.
We need to make sure that we have the people, the tools, and the processes ready for the different types of incidents that might come at us.
Little did you know that a small fire station trip could become so instructive.
So you mentioned that when you look at the fire department, they're always planning for things that are known.
For example, there's a season where there's a high risk of fire and they plan for that.
And then there are unknowns. So if you talk about the knowns and we draw that parallel to the security industry, what are some of the known aspects that security teams should worry about?
It's easy to kind of make a list of the things that could go wrong.
It's hard to capture all of them and to articulate which one's most likely to happen.
On the list of things that are likely to happen: a vulnerability in your code, an employee's account getting compromised through a phishing attack, or even multi-factor authentication being compromised, somebody getting through your firewalls, someone launching a DDoS attack against you.
You can think of a lot of different ways that you could get attacked.
What you can't always anticipate is what will happen as part of the attack.
It might not just knock you offline, but it might also take away your ability to communicate internally inside the company.
It might not just lead to an employee getting their account compromised.
The attacker might move laterally into a different part of the company and you might not detect it for weeks or months.
Then you have a lot of different things to respond to all at once.
It's good to focus on kind of trying to predict the bad things that could happen, but it's also important to prepare for eventualities that you might not have expected.
Talking about the eventualities that you might not have expected, taking into account the present times where the world is facing the brunt of a pandemic, coronavirus, COVID-19, do you really think that security teams and organizations around the world had anticipated and prepared themselves for such a crisis of a pandemic?
I think every mature organization has some form of a business continuity and crisis response plan in place and has done some kind of exercise to anticipate a major crisis like this.
Probably not many anticipated this specific crisis, and there are certainly some curveballs that have come at us that, even if we had anticipated them, we couldn't actually have stress tested against.
So there are definitely some unique things here. I'll give you two examples. I think one is just the sheer volume of our workforce moving out of the office and into working remotely from homes.
As a security leader, I've had to deal with situations like that in the past, but never at this scale for this type of duration.
So an electrical outage or an earthquake could disrupt an office and a team for a shorter period of time, but we're looking at moving workforces remote for weeks and months potentially.
And so that puts a completely different level of stress on the organization.
And I doubt that any of us anticipated the demands of family members needing access to technology, the lack of monitors and comfortable desks and chairs and all the things you need to get for your employees to get them to work productively and safely from different environments.
And as a result, we've seen some interesting challenges in terms of employees taking matters into their own hands: getting old computers out of the closet or basement without really thinking about the fact that the software is very outdated and risky, or downloading and trying new types of video conferencing tools that maybe haven't been reviewed by the company.
So lots of challenges are coming at teams, no matter how well they prepared for this particular crisis.
And I mentioned that there's really a second aspect about this incident or crisis situation that's fairly unique, and that is the way it has unfolded at different times around the world.
We work at a company that has offices globally, and so we've had the opportunity to see this crisis unfolding at different times in different regions.
And there's obviously good and bad in that.
The good is that we've learned a lot of lessons from our team in Beijing and the team in Singapore in terms of what worked and didn't work as we were helping them respond, and we were able to apply those lessons.
To a certain extent, that's also been good for morale as we see that this crisis, like any others, does have a beginning and a middle and an end.
And so even though this one is extending out over a very different kind of period of time than most other crisis situations, you can still see a light at the end of the tunnel, if you will.
So the plans are there, security organizations work through it, a curveball hits, and there's pivots that need to be made.
As you mentioned, previous crises happened at a point in time, while this particular one is more elongated.
So going back to the aspect that the core piece is to make sure that you have a solid plan in place, and you're practicing that, and you're making sure all the stakeholders are involved in it.
If I take that, and the fact that we have to deal as security organizations with knowns and unknowns and pivot quickly, the situation stays fluid.
What are some of the salient aspects of a response plan that security teams and leaders should think about?
There are a few things that you always need to have in place.
And what has been, I guess, a positive thing for me is seeing how that list of things that I would say are important for responding to a security incident has been valuable in responding to the COVID-19 situation, which is a very different type of crisis.
And so, for example, good case management tools and the ability to capture decisions in real time are important in every crisis situation.
Having a cross-functional group of people who are designated to come together and work on the response plan together.
Every incident response plan in the security context typically contemplates the different parts of the security team working with the legal team and the communications team and other teams at the company.
And the same thing is happening in the context of business continuity and COVID-19 response.
There are so many different competing interests and priorities that you need to have a strong cross-functional group using good tools for collaboration and communication.
I mentioned in a prior security incident, we found the need for out-of-band communication.
And that's certainly been the case here as a lot of the communication platforms we use saw incredible stress out of the gates as we started.
In any security incident response, you need to make sure that you have good forensic and investigative skills.
You need to be able to collect data and understand what's actually happening on the ground in any crisis response.
And in the security world, that's typically, you know, that you have good logging and good tools for doing the analysis.
A lot of the time in crisis response, you need to have third parties standing by and available to augment your staff.
One pro tip is don't wait until the crisis to get those organizations under contract.
You'll pay twice as much and get half the quality.
And so if you have good cyber insurance policies, sometimes they'll even specify who you would use and give you a pool of money for that.
So make sure you understand your cyber insurance policies.
Make sure you have the tools and the logging so that when you bring in the third parties to help you or your team to do it yourself, that there's actually something that you can look at.
And then there are lots of little things that matter in responding to a crisis.
You need dedicated conference rooms, dedicated communication platforms.
And then as a leader, I frequently have to remind people to clear their schedules and refocus.
When a crisis happens, it's not business as usual.
And some things have to be deprioritized as the response takes precedence.
And then last of all, in a lot of crisis situations, relationships matter.
No company and no team works in a vacuum.
And so being able to reach out and have connections in law enforcement, peer companies going through similar situations, other companies or organizations that might have information that would be helpful for you in your response, you need to have a relationship before the incident happens.
And so that's another thing you need to invest in.
Great.
So those are some of the key aspects. So what I'm hearing up to now from you, Joe, is identify, prepare, prepare, prepare.
And then you touched upon a very, very key part, which was at the start of the conversation today, which was around communication.
And so let's expand on that a little bit more. Share with us your thoughts around communication.
Why is it important? What should we be focused on?
Yeah, I just think communication is how we're judged at the end of the day.
It's so funny. When I think about all the different security incidents and company security incidents that I've heard about from the outside, maybe because I'm in the security field, I wonder things like, I wonder how many people were on their security team, or I wonder what security solutions they had in place to try and prevent this risk.
But when you see the company and the team judged in the media, those are rarely the topics discussed.
Companies are more frequently judged based on how they responded from a communication standpoint.
What really matters is showing empathy for the customer and building trust with the audience.
And so we're all judged every day based on how we communicate. And I think for us as security team members or security leaders, there are a few things we can do to make sure that our company and we are doing a good job communicating when the time comes.
Here's a couple of them. Number one is get good at delivering bad news.
We all as humans have a tendency to sugarcoat things. And so when we're first responding to a crisis, some of us will kind of soften the blow of what's happening because we don't want to upset other people.
We don't want them to get angry at us.
Whatever is going on under the covers, there's just a tendency in some people to under-deliver bad news or to not deliver it very clearly.
And then other people have the opposite.
They have a tendency to freak out and declare the sky is falling every time the smallest thing happens, and then people stop listening to them.
So you could really undersell or oversell. And it's really most important for us to be really crisp, technical, detailed and clear and not kind of go to either extreme.
Second part of getting ready for communication in a crisis situation is building a trust relationship with the other people you're going to be in that crisis with.
At times of crisis, we feel more stress. We communicate more tersely.
We question other people more. And if you have a pre-existing relationship, then you're going to be able to work through those things much more easily.
And that means that as security leaders, it's important for us to have bridges with our communications and legal teams in place before the incident happens.
In addition, when we're doing communication, it's also important to think about all of the different audiences for communication.
How I communicate with my team is going to be different from how I talk with that cross-functional group of leaders, which is going to be different from how we talk to outside regulators and law enforcement, or the media, the board, and the broader employee base of the company.
There are so many different audiences, and don't forget your customers, who are all interested in what's going on and how it impacts them.
What can they do to help? How can they protect themselves? And so the message is different for each audience.
As the security team, you don't really get to decide the message beyond the inside the company part, but you can certainly make sure that you have the right information to help the rest of the company decide how to communicate about the particular situation.
So, so many elements of identifying, preparing, communicating.
Are there learnings from a past crisis or past incident that come to your mind that you would like to share today?
One of the best examples of a crisis response that I've seen recently was how we at Cloudflare responded to a major outage last summer.
I happened to be in our London office last June on a day when one of our web application firewall team members executed a rule in test mode that inadvertently ended up taking down all of our services for our customers and had a really negative impact on the Internet.
What was interesting to me was not just that we had the outage, but how we responded and how it impacted trust with our customers.
Obviously, you never want to have a crisis like that, but if you do, you want to make sure you're prepared for it and you can respond well in all the different ways we've been talking about today.
And in this particular crisis, I saw our practice and our tools and our processes and our people come together in a way that was very effective in responding.
First of all, there was something really unique about this crisis.
It took place during the day in London when our entire team at headquarters was home asleep.
And so we had to take down our service and bring it back up across the globe with a small percentage of our overall organization participating and nobody from headquarters.
And second of all, it was very visible to the world as it happened because when our service is down, our customers' websites are not accessible.
So as a result, we had a lot of pressure at a very unique time and we had to respond.
And we did all the things that we've been talking about today.
We got the right people in a conference room together.
We brought together a cross-functional group. We had people with different responsibilities, kind of compartmentalized: note-taking, decision-making, technical analysis, customer communication, et cetera.
And we had other challenges that came up during that period of time, like one of our internal communication platforms went down as part of the outage.
And so we had to use backup communication. And we were getting inbound questions constantly and had to figure out what to say while we were still trying to figure it all out.
The good news is we were able to come out of that crisis fairly quickly by making good decisions under pressure.
And we were able to almost instantly start connecting with our customers and explaining what had happened.
So we went from being in one room together, kind of working it all out, to splitting up into a bunch of different conference rooms, reaching out to customers, making ourselves available for phone calls with anyone who needed technical assistance or an explanation.
We even committed to putting together a postmortem and publishing it, which we did about a week later.
That postmortem was incredibly detailed, down to the specific item that caused the outage, what we did in real time as we were responding, and what we were committing to do to ensure that it never happened again.
And the fascinating thing to me and the lesson for all of us is that because we chose to be transparent and went into so much detail, we brought great goodwill with our customers and the people who are watching us outside the company.
Our approach probably bought us more trust after the incident than we had before the incident, and more than we might ever have obtained if we hadn't had an incident at all.
Very fascinating. And as you often say, I've heard you say this a lot, never let a crisis go to waste.
It's always a learning opportunity. Learn from that so you can be better in the future.
And I think that is one of the key takeaways that I've had talking to you.
Yeah, for sure. You should always, after an incident, not too long after, put together a detailed postmortem.
Whether you choose to publish it on the Internet for the world to see or not, the exercise is critical.
It has to be done in a nonjudgmental way that gets to the bottom of things.
I've seen this done well at a couple of the companies I've worked at where you want to let maybe a week go by for the emotion to subside, but not so much time go by that you start to forget the important things.
You always want to be documenting decisions in real time so that you can go back and look at them later and analyze.
Hindsight's always 20-20, and so you can then use that hindsight to learn from that crisis and get yourself more ready for the next one.
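One concrete way to act on that advice is to keep an append-only decision log during an incident. The sketch below is a minimal illustration of the idea, not a tool described in this conversation; the entry shape, file path, and example decision are all hypothetical.

```typescript
// Minimal append-only incident decision log (illustrative sketch only).
// Entries are timestamped at write time and never edited in place, so the
// record can be replayed later for the postmortem.
import { appendFileSync } from "node:fs";

interface DecisionEntry {
  timestamp: string;   // ISO-8601, captured automatically
  author: string;      // who made or recorded the decision
  decision: string;    // what was decided
  rationale: string;   // why, as understood at the time
}

// Hypothetical location; in practice this might live in a shared
// case-management system rather than a local file.
const LOG_PATH = "incident-decisions.ndjson";

function recordDecision(author: string, decision: string, rationale: string): DecisionEntry {
  const entry: DecisionEntry = {
    timestamp: new Date().toISOString(),
    author,
    decision,
    rationale,
  };
  // One JSON object per line (NDJSON) keeps the log append-only and easy to replay.
  appendFileSync(LOG_PATH, JSON.stringify(entry) + "\n");
  return entry;
}

// Example usage during a response (contents are made up):
recordDecision(
  "incident-commander",
  "Roll back the latest rule deployment",
  "Deployment correlated with the global impact; rollback is the fastest mitigation."
);
```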
Absolutely.
So thank you, Joe, for your insights today and for sharing your experiences, which I believe will be very valuable for security leaders around the world as they prepare themselves and their teams to work on crises that are there today as well as in the future.
And thank you to everyone for sharing your time with us on this video.
We hope to see you soon as we bring to you another exciting CISO conversation.
Until then, thank you for watching. Thank you. Thank you.
Hi, we're Cloudflare.
We're building one of the world's largest global cloud networks to help make the Internet faster, more secure, and more reliable.
Meet our customer, BookMyShow.
They've become India's largest ticketing platform thanks to its commitment to the customer experience and technological innovation.
We are primarily a ticketing company.
The numbers are really big. We have more than 60 million customers who are registered with us.
We serve 5 billion screen views every month and 200 million tickets over the year.
We think about what is the best for the customer.
If we do not handle customers' experience well, then they are not going to come back again.
And BookMyShow is all about providing that experience.
As BookMyShow grew, so did the security threats it faced. That's when it turned to Cloudflare.
From a security point of view, we use more or less all the products and features that Cloudflare has.
Cloudflare today plays the first level of defense for us.
One of the most interesting and aha moments was when we actually got a DDoS attack and we were seeing traffic burst up to 50 gigabits per second.
Usually, we would go into panic mode and get downtime, but then all we got was an alert and then we just checked it out and then we didn't have to do anything.
We just sat there and watched the traffic peak and then get brought under control.
It just took less than a minute for Cloudflare to kind of start blocking that traffic.
Without Cloudflare, we wouldn't have been able to easily manage this, because even at our data center level, that kind of pipe, you know, is not easily available.
We started with Cloudflare for security, and I think that was the aha moment.
We actually get more sleep now because a lot of the operational overhead is reduced.
With the attacks safely mitigated, BookMyShow found more ways to harness Cloudflare for better security, performance, and operational efficiency.
Once we came on board on the platform, we started seeing the advantage of the other functionalities and features.
It was really, really easy to implement HTTP/2 when we decided to move towards that.
With Cloudflare Workers, which is, you know, computing at the edge, we can move the business logic that we have written custom for our applications to the Cloudflare edge.
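For readers unfamiliar with Workers, here is a minimal sketch of what running logic at the edge can look like. It is a generic illustration, not BookMyShow's code; the health-check path and header name are made up.

```typescript
// Minimal Cloudflare Worker (module syntax): logic that runs at the edge
// before a request reaches the origin. Types come from the Workers runtime
// (e.g. @cloudflare/workers-types). The specific routing rules are illustrative.
export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    // Example edge logic: answer a health check without touching the origin.
    if (url.pathname === "/healthz") {
      return new Response("ok", { status: 200 });
    }

    // Example edge logic: tag the request, then pass it through to the origin.
    const upstream = new Request(request);
    upstream.headers.set("x-edge-processed", "true");
    return fetch(upstream);
  },
};
```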
One of the most interesting things we liked about Cloudflare was that everything can be done via the API, which means almost zero manual work.
That helps my team a lot because they don't really have to worry about what they're running; they can run the test, and then they know they're not going to break anything.
Our teams have been, you know, able to manage Cloudflare on their own for more or less anything and everything.
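As an illustration of that API-driven workflow, here is a minimal, read-only sketch that lists the zones visible to an API token via the Cloudflare v4 API. The token is a placeholder environment variable, and the call is deliberately non-destructive.

```typescript
// Read-only call to the Cloudflare v4 API: list zones visible to a token.
// A safe "check before you change anything" style request.
// CLOUDFLARE_API_TOKEN is a placeholder; requires Node 18+ for global fetch.
const API_BASE = "https://api.cloudflare.com/client/v4";

async function listZones(): Promise<void> {
  const response = await fetch(`${API_BASE}/zones`, {
    headers: {
      Authorization: `Bearer ${process.env.CLOUDFLARE_API_TOKEN}`,
      "Content-Type": "application/json",
    },
  });

  const body = (await response.json()) as {
    success: boolean;
    result: Array<{ id: string; name: string; status: string }>;
  };

  if (!body.success) {
    throw new Error(`Cloudflare API call failed with HTTP ${response.status}`);
  }

  for (const zone of body.result) {
    console.log(`${zone.name} (${zone.id}): ${zone.status}`);
  }
}

listZones().catch(console.error);
```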
Cloudflare also empowers BookMyShow to manage its traffic across a complex, highly performant global infrastructure.
We are running not only a hybrid but a hybrid and multi-cloud strategy.
Cloudflare is the entry point for our customers.
Whether it is a cloud in the back end or it is our own data center in the back end, Cloudflare is always the first point of contact.
We do load balancing as well as we have multiple data centers running.
Data center selection happens on Cloudflare.
It also gives us fine-grained control over how much traffic we can push to which data center, depending upon what, you know, is happening in that data center and what its capacity is.
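To make the idea of weighted traffic steering concrete, here is a small conceptual sketch of splitting traffic between data centers by weight. It illustrates the concept only; it is not the Cloudflare Load Balancing API, and the pool names and weights are made up.

```typescript
// Conceptual sketch of weighted traffic steering between data centers.
// NOT the Cloudflare Load Balancing API; pool names and weights are illustrative.
interface Pool {
  name: string;
  weight: number; // relative share of traffic this pool should receive
}

const pools: Pool[] = [
  { name: "primary-dc", weight: 70 },
  { name: "cloud-region-1", weight: 30 },
];

// Pick a pool with probability proportional to its weight.
function pickPool(candidates: Pool[]): Pool {
  const total = candidates.reduce((sum, p) => sum + p.weight, 0);
  let roll = Math.random() * total;
  for (const pool of candidates) {
    roll -= pool.weight;
    if (roll <= 0) {
      return pool;
    }
  }
  return candidates[candidates.length - 1]; // fallback for floating-point edge cases
}

// Example: route 10 requests and see where they land.
for (let i = 0; i < 10; i++) {
  console.log(`request ${i} -> ${pickPool(pools).name}`);
}
```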
We believe that, you know, our applications and our data centers should be closest to the customers.
Cloudflare just provides us the right tools to do that. With Cloudflare, BookMyShow has been able to improve its security, performance, reliability, and operational efficiency.
With customers like BookMyShow and over 20 million other domains that trust Cloudflare with their security and performance, we're making the Internet fast, secure, and reliable for everyone.
Cloudflare, helping build a better Internet.