🔒 Security Week Product Discussion: Automated Abuse Mitigation
Presented by: Sergi Isasi, Ben Solomon, Thomas Vissers
Originally aired on October 6, 2021 @ 5:30 PM - 6:30 PM EDT
Join Cloudflare's Product Management team to learn more about the products announced today during Security Week.
Read the blog posts:
- Introducing Super Bot Fight Mode
- Mitigating Bot Attacks against Cloudflare
- Announcing API Abuse Detection
Tune in daily for more Security Week at Cloudflare!
Transcript (Beta)
And we're on. Hi, everybody. My name is Sergi Isasi. Welcome to Friday of Security Week 2021.
And this is a session where we talk to the engineers and product managers who have built the things that we have announced this week.
They'll tell us what they built, why, how, what attacks it's meant to mitigate.
And let's get started. So first, let's just talk to Ben and Thomas from our bots team.
Let's do a quick intro, guys.
Ben, you first, who you are, what you do for Cloudflare and anything else you want to share about yourself.
Cool. Hi, everyone. My name is Ben Solomon.
I am the product manager for our bots team. I am located in San Francisco, although we're all currently remote right now.
And fun fact about me, I interned actually for Cloudflare a couple of years ago.
I was not an intern on the bots team, but I interned with our SSL team and was working on certificate transparency monitoring.
Great. And Thomas? Hey, everyone. I'm Thomas Vissers. I'm an engineer in the bots team as well.
I'm located in Europe. In a past life, I was an academic researcher, a security researcher, and I was actually always on the lookout for ways to bypass security systems.
So that's my interesting perspective that I bring to Cloudflare today.
Great. And let's go a little bit deeper there.
So Thomas, your title is a systems engineer. We have quite a few of those at Cloudflare.
What's specifically your role as a systems engineer at Cloudflare?
Right. I think I wear both the engineering hat as well as the data scientist hat in my role.
So essentially, what I do for bot management specifically, I try to focus on designing and implementing detection models and detection systems for bots at scale.
It's a very diverse role, but in general, that's what I try to focus on.
And you've spent most of your time, for as long as I've worked with you, on the anomaly detection part of bot management.
Is that right?
Can you talk about that a little bit? Yeah, sure. So anomaly detection is the unsupervised learning component we have for bot management, which means we're not using labeled data, but we're actually learning what normal looks like for our customers.
And we try to detect anything that deviates from normal and then try to flag that as that is typically something they want to look into further or want to actually block from their sites.
Great. All right. Ben, you are a product manager, which is a pretty sought after role these days, which is kind of an interesting thing.
And you are, as you mentioned, a relatively recent grad. Does the expectation of what you thought you'd be doing as a product manager line up with reality on the bots team?
Well, look, to be honest, I actually had really high expectations.
I heard a lot of good things about being a product manager and getting to work with other teams, and all of those expectations were met.
I've been on the bots team for about a year now, and I really like the fact that I get to work with other teams.
Look, Thomas and I are working on a daily basis on engineering projects.
I get to go and talk to designers, then go talk to folks in trust and safety.
There's all these different things that I get to do. And I also, I tend to get antsy if I'm in one place for too long.
And so being able to kind of move around and act as a communicator has made this a really fun job.
So it definitely met my expectations.
All right. Let's jump into it. So a couple of big announcements today, super bot fight mode, some API security stuff, and you guys have worked on the bot management product for a couple of years.
Let's first do a level set and kind of discuss bots in general.
So let's start there, and I'll go to you, Ben.
What's a bot? So a bot in its most simple form is really any system that is making an automated request.
So if you think about requests on the Internet, right?
I may wake up one morning and use my computer to do anything, right? I'll go read an article or I'll check my email or, you know, do whatever.
Anything that I do, I'm clicking a button on my computer and making almost a manual request.
It's something that is human initiated on the Internet.
And for that reason, it looks natural.
It's what a lot of these systems were designed to do. But somewhere along the line, people figured out you can use bots to imitate or perform human actions at scale.
So instead of just clicking on a link online, you can use a bot to click that link a hundred times in a second, right?
Or a thousand times, or do this across many different IP addresses.
So a bot is anything that is carrying out human actions at scale, often doing it not just with volume, but very, very quickly, if that makes sense.
Yeah, it does. So that sounds generally not what web properties want, but are there good bots?
Yeah. So there are both good and bad bots. And I feel like when I present it in this way, as anything happening at scale, the human brain immediately goes to, oh, someone's using this for abuse.
But there are really good uses for this.
And if you think about search engines in particular, like let's think about Google.
When I go to search a term, like if I search the word Cloudflare on Google, if we didn't have bots, Google would have to go out, search the entire Internet looking for that word that I just searched, and then come back to me with an answer, which would take forever if they hadn't already done that, right?
If they hadn't built up their own database of what's on the Internet, and they could just quickly check that database and get back to me.
So Google employs actual bots, good bots on the web, that go out and they do what's called crawling the Internet.
So they'll take a look at everything that's out there.
They'll record all that data. And that way, when I search for a term, they can just take a look at their database.
And so bots help us do that. Because if you sent out a human to go crawl the Internet, it would take them forever.
And look, it's like painting the Golden Gate Bridge.
By the time you go out and you crawl the entire web, you'd have to start all over again.
So it's just not sustainable from a human perspective. And a lot of good bots are out there to help us do those larger tasks very quickly.
Okay.
So then how do you identify, that totally makes sense, but how do you identify intent, a good bot versus a bad one?
So there's a couple of different ways you can do this.
One of them we can get at today, because there's definitely a big product announcement that gets at this.
But for at least some of those good bots, like Google bot, like Slack bot, DuckDuckGo is another search engine.
We keep a list internally of verified or good bots.
And we work with or check public pages for a lot of these good bots so that they can tell us what they're doing on the Internet, and they can identify themselves to us.
So in the case of DuckDuckGo, they publish a list, I believe of IP addresses or other identifying factors for their good bot.
We can check that list and then make sure that we are letting those bots through and that folks who are using Cloudflare aren't being excluded from these services.
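The allowlist check described here can be sketched as a lookup of the client address against published ranges. This is only an illustration: the bot name, the `VERIFIED_BOT_RANGES` table, and the `verified_bot_name` helper are hypothetical, and the ranges are reserved documentation prefixes rather than any real operator's addresses.

```python
import ipaddress
from typing import Optional

# Hypothetical published ranges for a verified crawler. These are reserved
# documentation prefixes, not any real operator's addresses.
VERIFIED_BOT_RANGES = {
    "example-crawler": ["192.0.2.0/24", "2001:db8::/32"],
}


def verified_bot_name(client_ip: str) -> Optional[str]:
    """Return the verified bot whose published range contains client_ip, if any."""
    addr = ipaddress.ip_address(client_ip)
    for name, cidrs in VERIFIED_BOT_RANGES.items():
        for cidr in cidrs:
            network = ipaddress.ip_network(cidr)
            # Only compare addresses of the same IP version.
            if addr.version == network.version and addr in network:
                return name
    return None
```

A request whose address falls in a listed range would be let through as a good bot; anything else goes on to the other detection steps.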
Great. Okay. So let's just get to the crux of it then. So what does a customer care about here?
Why would you care about a good bot versus a bad bot? I personally think there are two reasons here.
One is kind of an emotional reason, which is it's really annoying to think about someone using a bot to show up at your website, even if they're not causing harm, right?
It's just, I hate logging into like a dashboard and seeing, you know what, 50% of my site traffic is bots.
Like that's, it's not great if someone's not using your site for its intended purpose.
And that is really frustrating.
But maybe more importantly is the financial impact here.
People don't always realize bots cause a lot of financial harm. So if you run a website where like, you know, forget about the like inventory hoarding and all these terms we hear about, even if you just run a standard website, like a blog or something like that, and you have 30, 40% of your traffic powered by bots, you're spending a lot of money on origin resources, just serving that extra traffic that really isn't helping you in any way.
And so bots can create a lot of financial harm through multiple different approaches to hurt your site.
Okay. Let's go into the attacks.
You've been working with the bot team for a year now. I want to go back real quick.
You said 50%, you know, that could be annoying. Do you know what percent of the Internet is automated traffic?
So we have estimates on this. If you go to Cloudflare Radar, for example, which is our site where we publish all sorts of stats about the Internet and different things you can find, I think we'll generally tell you it's about 40% of the Internet.
So 40% of traffic on the Internet comes from bots or automated sources.
We've seen that vary though. So from time to time that will go up closer to 50, it'll go down.
It also depends on what we are really classifying as a bot.
Are we talking about just the good bots? Are we talking about just the bad bots or, or everything?
And so the number you see there is, is sort of an inclusive number where we're tallying up everything and counting all automated sources as bots.
But there are tons of different types of traffic and gray areas there.
So I'd tend to say 40%. And what do they do? You've covered the good bots.
Let's talk about the bad ones. What do, what do attackers use bots for?
All sorts of different things. It's becoming exhausting, the number of use cases that these bad actors on the web have come up with.
One example I like to point to, just because we've seen this with a few of our customers, involves credential stuffing attacks, in other words, attacks where folks test out login and password pairs.
So they'll take a bunch of usernames and a bunch of passwords and just stuff them into a login page until they can break into someone's account.
When this happens to banks, in other words, when someone takes over a login account for a bank, the bank has a responsibility to actually tell the customer whose account was broken into.
They have to send an envelope in the mail to that customer and say, Hey, someone used credential stuffing to log into your account.
And we've done our best to secure the account, but they broke in, right?
They got access to your financial information.
And so in this case, there's a lot of harm there because the bank is physically sending out envelopes to their customers, telling them what happened.
And maybe the cost of one envelope isn't much, right? 50 cents to ship the envelope, a dollar to print it. But even if it's two bucks or whatever per envelope, if someone can use a bot to carry this out at scale, if they can actually do this for a hundred thousand, 200,000 stolen accounts, suddenly that bank is paying half a million dollars, right?
It really starts to add up here.
And so the use cases are across the board. There's tons of different ways to make or lose money with this, but that's just one example.
Yeah.
And it doesn't, I mean, that's probably not even just financial, right? So if you're a recipient of one of those letters, you, you might, even though it's probably a password that you've reused and that's why you got broken into, you still, you think this bank or, or this site has, has compromised my account in some way.
So you might lose some, some actual reputation there as well.
Right? Yeah, absolutely.
I mean, that's, that's a big, big part of it is, is just trust on the web.
And this fits into everything we do. We talk about it all the time on the blog is like Cloudflare wants to promote trust on the web.
We're trying to secure different sites and different businesses.
And when, when bots are kind of out there, just wreaking havoc on the world and taking chances with things like this, that does nothing for trust.
It detracts from it. So let's talk specifically about how Cloudflare identifies bots.
And I'll ask both of you, I guess, Ben to kind of start.
How do we identify bots? What do we do? There's a number of different approaches here and we'll try to simplify it.
Cause I think we can, we can really go into the weeds here and get detailed.
The first thing we do is we have something called a heuristics engine.
And the idea is obviously Cloudflare proxies a lot of traffic on the Internet, right?
We've got something like 25 million sites that are using Cloudflare at this point.
And we have been able to collect a database of bad fingerprints on the web at this point.
Anytime a bad bot attacks one of our customers' websites, we can notice that.
And then we can store the fingerprint or some identifier for that bot in one database.
So then we allow our customers to use that database.
And basically if we see a bad fingerprint or a bad bot, they can block that bot.
They can challenge. That's one of the ways that we identify those bad actors.
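As a rough sketch of the fingerprint-database idea, the snippet below hashes a couple of request attributes into an identifier and checks it against a set of identifiers recorded during earlier attacks. What actually goes into a fingerprint is not described in this session, so the `fingerprint` recipe (user agent plus header names) and the `BadFingerprintStore` class are purely hypothetical.

```python
import hashlib
from typing import Iterable, Set


def fingerprint(user_agent: str, header_names: Iterable[str]) -> str:
    """Hash a few request attributes into a stable identifier (hypothetical recipe)."""
    raw = user_agent + "|" + ",".join(name.lower() for name in header_names)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class BadFingerprintStore:
    """In-memory stand-in for a shared database of bad fingerprints."""

    def __init__(self) -> None:
        self._known_bad: Set[str] = set()

    def record_attack(self, fp: str) -> None:
        # Called when a bot with this fingerprint was seen attacking a site.
        self._known_bad.add(fp)

    def is_known_bad(self, fp: str) -> bool:
        # Any site can then check incoming requests against the shared set.
        return fp in self._known_bad
```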
There's a couple other steps here and I'll give you one more and I'll point to Thomas for the third, cause I know he worked on it a lot.
The second one I wanted to talk about is machine learning, which is obviously a, it's a buzzword.
People talk about it all the time, but we really do it here. Again, there's so many sites behind Cloudflare that we're actually in a good spot to just feed a lot of data into a machine learning engine, identify a few key attributes or things we want to sort on in terms of deciding between bot or human for every request.
And then that big machine learning engine spits out an answer. And so that's, that's often a score, which tells us with some degree of certainty, whether or not a request is coming from a bot.
The third thing is called anomaly detection.
And that's, it's kind of Thomas's baby. So I will, I will send that over to him.
All right, Thomas, we're going to put you on the spot and ask you to talk like a product manager here.
What does anomaly detection do? Right. So the interesting thing about anomaly detection, and the way it differs from the other approaches mentioned here, is that it's unsupervised.
So what this means is we don't need to have seen certain examples of bot traffic to be able to flag something, right?
If you think about the machine learning model that Ben talked about, a machine learning model is driven by training data, and that requires labeled data.
That implicitly means you have to have seen something for it to be identified by such models.
And it's super effective at doing that particular task.
But obviously we have this active adversary landscape where attackers are constantly adapting and constantly trying out new ways to circumvent certain detection models and detection systems.
And that's where something like unsupervised learning can play a big role: not being trained on specific examples of bots, but actually learning what normal looks like on a site and then looking for anomalies, looking for outliers, actually looking at how a bot or a user is interacting with a site and how that differs.
And if that differs greatly, we can say that this is suspicious and we can start flagging that.
So it's very complementary to the previous approaches, and it plays an important part in our suite of detection tools.
Right. And then Ben, so your system identifies a bot, what can our enterprise customers with the bot management do with that?
So what we do is we basically distill everything that we have found into what's called a bot score.
And so the bot score is one through 99. If the score is lower, it means we're pretty confident that a request is coming from a bot.
If the score is higher, it means we're pretty confident a request is coming from a human.
So Thomas and I have just kind of outlined a couple of different detection engines here.
Each one of those detection engines plays a role in helping us find that score.
So if our heuristics engine, for example, identifies a bot on the web, we just score the request one.
We just say, with high confidence, we know that this request is coming from a bot.
On the other hand, if machine learning, for example, we throw in a request, it looks at all the data and says, I really think this is a human.
It may spit out a score of 99. It may say with total certainty, I know that this request is coming from a human.
And look, anomaly detection plays into this.
We have a couple other things that are contributing to the score.
But the idea is we want to provide this score to enterprise bot management customers and let them do what they want.
The appropriate action is not always to just block a request outright.
Sometimes there are other things you can try.
We offer a couple of different challenges. I know everyone at home is familiar with solving a CAPTCHA.
That's actually just one type of challenge.
But we want to give you a couple of different options in the dashboard so that you can figure out what's right for your site and also kind of tune the amount of aggression you go after these bots with.
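To illustrate how a site owner might act on the score, here is a minimal sketch that maps a bot score from 1 to 99 onto block, challenge, or allow. The function name and the two cut-off values are assumptions chosen for illustration, not recommended settings; the point is that the thresholds are the knob for tuning how aggressive to be.

```python
def action_for_score(bot_score: int, block_below: int = 2, challenge_below: int = 30) -> str:
    """Pick an action for a request; lower scores mean "more likely a bot".

    block_below and challenge_below are hypothetical, site-specific cut-offs.
    """
    if not 1 <= bot_score <= 99:
        raise ValueError("bot score must be between 1 and 99")
    if bot_score < block_below:
        return "block"       # near-certain automation, e.g. a heuristics hit scored 1
    if bot_score < challenge_below:
        return "challenge"   # likely automated: let a real visitor prove otherwise
    return "allow"           # likely human
```

Raising `challenge_below` challenges more traffic; lowering it lets more borderline requests through unchallenged.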
Great. All right. Let's go into a practical discussion.
So we run Cloudflare.com. Pretty popular site and lots of different use cases on Cloudflare.com.
Ben, what kind of bots do we see there?
We see a lot. A lot of people go after Cloudflare.com. But one really common case we've seen is actually with SEO spam.
So SEO is search engine optimization.
A lot of people have figured out that if you can create a lot of websites that point to each other, in other words, they cross link, you can sort of rise those websites up through the search engine listings.
And you can get yourself a lot of visibility.
And unfortunately, Cloudflare can kind of be abused to make this happen.
So people have signed up a bunch of TLDs, top-level domains, with Cloudflare and then tried to link them all and spammed search engines.
And we saw this happening for a while, right?
We could log in, we could see who was creating accounts, bringing their domains to Cloudflare.
And we wanted to play a role in stopping it, right?
We don't want to be a vector or in any way contribute to this.
And so first, we tried just deploying rate limiting to make sure that no one was signing up a bunch of domains at once for Cloudflare and then trying to carry this out, carry out any form of SEO spam.
And rate limiting helps, right?
Like you're able to look at certain bad actors on the web who are just saying, sign this up, sign this up, sign this up.
And at a certain point, they hit the threshold and we say, no, that's enough.
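The threshold behavior described here can be sketched as a sliding-window rate limiter keyed by account. This is a generic illustration, not the rule Cloudflare deployed; the class name, the key, and the limits in the usage note are all assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within any window_seconds span."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False  # threshold hit: "no, that's enough"
        events.append(now)
        return True
```

For example, `SlidingWindowLimiter(3, 60)` would accept three domain sign-ups per account per minute and reject the fourth until older ones age out of the window.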
But attackers get very, very persistent. And this is the biggest problem we see with bot management is people will do whatever to get past the security measures we have in place.
And so we- Also, why is that?
Why are they so persistent? Part of it is the financial benefit, right? Like we talked about this before, in that if you can carry this out, if you can successfully get around security tools, there is a lot to benefit from here.
And look, the Internet is growing, which means that the possibilities here are growing as well.
So there's always some new way you can benefit from this or some new way you can kind of tune it to meet your own use case.
So that's one of the reasons people are trying to get around this.
I also think it's become a game for people. The idea that you can create a bot and sort of overcome some sort of security mechanism, there's a certain feeling of victory when you successfully do that.
And so that definitely plays into it as well.
But what we've seen in the SEO case is, again, we tried rate limiting.
It worked. It definitely helped block some of the problem.
But we then started using our bot score and bot management to actually block traffic based off of a lot of the actual actions that were going on.
And that's a little more dynamic.
It helps us get at the problem a little better and has pretty much wiped out the SEO spam problem.
Great. I also remember talking to some of the earlier bot engineers about the first use case on Cloudflare.com, which is the marketing forms.
Can you talk about that a little bit? Yeah. So there are marketing forms on our site.
Obviously, folks sign up for webinars. They sign up for all different types of events.
Anything that requires a form, anything where you go to Cloudflare.com and you can enter your name, your company name, what you're interested in, whatever, that form data gets submitted to us at Cloudflare.
And you can imagine bots are kind of a perfect tool to use to overcome a form, to not just submit it once, but to submit it 10,000 times.
Just over and over again and flood a company like Cloudflare with lots of information.
And you're right. This is one of the first bot problems we saw where bots would show up, stuff our marketing forms over and over again.
And we ended up with all of this nonsense information. You open up the internal spreadsheet that has all of this data, and it's just nonsense.
People are just doing it either out of vengeance or any other reason.
There's a lot of different motivations here.
And so, again, we deployed bot management. We sat it in front of the site and said, if you think that something is a bot, at least challenge it, if not, block it.
And that, again, appears to have really helped, if not completely wiped out the issue.
Great. Let's move on into the things that you guys both announced today.
And I want to start first with API abuse detection. So we've been talking a lot about bots on the web imitating users.
Can you explain the API problem?
Is this the same as bot management? Is it different? What's the difference here?
It's both. And this is really confusing. When we talk about bot management, so Cloudflare has had a bot management product for about two years now.
And it was originally designed to help us find bad actors or bad bots among humans, which is one problem in itself.
And we've done pretty well in getting after that problem.
But API traffic is, by nature, automated. So if you use bot management to protect API endpoints, to look at API traffic in any way, bot management is going to say, OK, assuming this is human traffic, how can I find the bad bots?
But API traffic is not human traffic. The challenge here is, how can we find bad bots among other bots?
And so we kind of reframe the problem entirely. What this all means in terms of bot management is, in some cases, bot management does work really well on API traffic, especially when APIs are designed to be used by browsers, which may mimic human traffic a little bit more.
But in a lot of cases, the API problem, as we've kind of called it here, is totally separate because we have to design something that detects intent, that detects the actual intent of the bots themselves.
And that's why it's a separate problem. And this sounds somewhat connected to a mobile app, right?
So a mobile app usually calls an SDK, and it's not doing it through a browser.
Is that right? Yeah, that's right.
And this one is confusing, right? If I have my phone here, and my phone blends in with the Zoom background.
If I have my phone, and I'm using an app or something like that, just to get food delivery, I open up that app, and to me, it looks like I'm making a human request.
Like I'm tapping a button, and the food is being ordered and delivered to my house, or I'm tapping a button, and my email is coming up.
That's all human. But on the back end, these mobile apps, they consume API endpoints, right?
And so what's happening is the human action is actually being translated into what appears to be an automated action.
And so this is why mobile apps tie into this API problem, because really, again, we're not looking for bad bots among humans.
We're kind of looking for bad bots among other bots, like phones.
And so that's why it's all tied into the same thing here. If you get at the API problem, you get at the mobile problem, too.
Great. So we talked about three different techniques on the API side there.
The first was API discovery. What's that?
So fortunately, it's kind of exactly what it sounds like, which is we want to discover as many API endpoints as we possibly can for customers.
And the reason this is important is, I'll give you an example.
Here's an API endpoint, just your website slash login, right?
And this might be the path you are sending all of your customers, your visitors to when they log into your website.
And that's a pretty obvious path, right?
Most people know that slash login is where you send folks, and it seems like a pretty common use case.
But as companies grow, they don't just have one API.
They don't just have the slash login, right? They have a lot of different API endpoints that start to stack up.
You've got log out, you've got update whatever, auth, authorize this.
There's going to be tons of these, not just five or 10, but in some cases, hundreds or thousands.
And you will start to lose count of these, right?
We've heard from plenty of larger companies that have actually started to lose track of all their API endpoints.
And so API discovery is this process that we've sort of developed here where we will go out and help you discover all the API endpoints you have.
Some of them you will already know about, right?
Some of them you're using on a daily basis, but some of them may just have been lost at like a 5,000, 10,000 person company, right?
So this is a product that will help you discover all those API endpoints and list them out so you know what they are.
And what kind of, if I'm watching this and wondering if I should be using API discovery, what kind of companies need that type of tool?
I think it depends.
Generally, if you're watching this and you've already heard about some form of the API problem, right?
You know, you use APIs for your web app, or you just have a login page or something else like that, that's expecting automated traffic.
There's a chance that this affects you, right? Generally speaking, if you don't see any problems, right?
Like if your site is totally fine and you're operating at kind of a small scale, you're a couple of people just running a website, you're probably fine.
But if you work at a larger company where you know that you're, you know, maybe you rely on a mobile app, again, like maybe your entire business is built around a mobile app and there are hundreds, if not thousands of people working there, you almost definitely could benefit from something like API discovery because there's just so much floating out there.
Great. Thanks, Ben. So Thomas, I'm gonna ask you a question I've probably asked you dozens of times over the last few years, which is, I've read your blog.
Can you explain a part of it to me? And let's start with path normalization.
That was a little bit technical in the blog.
And can you just break that down for us? Sure. So when you think about the API problem, you think about API requests, and typically API requests are pretty structured in terms of the HTTP request path they're accessed through.
There's typically some structure that reflects the underlying business logic of that API endpoint.
But what you typically see is that there are some variables encoded in the path, such as an identifier.
And this makes the specific paths not exactly reflect the underlying business logic.
So you have some variable in there, you have some identifier in there. Think about slash user and then the user ID: that ID does not reflect the underlying action that's being taken for the API.
So what we try to do with path normalization is if we have discovered API traffic, we need to actually reveal the underlying action that is being taken and that groups all these API requests together.
And for that, we need to do this path normalization.
So what it means is essentially try to collapse or ignore the varying part of that API call.
So the example, you would wildcard out the user ID and you actually care about the underlying action that's being taken.
So to sum up, it's bringing out the business logic that is driving this API call and ignoring the variable parts of an API call.
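A minimal sketch of path normalization, using the slash-user-then-ID example: variable-looking segments are replaced with a placeholder so that requests for the same action collapse to one path. The rule for what counts as "variable" (all-digit segments and UUIDs) and the `{var}` placeholder are assumptions made for this example; the blog post describes the real approach.

```python
import re

# Hypothetical rules for spotting variable path segments.
_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def normalize_path(path: str) -> str:
    """Wildcard out variable segments so the path reflects the action taken."""
    segments = path.split("?", 1)[0].split("/")  # ignore the query string
    normalized = [
        "{var}" if seg and (_NUMERIC.match(seg) or _UUID.match(seg)) else seg
        for seg in segments
    ]
    return "/".join(normalized)
```

With these rules, `/user/12345/orders` and `/user/987/orders` both become `/user/{var}/orders`, which is how a very large set of raw paths can shrink to a small set of actions.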
And how effective is that? So can you share any stats from your early access?
Right. So yeah, if you look at raw API traffic, typically there will be hundreds of thousands of unique request paths just because of those variables being so present in that.
And in several use cases, we've seen that we can bring that number down from hundreds of thousands to a couple of dozens of API calls that just express that single business logic that we actually care about when building models around this.
Good. Great. All right. Let's get into the harder questions.
So you've discovered API endpoints and now how do we do what Ben said?
We're now looking for bad bots amongst good bots. Right. So this ties into, and is conceptually similar to, anomaly detection.
Essentially, we will try to learn what typical API calls look like on your site.
And there are several ways to do this.
We will create a baseline. And one thing we do is look at volumetric anomalies, for instance.
So each endpoint is typically accessed a number of times during a session.
And if you think about a reset password endpoint, for instance, that's typically a very sensitive endpoint for a customer.
And we don't expect that endpoint to be called many times during a session.
If that happens, that's typically a sign of a credential stuffing attack, for instance, or someone at least trying to compromise an account.
So what we do is once we've automatically discovered all these API endpoints, we can start to learn what is the typical behavior on those endpoints, right?
And then we can learn, okay, if you're interacting with that endpoint in this way, that is probably anomalous.
And that's probably a sign of malicious intent.
And that's how we can surface that abuse. Does a volumetric anomaly have to be malicious intent or is it potentially a signal?
It's an anomaly. So what that means, it deviates from the normal, right?
It does not necessarily indicate malicious intent.
It can also be a bug in a client or an accidental script, an API client script that's not programmed very well and that is interacting with your site in a sort of strange way.
So it's not necessarily malicious, but it's definitely anomalous.
And there's definitely great value in surfacing that anyway.
So customers can understand what kind of weird stuff is happening and what kind of abuse intentional or unintentional is happening on their API endpoints.
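The volumetric check can be sketched as follows: learn, per endpoint, how many times it is typically called within a session, then flag sessions that exceed that baseline. The mean-plus-k-standard-deviations threshold, the function names, and the endpoint names are assumptions for illustration; the session only says that a baseline is learned per endpoint.

```python
from collections import Counter
from statistics import mean, pstdev


def learn_thresholds(sessions, k=3.0):
    """Learn a per-endpoint ceiling on calls per session from baseline traffic.

    sessions is a list of sessions, each a list of normalized endpoint names.
    Only sessions that called an endpoint contribute to its baseline.
    """
    per_endpoint = {}
    for calls in sessions:
        for endpoint, count in Counter(calls).items():
            per_endpoint.setdefault(endpoint, []).append(count)
    return {ep: mean(c) + k * pstdev(c) for ep, c in per_endpoint.items()}


def volumetric_anomalies(session_calls, thresholds):
    """Return endpoints this session called more often than the baseline allows."""
    counts = Counter(session_calls)
    return sorted(
        ep for ep, n in counts.items() if ep in thresholds and n > thresholds[ep]
    )
```

A flagged endpoint is an anomaly, not a verdict: as noted above, it may be an attack or simply a buggy client script.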
Great.
Could be another way of identifying, like you said, a bug, right? Which would be a useful bit of information for a developer.
Absolutely. So outside of volumetric, how else are you identifying abuse?
So the other thing, the other perspective we're looking at this is the sequence of API calls that are being made.
So if you think about interacting again with an API system, there will be typically some kind of ordering or sequence you walk through, a flow you walk through when you do certain actions, right?
You buy something, you first put something in a cart, then you go to checkout, and then you go to pay.
There's a definite order.
So what we will do is we will take all those API endpoints, all the possible API calls coming out of the path normalization step.
And we will again put a, train a baseline, learn a baseline of what typical sequences look like, what the normal flow to your API looks like.
And then we can start comparing how, in the live setting, we can start comparing how current flows are actually acting on your, or interacting with your site.
And if they're doing that in a very strange way, for instance, they're doing a flow that has never been observed before, we can again flag that because that could be, again, a sign of malicious intent or other anomalies.
So this sounds heavy, like a lot of compute power. How do you do this efficiently?
You're right. It's a technical, it's a challenging problem. So the way we've implemented this is using Markov chains, which is essentially, we are setting up a transition matrix between all the different states in your API system.
So if you think about the path normalization step, we then have a list of all possible API calls or actions you can take on your site.
And if you then make a transition matrix, it's a matrix that counts the number of times you transition from one state to another state, from one API call to the other API call.
So we can simply set up this transition matrix and infer the probabilities of doing that actual transition.
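As a rough illustration of the transition matrix just described, the sketch below counts transitions between consecutive API calls in each session and normalizes each row into probabilities. It assumes a first-order Markov chain over already-normalized paths; the paths and session data are made up for the example.

```python
from collections import defaultdict


def count_transitions(sessions):
    """Count how often each API call is followed directly by another."""
    counts = defaultdict(lambda: defaultdict(int))
    for calls in sessions:
        for src, dst in zip(calls, calls[1:]):
            counts[src][dst] += 1
    return counts


def to_probabilities(counts):
    """Turn each row of counts into transition probabilities."""
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# Hypothetical sessions following the cart -> checkout -> pay flow.
sessions = [
    ["/cart", "/checkout", "/pay"],
    ["/cart", "/checkout", "/pay"],
    ["/cart", "/cart", "/checkout", "/pay"],
    ["/cart", "/checkout", "/cart", "/checkout", "/pay"],
]

probs = to_probabilities(count_transitions(sessions))
for src, row in probs.items():
    print(src, row)
```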
As you pointed out correctly, that transition matrix can explode tremendously.
If you have a very big API system with many endpoints, that will actually grow exponentially with the amount of paths you have.
But fortunately, when we were looking at our use cases, we see that actually a lot of flows are not observed in a site.
And if you think about it, that's kind of logical, right? It's that most users will, or clients of the API will follow similar patterns when they interact with your API.
So it turns out we can prune that transition matrix down to only the transitions that are relevant for that.
And it turns out that we come up with very manageable and sizable transition matrices that we can generate for our customers and then use to do the anomaly detection in a live setting.
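One way to picture the pruning step is to store only the transitions that were actually observed, rather than a dense matrix with a cell for every pair of calls. The sketch below is an assumption-laden illustration: the `min_count` cutoff and the rule of flagging any transition missing from the pruned table are choices made for the example, not details given in the session.

```python
def prune(counts, min_count=2):
    """Keep only transitions seen at least min_count times (assumed cutoff)."""
    pruned = {}
    for src, row in counts.items():
        kept = {dst: n for dst, n in row.items() if n >= min_count}
        if kept:
            pruned[src] = kept
    return pruned


def unseen_transitions(calls, pruned):
    """Return the transitions in a live session that the baseline does not contain."""
    return [
        (src, dst)
        for src, dst in zip(calls, calls[1:])
        if dst not in pruned.get(src, {})
    ]


# Hypothetical observed counts for a small API.
counts = {
    "/cart": {"/checkout": 5, "/cart": 1},
    "/checkout": {"/pay": 4, "/cart": 1},
}
pruned = prune(counts)

print(unseen_transitions(["/cart", "/checkout", "/pay"], pruned))
print(unseen_transitions(["/cart", "/pay", "/pay"], pruned))
```

A dense matrix over n normalized calls has n × n cells, while a sparse table like this grows only with the number of distinct transitions that clients really make.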
Great.
So I'm going to go back to you, Ben. We've heard, and I think we provide some services around a positive security model.
Why would we want to do this versus that, or maybe both?
Yeah. Well, look, I think the positive security model idea, this idea of identifying just the traffic that you want to let through and letting it be that way, just actually letting that traffic through, is generally a really good one.
And it is really effective. The problem is not everyone can identify the specific traffic they want to let through.
I mean, imagine if we tried to do this at Cloudflare.com, right?
If we wanted to list out the exact people we wanted to let through to our site, and then we just said, everyone else who shows up, forget about it.
We're just going to block you or challenge you or whatever. That wouldn't work.
And so a lot of companies are facing the same problem where they said, look, I could make a list of everyone I want to give access to this website, but that is just not sustainable and it's not realistic.
And so the reason that we have to come up with this idea of API abuse detection is folks need to do the other thing.
They need to start by assuming I'm going to let everyone through unless they violate one of these rules or look suspicious.
So that's kind of the reason a positive security model doesn't work here.
It's not to say that it isn't a great solution in some cases, but our approach here is let's start by assuming you've got an API endpoint, you've got partners who might not be able to keep their user agents consistent and that kind of stuff.
And then let's go from there. Great. So let's get to the real questions.
How do people get access to this? You announced early access this week.
How does that work? So yeah, we announced an early access period.
What's happening here is Thomas, as well as the rest of our team, are working on building this out as quickly as possible.
And part of doing that is actually training the data, right?
We could sign everyone up today and we could say, we're just going to run this for all you guys, and hopefully it'll work really well.
But the reason we're doing this as an early access is we want to spend at least the next couple of months training on your data and making sure we get this really accurate.
So today, early access, if you already have bot management or you're a Cloudflare customer, reach out to your account team.
So that's your customer success manager, your solutions engineer, anyone who's your point of contact, and they will loop you into our team.
We'll make sure you can get access to this early.
You'll be able to eventually test it out. Hopefully we can train on what's going on on your site to make sure it's accurate.
This is kind of a preview to what's coming and what we plan to launch throughout the rest of the year.
Great.
All right. Let's move to the second thing that you announced today. It's kind of probably pretty fun to announce two launches on the same day, Ben.
Super Bot Fight Mode, which I had to practice to say appropriately.
Can we first talk about Bot Fight Mode?
I know we've had that for a long time and what the differences are. Yeah.
So Bot Fight Mode, like you said, has been around for a while, right? Anyone at home can search our blog post.
I think it's called Cleaning Up the Climate or Cleaning Up with Bots.
And we announced this originally just to give our free customers something to use against bots.
We talked about enterprise bot management works really well, but not everyone needs the thing with all the bells and whistles.
Some people just want to log in and flip a switch.
So Bot Fight Mode is that switch. You turn it on, we'll sit in front of your website, and we will challenge any bots that show up.
One of the things that made Bot Fight Mode kind of unique is we didn't just challenge them, right?
We didn't just show them a CAPTCHA or just show them a JS challenge.
We issued what we like to call a computationally expensive challenge. So we're trying to get back at these bots.
Actually, when they show up, we will unload something that causes them a lot of pain on their server, whether it's financial pain, or we just keep them away from your website.
It is a challenge that is computationally very, very expensive and not something bot operators want to see.
So that was the idea behind Bot Fight Mode. Simple toggle, we challenge bots for you.
Today, we are announcing and releasing Super Bot Fight Mode, which as you said, is almost a tongue twister in ways.
But Super Bot Fight Mode is a supercharged version of that original Bot Fight Mode.
So if you have a pro plan, if you have a business plan, we're taking all the great stuff from Bot Fight Mode, and the great stuff is, I guess, mitigating bots, but also giving you some more tools.
So one of those things is analytics.
You can log into the Dash immediately right now and see new insights in terms of what your bots are and what that bot traffic looks like.
We're also giving you some more controls. So look, we get it. Everyone has kind of a different use case.
The enterprise customers want to be able to have all the bells and whistles to fine tune things.
We want to give you a few more levers to pull, at least if you're at a pro or a business plan.
And so those levers look as simple as, instead of just challenging, maybe you want to block all traffic.
Or maybe you want to exempt verified bots from what's going on. And verified bots, again, are those good bots on the web.
So that's a taste of what we're launching today.
And so you have offerings at both of our pay-as-you-go plan levels, our pro and our business plan.
And they're not add-ons. You just get these as a part of those plans.
Is that right? Yeah, it's unbelievable. You can just log into the dashboard.
You get these right away. If you have a pro or a business plan, you get this immediately and can start to use it.
It's pretty great.
And what are the differences? What do I get at the pro plan and what do I get at the business plan?
So there's something different in each one. Actually, the pro plan has its own set of analytics, which includes a bot report, which is a really kind of high-level view of your bot traffic.
So we'll break things down. We'll show you likely automated traffic, likely human traffic, as well as verified bots.
Actually, I love the bot report so much that because it just exists at pro, sometimes I'll downgrade my business account to pro just to take a look and see what's there.
And then at the business tier, we're giving basically a version of bot analytics.
So you can log into your account. You see more than just the high-level data.
You'll be able to sort through the IP addresses, the ASNs, the countries that are associated with your bot traffic.
So you get all of that at business.
And that's a breakdown of the analytics piece. So again, bot report at pro, bot analytics at business.
As far as the actual configurations and levers you can pull, they're very similar.
So our pro customers can go after what we call definite bots, the bots that are, we're really confident, right?
Or rather the requests that we're really confident are coming from bots.
Our business customers can go after that same group of definite bots with a block or a challenge, but we're also letting them go after likely bots.
And so these are the bots on the web that tend to be a little more sophisticated and we allow them to split those up.
So you don't have to take the same action that you take for definite bots with your likely bots.
You could decide, you know what, I want to block all traffic that Cloudflare is really, really confident in coming from a bot, but I just want to challenge the likely bots.
And so that allows you to kind of have a more granular approach when you're configuring your site.
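The granular setup described here can be pictured as a small mapping from traffic category to action. This is only a conceptual sketch: the category keys, the `Action` enum, and the `decide` function are hypothetical names for illustration and are not Cloudflare's dashboard settings or API.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


# Hypothetical business-plan configuration: block definite bots,
# challenge likely bots, and exempt verified bots.
settings = {
    "definite_bot": Action.BLOCK,
    "likely_bot": Action.CHALLENGE,
    "verified_bot": Action.ALLOW,
}


def decide(category, settings):
    """Pick the configured action; anything unlisted (e.g. likely human) is allowed."""
    return settings.get(category, Action.ALLOW)


for category in ("definite_bot", "likely_bot", "verified_bot", "likely_human"):
    print(category, decide(category, settings).value)
```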
Okay. And what's the difference between a block and a challenge?
So it's actually important to know this difference because the block, the actual block that we use, completely shuts out a bot.
So if you show up and you're a bot at someone's website, we'll just block you. And there's really nothing you can do.
And a lot of folks want to use that. It's a great way to just mitigate and end the problem right there.
But the reason it's so important that we call out the challenge as a separate option altogether is the challenge will allow someone to get through if they can prove that they are a human.
And sometimes this happens, right? Bot management is educated guesswork at its core.
It is not a bot showing up and saying, this is me, throw me in this pile or throw me in the human pile.
It's a bot often trying to convince us that it is a human.
And so in rare cases, when we make a mistake in classifying someone, or just when you're in kind of a gray area, you want to provide folks who are showing up at your site with the chance to prove that they're human, even if all of their other actions say otherwise, right?
And so issuing a challenge will allow us to sort of offer them an olive branch and say, look, if you really can show us that you're a human, we'll let you through anyway.
All right. And now the question that every website owner is asking right now on the pro and biz plans, how fast is it?
Is it going to slow anything down? It's real quick. It is really, really, really fast.
Just to give you an idea of this, one of the reasons we are able to give you analytics on your site is because we can run these detections behind the scenes so quickly.
So if you log into your pro account again, and you go to the bot report where we're showing you this portion of your traffic is likely automated.
This portion is likely coming from humans. The reason that's there is because we're already running the detections.
It's so fast. You probably haven't noticed it up until today, right?
There's almost no latency there. And so when you turn on the configurations, you're actually not adding really any additional latency there.
We've been doing this for a while, right? We've been training the data for years now without adding any additional latency there.
So it's so quick that actually enabling it today should not change anything you've had for months, if not years.
And can you give me some use cases? What might I be worried about on my site that would make me want to turn these features on?
Yeah. There's one use case in particular that kind of stands out to me because I think it may be pretty unique to customers who are on pro and business plans.
And that is the ads case.
A lot of our pro and business customers run their own blogs, or they run their own smaller sites where they are supported by literally showing ads.
Oftentimes it's Google AdSense or Google AdWords on their website, where they are just showing little banner ads on the side of their site.
And as a result, they make some money from people showing up and clicking on those ads.
The problem is they don't get paid if Google or any other ad operator detects that a bot clicked on the ad instead, right?
There are all sorts of fraud detection mechanisms in place that are calculating whether or not an actual human clicked on this or whether or not a bot showed up and was just clicking, clicking, clicking over and over again to drive up revenues there.
And so customers like that who have sites with ads will need something to actually block those bots so that the revenue they're getting from their ad provider will be trustworthy, right?
And they won't have to deal with any issues with their ads getting shut down or not getting the proper income that they deserve.
If you deploy super bot fight mode, again, very difficult to say, super bot fight mode on your site, you'll be able to block a lot of those bots that were showing up and committing ad fraud in the first place.
So that's a great use case for it, although there are many, many others.
And then if I'm an operator at a customer, how do I know whether I should turn these features on?
What should I do? What do I see? And how do I know what to do? Yeah, again, I think this is where the analytics come into play, right?
Analytics are fun at first.
It's great to look in there and be like, hey, I don't have a bot problem or I do have a bot problem.
Actually, either case is kind of exciting. And you can log into the dashboard and immediately see what your bot problem is and understand whether or not you should deploy these extra levers.
So if I'm a business customer, right, and I log in and I use bot analytics, I can actually tell what IPs are hitting my site most commonly, what user agents are showing up all the time.
This can often be an indicator that something's going on, right?
If I see that the top ASN reaching my site is a cloud ASN, well, there usually aren't humans behind cloud ASNs.
That could be an automated service that's showing up and constantly causing problems.
And so you're in a really cool position now where, again, this is included with the plans.
It's not like you have to go pay someone to try this.
You can turn on mitigation for a little while, see if it fixes a problem or see if those numbers change, and then roll from there.
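The kind of breakdown described here (top IPs, ASNs, and user agents, and whether the top ASN belongs to a cloud provider) can be approximated on any request log. The sketch below is illustrative only: the log records, the set of "cloud" ASNs, and the function names are invented for the example, and it uses documentation-reserved IP ranges and AS numbers rather than real ones.

```python
from collections import Counter


def top_sources(requests, field, n=3):
    """Return the n most common values of one field across request records."""
    return Counter(r[field] for r in requests).most_common(n)


def cloud_share(requests, cloud_asns):
    """Fraction of requests that originate from ASNs treated as cloud providers."""
    from_cloud = sum(1 for r in requests if r["asn"] in cloud_asns)
    return from_cloud / len(requests)


# Hypothetical request log entries.
requests = [
    {"ip": "198.51.100.7", "asn": 64500, "user_agent": "python-requests"},
    {"ip": "198.51.100.7", "asn": 64500, "user_agent": "python-requests"},
    {"ip": "198.51.100.9", "asn": 64500, "user_agent": "python-requests"},
    {"ip": "203.0.113.20", "asn": 64501, "user_agent": "Mozilla/5.0"},
    {"ip": "192.0.2.44", "asn": 64502, "user_agent": "Mozilla/5.0"},
]

# Assumption for the example: AS64500 is a cloud hosting network.
cloud_asns = {64500}

print(top_sources(requests, "asn", n=1))
print(top_sources(requests, "user_agent", n=1))
print(cloud_share(requests, cloud_asns))
```

A high share of traffic from hosting networks is a hint worth investigating, not proof of abuse, which is why trying mitigation for a while and watching whether the numbers change is the suggested next step.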
And walk me through that decision a bit.
Like you just kind of mentioned that you don't have to pay for this.
What was the rationale to give this to our paying customers?
You shouldn't have to pay. None of our customers asked for bots in the first place, right?
And so we don't want to penalize anyone who has not done anything to deserve these problems in the first place.
And it's our job as a company that has really tried to democratize the Internet and take a lot of the things that were previously just available to larger companies and bring them down to everyone else.
It's our job to sit there and make sure that happens. So again, it's not a difficult thing for us to offer.
I mean, it's not even a big question for us because we are able to just take all these features that have already existed and worked for our larger enterprise customers and give them to you.
We've been scoring the traffic for a while.
There's no reason folks at home shouldn't be able to use those insights.
I want to go into, if I'm looking at the configuration page as a business customer, there's something there called JavaScript detections, which I'm not sure if we talked a lot about in the blog post.
Can you explain that feature a bit?
Yeah, there's a couple of these features that snuck in that we didn't get to mention in the blog post.
So look, JavaScript detections, we've actually been offering this to our enterprise customers for a while now.
And the basic idea is bot management or super bot fight mode or regular bot fight mode.
They all work really well on their own and they can find a lot of bots for you on the web.
But JavaScript detections, it's like adding a booster pack to the bot detections you already get.
So when someone reaches your site, we will inject just a small amount of JavaScript, just to gather very broad information about who's reaching the site.
So we're just talking about general device class.
We are not going after user fingerprinting or anything like that.
And by using JavaScript here, we are able to identify often headless browsers as well as other malicious fingerprints that are on the web.
And so we separate this out from, again, all of the other levers you have.
So you don't have to use this if you don't want, but it is really, really effective.
So if you've got super bot fight mode today and you want to turn this on, just head to the configuration panel and flip that switch to on.
We will immediately begin using JavaScript to find some of these headless browsers.
So it's really, really effective.
And again, just helps us find more bots. And then who should and should not turn on this feature?
I generally say by default, you should probably have it on.
The only reason you wouldn't want to have it on is if for some reason you have concerns about extra JavaScript running on your site.
And in most cases, we've actually spent a lot of time working with the team to make sure that our JavaScript does not interfere with other things that are running.
So it's CSP compliant.
We make sure that when we're actually injecting or doing anything, it honors all of our really strict privacy standards.
This works really well.
But if you are particularly sensitive to having extra scripts running on your site, you can keep it off and everything else will still work.
Great. All right.
So we've talked about pro and business. What's going on with our free sites?
So it's interesting because the super bot fight mode blog, the one that just went up and that you all should read is very focused on pro and business.
But the truth is we have rolled out something else for free customers as well, which is what we were just talking about: JavaScript detections.
We previously did not have JavaScript detections at the free level.
We were just going after really those heuristics, largely on cloud ASNs, because we wanted to start off light and kind of wanted to give customers a chance to wade into this and make sure that we weren't doing anything too aggressive.
But beginning today, and we've been testing this out before, we are going to include JavaScript detections with bot fight mode.
So you may start to see it be a little bit more effective in going after those headless browsers, people who are using things like Puppeteer or Selenium, really anything that is reaching your site and trying to deceive it in a way.
One thing I should note is if you already opted into bot fight mode, then all you have to do is turn the switch off and then turn it back on.
We want to be really careful to not start injecting JavaScript or anything like that without your permission.
But for everyone who turns it on today, it's already going.
And so you can flip that on.
We will start using JavaScript and we can catch a couple more bots. Great. And then I guess last on super bot fight mode, how does this help our enterprise customers?
So it actually helps them a lot. And there's a few reasons for this. Number one is just, look, more people is always better.
We are trying to build, we said this in the blog post, we're trying to build a united front against bots.
And bots have literally become close to 50% of the Internet.
The more people we can build up with bot defenses, the more we can disincentivize the use of bots on the Internet.
And so the more people that join this fight will ultimately help our enterprise customers.
But again, I think more importantly, enterprise customers will benefit from what we learn here, as will our pro and business customers.
The more bots we can detect and the more bots we can see, the more we can block at the enterprise level.
And so it's in everyone's best interest that we add these additional customers because pro and business customers have really diverse sites.
These are totally different in a lot of cases than some of the enterprises that are showing up and putting bot management in front of their sites.
They're smaller.
And so they have all of these different tools that can jump in and really help us out.
Great. We are nearing our time. We have a few more minutes. So I want to ask Ben, outside of everything you've talked about today, what was your favorite thing that you've shipped on the bot management team at Cloudflare?
Well, this is a tough one.
Fortunately, I've only been involved in one major product launch before this.
So it makes it easier if I'm going to choose a big one, which is bot analytics.
We launched bot analytics about five months ago, so in October of 2020.
And it is, again, exactly what it sounds like. It's analytics for bot management.
So in our dashboard, we started showing customers visual tools, things that they can dive into to actually put almost like a face to the problem that was going on on their websites for a long time.
And it's something we were actually doing behind the scenes.
And this is useful for tuning rules, for going in and actually making sure your detections are accurate.
But it's also fun.
I've talked to customers who it's their entire job to handle security for some enterprise, but they get a little bit of joy out of logging in every morning and seeing that we blocked a certain percentage of their traffic because it was totally malicious.
And you get really emotionally invested in it too. So this is a great way to kind of reward yourself and see that you're doing the right thing.
Great. So thank you both for taking the time to talk to me and the audience today about the exciting announcements you've made at Security Week.
Thank you for shipping those announcements. I think they're great enhancements to our product line and more tools for our customers to control the content on their sites or the access to their sites is always great.
With that, we will end the session and look forward to more things from Security Week.
Thanks, guys.
Cool. Thanks, guys. Thanks for having us.