🔒 Security innovation to fight fraud and better manage APIs
Presented by: Adam Martinetti, John Cosgrove, Saikrishna Chavali
Originally aired on January 10 @ 8:00 AM - 8:30 AM EST
Welcome to Cloudflare Security Week 2023!
During this year's Security Week, we'll make Zero Trust even more accessible and enterprise-ready, better protect brands from phishing and fraud, streamline security management, deliver dynamic machine learning protections and more.
In this episode, tune in for a conversation with Cloudflare's Adam Martinetti, John Cosgrove, and Saikrishna Chavali.
Tune in all week for more news, announcements, and thought-provoking discussions!
Read the blog posts:
- Announcing Cloudflare Fraud Detection
- Automatically discovering API endpoints and generating schemas using machine learning
For more, don't miss the Cloudflare Security Week Hub
Security Week
Transcript (Beta)
Hello, folks. Welcome to the session on Security Innovations to Fight Fraud and Better Manage APIs.
With me, we have two of our product managers: Adam Martinetti and John Cosgrove.
I'm Sai Chavali, part of our product marketing team. But a quick introduction from both Adam and John, if you would introduce yourselves.
Hi Sai, thanks so much for having me.
My name is Adam Martinetti. I'm the product manager for our Bot and Fraud Detection offerings here at Cloudflare.
I've been at Cloudflare for over five years now and thrilled for my first time on Cloudflare TV.
Hey, Sai.
Hey, everybody. My name is John Cosgrove. I'm the product manager for Cloudflare's API Gateway, where you can secure and manage APIs.
Great.
So I'm very happy to have both of them on to discuss our innovations around fraud and API security.
Now, both of these are very hot topics for modern online organizations, whether your organization is selling products to consumers, software to enterprises or, you know, is an NGO providing support to the needy around the world.
Attackers are finding new ways to harm your online presence through malicious bots and breach defenses through your own APIs.
By the way, this half an hour of our discussion, we will be continuing the theme of machine learning to improve your security programs.
That's been a theme, it seems like, today of Security Week.
Now, AI and ML are kind of going through a hype cycle among the general public.
But today's conversation will focus on how ML, machine learning is being used across Cloudflare.
Machine learning refers to tools that use algorithms and statistical models to analyze and draw inferences from patterns in data.
That's important to state because there's a lot of, you know, information around it these days.
So, usage of machine learning at Cloudflare has a long history.
It's been used across products to reduce time to detection and response and to increase the efficiency of security and IT teams.
In today's world, that means greater return on investment from your own Cloudflare solutions and products.
Let's talk about two more new ways that Cloudflare is using machine learning models to improve your application security in your programs.
We'll start with Adam, who will talk about fraud.
And before we get into the product itself, Adam, can you tell us what kind of big hairy problems we're actually tackling here?
Yeah, thanks so much Sai for the intro.
So the big hairy problem that we're tackling here with online fraud isn't just automated actors and bots that are looking to exploit your site in new ways.
The more we look into the problem of online fraud, the more we see that this isn't just bots, but large groups of people, sometimes hundreds sitting in one room, all targeting the same website. They use large numbers of legitimate, real devices to trick existing systems, and large pools of resources like valid residential IP addresses, to appear as close to a legitimate user as they can.
Now, fraud has become an increasing problem in the post-COVID world.
Consumers lost $8.8 billion to fraud in 2022, which was a 400% increase from 2019.
What some of the research suggests is that as people spend more and more of their lives remote, they become more vulnerable to exploitation or fraud, as their isolation makes them easier to target.
So in bringing this back to machine learning, one of the things that we did to enable these new online fraud models was we rewrote the entire module that runs lightweight machine learning models at our edge to allow for us to run inference incredibly fast at Cloudflare.
So if we're running a machine learning model to determine if something looks fraudulent or not, or if it looks like a bot or not, we're able to run these lightweight models in under one tenth of a millisecond, which gives us plenty of room to run multiple models in parallel in order to answer complex questions about specific kinds of fraud that you might experience.
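As a rough illustration of running several lightweight models side by side on one request, here is a minimal sketch. The three scoring functions, their feature names, and their thresholds are hypothetical stand-ins, and a thread pool is used only to show the shape of the idea; this is not Cloudflare's inference module.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy "models": each maps request features to a score in [0, 1].
def bot_score(features):
    return 0.9 if features.get("headless_browser") else 0.1

def fake_signup_score(features):
    return 0.8 if features.get("disposable_email") else 0.2

def carding_score(features):
    return min(1.0, features.get("cards_tried_last_hour", 0) / 10)

MODELS = {
    "bot": bot_score,
    "fake_signup": fake_signup_score,
    "carding": carding_score,
}

def score_request(features):
    """Run every model on the same features concurrently; return scores by name."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {name: pool.submit(fn, features) for name, fn in MODELS.items()}
        return {name: future.result() for name, future in futures.items()}

if __name__ == "__main__":
    print(score_request({"disposable_email": True, "cards_tried_last_hour": 5}))
```

Keeping one score per question (bot, fake signup, carding) rather than a single blended number matches the point made in the conversation about answering specific questions about specific kinds of fraud.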
I love it.
So you mentioned a couple of things there that I want to pull on. The first one is the fact that consumers have lost so much money in fraud and it's such a large increase.
Can you tell us, is this just an e-commerce problem?
Is it just when I'm going to try and find the latest, newest shoe and I'm beaten to it by a bot, or is it something more than that?
Yeah.
So that's a totally reasonable question. I think a lot of people assume that it is just an e-commerce problem, but more and more what we're finding is that if you have a signup page, if you have a login portal, if at any point in the process you expect to take credit card information from someone, even if you have a content submission form, there is going to be an impact to you.
There are going to be people who find a way to exploit those resources on your site no matter what.
And so if you have a site that doesn't necessarily directly get money from your customers, there still are financial impacts, resources that can be abused.
If you're seeing large spikes in fake accounts being created all at once, you're going to see resource usage increases.
If you're running cloud compute instances, it means that you'll need more.
It means that you're spending more on bandwidth.
It likely means that you have an engineer, either an SRE or someone on your trust and safety team, who's spending a lot of their day, instead of doing their normal job, just trying to play whack-a-mole with some of these bad actors.
So one example that I like to give here is in the online gaming industry.
There was one particular customer that I spoke with where they provided some in-game rewards for the top ten customers on their leaderboard for a given day.
And because these resources or these rewards weren't monetary in value, they didn't expect to be exploited in the same way as something like a bank would be.
But what they found was that they had these bad actors that were creating thousands and thousands of accounts to ensure that one of their accounts, no matter which one it was, would be the one at the top of the leaderboard every week and get those prizes.
I suppose it was just because they were competitive. And what this meant was that they were spending a ton more on their own internal resources to serve the content for customers.
They were spending more really just to serve it to bad actors.
And the important thing here is that it wasn't just cost for them, but it meant a worse user experience for the legitimate users in their game because none of the real users who were trying hard and enjoying the game were ever rewarded for their performance, because there was always a bad actor who was on that leaderboard ahead of them.
That's a great example.
Thank you for making it so concrete.
You touched upon one of the multiple problems around fraud.
Can you break down which categories of fraud are most common in the industry out there?
And can you maybe just give us, again, examples of some of them to make it more concrete?
Yeah, I'd be happy to.
So the idea with what we're doing here is that Cloudflare wants to build bespoke detection models for every single type of fraud or application abuse that we can find online.
And so there are plenty out there. I think that OWASP categorizes about 15 to 20.
What we did is we started with four that we felt were going to have the most impact, that customers asked us for the most, and that we'd seen recurring examples of in the feedback loop reports that we get from our bot management customers today.
So the example that I just gave with that gaming company was a good example of how fake account creation can harm customers even if they don't have monetary rewards.
I think account takeover is something that everyone knows about, where people are using stolen credentials to try and break into your site or not even stolen credentials, but they might be trying to do something like find users that have easy-to-guess password reset security questions, or they might have a weak two-factor authentication mechanism that they can exploit.
Obviously banks are the biggest example of this, but anyone with a saved payment method, someone could be trying to break into your site.
Um, the next example that we're targeting, or the next type of fraud that we're targeting is expediting.
And so what we mean by this is kind of skipping ahead, going as fast as possible through your site to try and gain access to limited resources.
And one of the best examples that we can find there are concert or sporting tickets, where there's a limited supply of these tickets.
And so what a bad actor will do is, you know, when everyone's in line, they're going to find a way to skip ahead and purchase those tickets faster than someone else could do that in order to get as many of those tickets as possible and sell them for a profit on the resale market.
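To make "skipping ahead" concrete, here is a minimal sketch of one possible heuristic: flag a session whose steps through a purchase flow arrive faster than a person could plausibly click. The threshold and function name are assumptions made for illustration, not how Cloudflare Fraud Detection works.

```python
# Hypothetical minimum time (in seconds) a human plausibly needs per step.
MIN_SECONDS_PER_STEP = 2.0

def looks_expedited(step_timestamps, min_seconds=MIN_SECONDS_PER_STEP):
    """Return True if any two consecutive steps are closer than min_seconds.

    step_timestamps: times (in seconds) at which one session reached each
    step of a flow such as queue -> seat selection -> checkout.
    """
    gaps = [
        later - earlier
        for earlier, later in zip(step_timestamps, step_timestamps[1:])
    ]
    return any(gap < min_seconds for gap in gaps)

if __name__ == "__main__":
    print(looks_expedited([0.0, 0.3, 0.5]))    # a scripted session
    print(looks_expedited([0.0, 12.0, 40.0]))  # a human-paced session
```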
And then the final thing that we're going to be focusing on for the debut of Cloudflare Fraud Detection is what we call carding.
And so carding is something like trying to find legitimate credit card numbers, either by guessing numbers from scratch or validating existing stolen credit card numbers that you may have found from elsewhere.
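One simple way to picture carding is a single client cycling through many different cards in a short time. The sketch below flags that pattern; the window size, the limit, and the `(timestamp, client_id, card_fingerprint)` record shape are all hypothetical, and it is a plain counting heuristic rather than the detection model discussed here.

```python
from collections import defaultdict

def flag_carding_clients(attempts, window_seconds=600, max_distinct_cards=5):
    """Return client IDs that tried too many distinct cards in any time window.

    attempts: iterable of (timestamp_seconds, client_id, card_fingerprint).
    card_fingerprint is assumed to be a hash, never a raw card number.
    """
    by_client = defaultdict(list)
    for timestamp, client_id, card in attempts:
        by_client[client_id].append((timestamp, card))

    flagged = set()
    for client_id, events in by_client.items():
        events.sort()
        for i, (start, _) in enumerate(events):
            # Distinct cards tried within window_seconds of this attempt.
            cards = {card for ts, card in events[i:] if ts - start <= window_seconds}
            if len(cards) > max_distinct_cards:
                flagged.add(client_id)
                break
    return flagged
```

A legitimate shopper who mistypes a card once or twice stays well under the limit, while a client validating a list of stolen numbers crosses it quickly.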
That's very interesting, Adam.
So actually in many cases, when I hear these examples, you don't have to be an organization or an enterprise to feel the pain; you feel it as an individual.
You know, you might feel the pain with concert tickets already being gone before you can get them, etc.
But what I don't understand is, what does Cloudflare provide that will be unique in this context?
And, you know, we've talked about a number of categories, but if you can just give us some examples, that would make it more concrete.
Yeah, happy to do so.
So when I've been talking to customers, the kind of key refrain that I hear from customers over and over again is that Cloudflare has this massive scale.
According to W3Techs, we see something close to about 20% of all web traffic.
And what customers really want to know is how the traffic that they see on their site compares to the rest of the Internet.
And so obviously, Cloudflare takes privacy very, very seriously.
And this is something that we want to do in a privacy-centric way.
But for each of these detection signals, what we want to expose to customers is: Is this individual that we're seeing, is what we're seeing them submit, is this something that looks in line with what we see on the rest of the Internet, or is this something that looks anomalous and something that could be a key outlier?
And so a couple of the things that we're focusing on to start with are email accounts that people submit during the signup process and domain information.
And so we're looking at data that we already get, in concert with our Area One team, who are already producing a lot of really amazing intelligence today.
Phishing intelligence specifically is something that we're looking at and that we've found to be really helpful here.
In some tests that we've done with existing customers who have fake account creation problems, we've found that just this signal alone was able to detect about 70% of fake accounts, which, you know, is a great starting point for us.
And the other thing that we're finding already has a lot of really good leverage here is the type of domain level intelligence that we're already providing to customers with our Cloudflare One products.
So the types of risks and risk categories that customers are seeing in their Zero Trust interface that they can write policies around today is the type of interface that we want to bring in and let customers in their Cloudflare firewall write the same types of policies to help detect and mitigate fraud.
Got it.
The key thing that we want to do here is that when we're giving customers all this data, we want to be very fine grained about the data that we're giving them, and we want to tell them exactly what the data source was and why we thought something was a risk.
Not just that we think it's a risk, because we know that a lot of our customers have their own in-house business logic around fraud, and we want to make sure that we're surfacing things that can be flexible to them and that they can use in ways that are in line with how they want to address that risk.
I love it.
Adam, this is a very exciting vision. Um, I want to make sure that everybody is aware there's an in-depth post on our Security Week blog.
It's the feature for today on blog.cloudflare.com, on fraud detection.
I think it's very unique what we're providing as well as the speed at which we're doing this without sacrificing performance and reliability for our customers.
Thanks so much.
And yeah, if people go to that blog, they can sign up and be a part of the Early Access process where they can work with us and we would love to work with as many customers as possible here to make sure that the things that we're building are specifically useful to you and help you with your business use case.
Yeah, I want to stress, please do register your interest.
Thank you for bringing that up.
It's at the bottom of that blog. You can easily find it and register. Um, okay. So now let's move on to the second half.
Um, and John, I want to bring you in. Let's talk about better discovering customers' APIs and API schemas to protect APIs in the first place and how we're using machine learning in that as well.
To start off with, I guess, John, what are, again, the big hairy problems we're tackling here?
Why is this so important?
Yeah, good question. Good place to start, Sai.
Really, to understand the importance of our new features that we've announced today, like API Discovery and Schema Learning, we need to first understand the importance of API security in general.
APIs offer this rich opportunity to share data, request services, and that's great.
But oftentimes the data that gets transferred by those APIs is rather sensitive.
And this can lead to data breaches. Many of the top breaches of 2022 were API related, and many of those breaches were really just open access APIs that had no authentication or authorization controls on them.
Companies really didn't even know these APIs existed.
And so the new machine learning-based API Discovery that we launched today gives customers the ability to continuously monitor for exposed API endpoints with no extra effort on their own side.
And like, what do I mean by extra effort?
Well, there's no input. There's no setup for this feature for API Gateway customers.
The machine learning aspect here allows us to already run inference across all of our customer traffic.
That means results are already waiting for you once you open the dashboard, as soon as you sign up.
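To give a feel for what discovering endpoints from traffic involves, here is a minimal sketch that groups observed requests into endpoints by collapsing identifier-like path segments into a placeholder. It swaps in a simple regex heuristic for the machine learning described here, and the `{var}` placeholder, the request tuple shape, and the `min_requests` cutoff are assumptions for illustration.

```python
import re
from collections import Counter

# Matches the canonical 36-character UUID text form.
UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")

def normalize_path(path):
    """Replace segments that look like identifiers with a {var} placeholder."""
    segments = []
    for segment in path.strip("/").split("/"):
        if segment.isdigit() or UUID_RE.match(segment):
            segments.append("{var}")
        else:
            segments.append(segment)
    return "/" + "/".join(segments)

def discover_endpoints(request_log, min_requests=2):
    """Return sorted (method, host, path) endpoints seen at least min_requests times.

    request_log: iterable of (method, host, path) tuples taken from traffic.
    """
    counts = Counter(
        (method.upper(), host, normalize_path(path))
        for method, host, path in request_log
    )
    return sorted(endpoint for endpoint, n in counts.items() if n >= min_requests)

if __name__ == "__main__":
    log = [
        ("GET", "api.example.com", "/users/123"),
        ("get", "api.example.com", "/users/456"),
        ("POST", "api.example.com", "/login"),
    ]
    print(discover_endpoints(log))
```

Because the input is traffic that is already being proxied, an inventory like this can be built without anyone filling in a spreadsheet.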
And let me give you an example of how this impacted one of our customers in a really beneficial way.
It's a customer that had an internal audit department.
Different companies have these for different reasons, and that audit department routinely asked IT for different data sets.
Pretty standard stuff.
But recently the customer told us that their audit team came to them with a very short turnaround time, asking for all of their external API endpoints.
They really only had a day or so to satisfy this audit requirement.
Thankfully, not only had API Discovery already found these APIs, it was super easy to export them in a friendly format for the audit team.
So at the end of the day, API Discovery in API Gateway satisfied the audit requirements in a snap.
I love it.
Thanks for the customer example there. So let's rewind a bit. How do organizations usually discover their APIs, and how do they know what the right format is, what the APIs require in terms of a request, etc.?
Yeah, I think it stands in stark contrast to our new machine learning-based Discovery.
Typically, teams are emailing around a spreadsheet saying, Hey, can you please fill this in, pretty please?
Um, there's usually no urgency that you can force on teams to do this, even in the case of audit requirements sometimes, and really it's kind of an ask-and-see approach.
Maybe you won't even get an answer if a team is busy with other projects.
And that's even if you know who to talk to in the first place.
Lots of folks just really don't know what they don't know.
I love it.
This is something that you see with a lot of security teams who are worried about their blind spots.
And shadow APIs are an often-talked-about topic, and there's a lot of information around it, but people just don't know what they don't know, and many times there are a lot of APIs that fall into that.
So now there's a second part to this though, John. We talked about API Discovery, but then there's also schema learning.
Can you tell us more about that and what schema learning even is?
Right, right.
So we talked about schema learning in the blog today as well as API Discovery. We just mentioned how customers sometimes can't find their APIs as fast as developers are putting them out, or maybe there's an older API that nobody really knows about.
Well, even when you do find the APIs, it's really difficult to accurately build OpenAPI schemas for each of the potential hundreds of endpoints for any given API.
And, given that security teams really only see maybe a method and a path in logs, it's pretty much impossible to go all the way to an OpenAPI spec file that you would then be able to validate against in an application or a security tool.
And we have one of those security tools inside API Gateway.
It's called Schema Validation, and it does exactly that.
It takes an OpenAPI file and then validates that the incoming requests match what's expected as far as path parameters, query parameters and their formats.
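As a sketch of what validating a request against a schema means in practice, the code below checks a method, a path parameter, and query parameters against a small hand-written rule set. The `SCHEMA` dictionary is a hypothetical, simplified structure loosely modeled on OpenAPI parameter rules; it is not the OpenAPI file format itself and not the Schema Validation implementation.

```python
import re

# Hypothetical rules for GET /users/{user_id}?limit=<integer>.
SCHEMA = {
    "method": "GET",
    "path_pattern": re.compile(r"^/users/(?P<user_id>[^/]+)$"),
    "path_params": {"user_id": {"type": "string", "minLength": 36, "maxLength": 36}},
    "query_params": {"limit": {"type": "integer"}},
}

def value_matches(value, rules):
    """Check one raw string value against a type and optional length rules."""
    if rules["type"] == "integer":
        return re.fullmatch(r"-?\d+", value) is not None
    if rules["type"] == "string":
        low = rules.get("minLength", 0)
        high = rules.get("maxLength", float("inf"))
        return low <= len(value) <= high
    return False

def validate_request(method, path, query, schema=SCHEMA):
    """Return a list of violations; an empty list means the request conforms."""
    if method != schema["method"]:
        return ["unexpected method"]
    match = schema["path_pattern"].match(path)
    if match is None:
        return ["unexpected path"]
    errors = []
    for name, rules in schema["path_params"].items():
        if not value_matches(match.group(name), rules):
            errors.append(f"bad path parameter: {name}")
    for name, value in query.items():
        rules = schema["query_params"].get(name)
        if rules is None:
            errors.append(f"unknown query parameter: {name}")
        elif not value_matches(value, rules):
            errors.append(f"bad query parameter: {name}")
    return errors
```

Rejecting anything that does not match the declared shape, instead of listing known-bad inputs, is what makes this a positive security model.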
So when we looked at API Gateway's usage patterns, we saw that customers would definitely discover APIs but then never enforce a schema.
And so, when we asked "Why not?", the answer was pretty straightforward. Even if they knew an API existed, it took so much time to track down the other team and ask about the details. Maybe it was programmatically generated and easy, but maybe it wasn't.
It was an old codebase, they weren't running it anymore. And really the lack of time and expertise was the biggest gap in customers enabling a schema validation protection.
So, at the end of the day, we found that the same learning process that allowed us to run API Discovery could also be used to learn the schemas once endpoints were discovered. Using that method, we could generate the OpenAPI file, upload it into Schema Validation, and enforce a positive security model on that API.
Gotcha.
So schema learning, now, how does it fit into the workflow of what customers are using?
Yeah, it's, uh, it's simple, but it's pretty powerful.
Customers are able to go from being vulnerable through stuff that they don't even know about to being protected with a validated schema in less than 24 hours.
Um, I mentioned API Discovery is already waiting for you once you log in, and once you add those discovered endpoints to the system, our schema learning begins to run.
It'll run on the API.
You can download the learned schema, send it into Schema Validation, and then you're protecting and enforcing that positive security model.
Gotcha.
And actually, I think this would be a good time if we can get a quick demo of the capabilities.
We have about five minutes before we have to wrap up. So if you wouldn't mind just showing us how does this workflow actually sit in our Cloudflare API?
I'd be happy to.
Yeah. So here we're looking at the dashboard and we land on Endpoint Management.
Now, in this demo system, we already have a lot of endpoints added to the system and that helps because I can just easily export a schema.
But first I want to show you API Discovery and walk through that flow that I just talked about.
Inside of API Discovery, if I look at the endpoints learned by machine learning, they're all listed for me here.
And to add them to the system, I can select all and save. But since they're already there, they're listed in Endpoint Management.
And once an endpoint is listed in Endpoint Management, this is when schema learning really kicks off.
If you give it about a day, you can export the schema, select your host name, and leave this box checked here: Include learned parameters.
This is making sure that you include the learned parameters from the schema.
When you export that schema, it'll download the OpenAPI file and I've got it here in a separate tab.
I zoomed in a whole lot and highlighted a section where we've said, "Yes, indeed, we have observed a parameter for this path.
It is a string and it is exactly 36 characters long." So this is an example for this demo lab where schema learning has said this is exactly what to expect and written it all out in this OpenAPI v3 schema.
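A minimal sketch of how observed values could turn into a constraint like "a string, exactly 36 characters long": record the lengths seen in traffic and emit the tightest bounds. The function names and sample values are hypothetical, and the real schema learning is described in this session as machine learning-based, so treat this only as an illustration of the output shape, which follows the OpenAPI v3 parameter layout (`name`, `in`, `required`, `schema`).

```python
def learn_string_schema(observed_values):
    """Infer the tightest string length bounds from values seen in traffic."""
    if not observed_values:
        raise ValueError("need at least one observed value")
    lengths = [len(value) for value in observed_values]
    return {"type": "string", "minLength": min(lengths), "maxLength": max(lengths)}

def as_openapi_parameter(name, schema):
    """Wrap a learned schema as an OpenAPI v3 style path parameter object."""
    return {"name": name, "in": "path", "required": True, "schema": schema}

if __name__ == "__main__":
    # Hypothetical identifiers observed in one path segment.
    observed = [
        "123e4567-e89b-12d3-a456-426614174000",
        "00000000-0000-0000-0000-000000000000",
    ]
    print(as_openapi_parameter("var1", learn_string_schema(observed)))
```

When every observed value has the same length, the minimum and maximum coincide, which is how a learned rule can end up saying "exactly 36 characters".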
Um, after that, once you have the schema, head on over to Schema Validation and this is where you can add the schema that you just exported and end up with a rule that you can then look at inside of the firewall events page.
So, really powerful stuff.
And at the end of the day, the main takeaway here is that you just went from unknown unknowns to having an API protected by schema validation in 24 hours flat.
Cloudflare is able to give you that protection where previously you only had blind spots.
And just hold on right there, John.
I think this is great. For those who are new to Cloudflare's security products, or especially to API Gateway, what you're looking at is the dashboard, and you can get to this under the Security and API Shield tab.
Very important to note is that any viewers on Enterprise plans, or looking at our Enterprise plans, can check out API Shield in the dashboard, and on Enterprise plans you can turn it on as a trial for your endpoints today.
So, John, thank you again.
I love that we're making our customers' lives easier with really very, very little effort on their part and taking it from discovery to actual protection.
So with that, I want to thank both of you. I want to give a quick recap of what you all have heard as viewers out there.
First, we talked about fraud detection.
And fraud is affecting all online businesses.
And, you know, it's not enough today to try and use homegrown solutions. It causes a lot of friction.
What we're talking about is tackling the four big problems around fraud detection.
We've talked about fake account creation, carding, expediting, as well as account takeover.
And then secondly, we've talked about the API Gateway.
And one of the fundamental problems we see with a lot of organizations is they just don't know what APIs they have, and it's very difficult to go do that in some manual fashion, especially in large organizations with APIs being created by all sorts of teams who may be in a variety of locations.
And again, what we've talked about is that our API Gateway solution provides an easy, almost no-effort way to identify all of your public-facing APIs, then identify the schemas (how those APIs work and what is required in a request), and then put that into our positive security model, so that we can protect your APIs from that starting point.
So thank you, Adam and John. I'd like to mention that both of their blogs are available on our site today.
You can see the Fraud Detection blog on blog.cloudflare.com.
And, I think it's just shown up here, you see that right there by Adam.
And then you also see the API endpoint discovery and schema learning blog from John and team.
So with that, thank you to both of you and thank you to all the viewers out there.
Thanks, everybody.
It was a pleasure.