🔒 Security Week Product Discussion: Bots: The Good, The Bad, and The Ugly

Presented by: Michael Tremante, Ben Solomon

Originally aired on January 19, 2024 @ 4:30 AM - 5:00 AM EST

Join Cloudflare's Product Management team to learn more about the products announced today during Security Week.

Tune in daily for more Security Week at Cloudflare!

Transcript

Hello everyone and welcome to Cloudflare TV. My name is Michael Tremante, I'm a product manager here at Cloudflare. Really excited. We're in the middle of Security Week for 2022. Today is day three. And just like any day, we're about to publish some great announcements on the blog. But please stay with us for now. Go check those out in a moment. And I'm joined today by a colleague of mine who's also working on application security. Ben Solomon, really happy to have you online with us today. Ben, do you mind just quickly introducing yourself to the audience? Of course. Hi, everyone. As Michael mentioned, my name is Ben. I am a product manager here at Cloudflare and I help look after a couple of our security products. So API security as well as our bot management product. I'm so excited to be here. I love talking about bots. I love all this stuff. So thank you for having me. Awesome. Just a couple of items before we jump into today's topic. Of course, this is a live stream and if you have any questions, feel free to email livestudio@cloudflare.tv. If we receive any, we'll try to account and leave some time for them at the end. If not, please also do reach out to the support team or if you're an enterprise customer to your account team and we'll be very happy to then follow up with any questions that come out of this session. With that, let's kick it off, Ben. Great title for today's session. Very disappointed you didn't actually watch the movie before coming up with a title. But with that aside, everyone I think who's watching us today will be using Cloudflare, but not necessarily every single person, and bots is a pretty big topic. Before we jump in, do you mind just giving us an overview of what it is about? Of course. And I should mention I went on the Cloudflare TV site and in some cases I didn't see the title there. So the title Michael and I have chosen is Bots: The Good, The Bad and the Ugly, because we're talking about everything in bots world today.
But yeah, let me give you an example of a bot. Really, a bot is anything on the Internet that is an automated service, right? And so this can be a wide, wide range of things. They can be good, they can be bad, they can be ugly. To give you an example, though, let's say I'm obsessed with the classic Clint Eastwood movie, The Good, The Bad, The Ugly. And I want to find all of the different content online that is in some way related to the movie. I have a couple of different options, right? One is I can get on my computer as a human being and I can Google search, right? I can find a Wikipedia article, I can go and find some old video. I can sort of manually browse the Internet on my own and try to track down every single piece of information that's tied to that movie. And that's fine. Those are all human requests that are prompted by me. But I have another option, which is to build some kind of a service, right, that's operated by my computer or by a server somewhere, which automatically fires off hundreds or thousands or millions of requests online in a fraction of the time that I could make just a few requests. Right? And so that ability to launch an automated service that maybe crawls the entire web and tries to find all of the data that's tied to this movie, that's a bot, right? And the really interesting thing about bots is they can operate at massive scale. They can do so much more than humans can do and they're not really fallible in the same way that humans are. So they don't make a lot of mistakes. They kind of just do exactly what you tell them to do. So they're very effective for carrying out the sort of bulk activities that you need to do online. Obviously, that technology can be used in good and bad ways, but we'll get into all that. Yeah, no, that makes total sense.
And I remember actually before I joined Cloudflare, I used to work at another company called Netcraft, where we used to survey the web, and that's actually a really good example of a bot, right? Every month we published reports on the state of the Internet, how many websites there are, what SSL certificates are being used. And I remember I used to be responsible for running the survey a couple of months and we used to spin up hundreds of EC2 instances and we'd have these bots crawl and download as much as we could to build these reports. And as you said, yeah, you have infinite resources to some extent. If you have many servers, you can really try and crawl very, very, very quickly. So to that point then I guess not all bots are made equal, right? At least when I was working at Netcraft, we were, at least in my opinion, a good bot. But some of them could be bad as well I guess. Yeah. And look, the reason we created a Cloudflare bot management product is because in general, most of the bots out there are bad. The wonderful thing about humanity is we're all very creative and we can come up with lots of use cases for bots. But the folks who have figured out the malicious use cases have really run wild on the Internet. And so if you think about, let's say some popular item goes on sale, right, and maybe it's an exclusive sneaker that's being sold online. Folks have figured out you can create a bot which shows up on a website and buys five, ten, 100 of those pairs of sneakers before any normal person can get their hands on just one. And so we see a lot of these malicious use cases where someone is showing up and using the bot for bad purposes. Sometimes it's even worse, right? Sometimes bots are showing up and they're testing out username and password combinations on different websites.
And when you start to see that stuff happen, it gets a little concerning because you start to worry about the end users who are actually affected by that. There are real victims to those actions on the Web. And so that's the reason we created bot management: to sit in front of different sites and to block the bad bots as they show up. Got it. Okay. And then, of course, just like we have bad bots, you mentioned not all of them are bad and some of them are good or legitimate to some extent. Right. And we have the ability to recognize or at least we maintain a list of what are the actual really, really good ones. If I'm not mistaken, correct me here, we consider those verified. That's right. And so I can give you a couple of examples of those verified bots and then talk you through the whole process. But really, you gave a wonderful example earlier of a good bot on the Web. I gave one of sort of a service that is crawling for information online, which is maybe good, maybe bad. But there are certain services we consider to be kind of objectively good, right? Google is a tool that we all use, right? We all try to search the Web and find different information. And in order for Google to populate their search engine results, they have to send a crawler out onto the Web. They have to send something out that's going out and looking at every single website because it's just not feasible for a human being to visit one website and another and catalog all of that information. And so we've got a lot of tools like that which are generally helping people and are generally well behaved, meaning they don't aggressively show up at sites and try to take them down. They're very conscious about how they act on the web. And so, yes, there are good bots like that. And sentry, which is a security tool.
There are all these different companies that have built really interesting and useful bots that adhere to kind of general standards of not being awful, right, as they're crawling the Web. Now, you mentioned the verified bot system, and this is really important. When our customers set up something like bot management, they need an ability to exclude the good bots from the detections we're making. Because otherwise, if you put bot management in front of your site, it might stop the bad folks from showing up, but it's also going to stop the good folks. And so suddenly you're not going to show up on Google search results. You're not going to show up anywhere. And that becomes a real problem for you if you rely on those good services. And so we've long had this system where operators of good bots can come to us and say, Here's my bot, right, I'm going to make my case. One, you can see I adhere to all of the general standards that you ask for. There's, as an example, a document you have to provide called robots.txt, which kind of outlines the way you behave on the Internet and tells folks what to expect. If you don't provide that document, that's a real red flag that you're not behaving well. And so we look for things like that and we ask for all this different information that can help us identify bots. This verified bot system then consists of hundreds and hundreds of good services we verified, and our customers can exclude those from the different detections that we make. Right. And so this is a key part of the way we run bot management, but also a really interesting and complex system. Got it. So I assume that the verified bot list includes things like even the big search engines, Google, Bing, etc., as well as anyone who signed up or we've added to our verified list. Funny you mentioned robots. I think a lot of companies have really fun robots.txt files.
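As an illustration of the robots.txt convention mentioned above: in common practice, robots.txt is a plain-text file that a site publishes, and a well-behaved crawler reads it before fetching pages. The following is a minimal sketch using Python's standard `urllib.robotparser`. The file contents, the bot name `ExampleBot`, and the `example.com` URLs are hypothetical and do not come from the session, and this is not a description of how Cloudflare evaluates bots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (an assumption for
# illustration only; it is not taken from the talk).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Parse the rules from memory so the sketch needs no network access.
# A real crawler would typically call set_url(...) and read() instead.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page",
                "https://example.com/private/data"):
        print(url, may_crawl("ExampleBot", url))
    # A polite crawler would also wait this many seconds between requests.
    print("crawl delay:", parser.crawl_delay("ExampleBot"))
```

A crawler that skips this check is the "misbehaving" case discussed in the conversation: it may end up fetching paths the site owner asked bots to leave alone.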
I saw a post not too long ago on Hacker News and it was just a stream of who had the most creative robots.txt file. But having coded a crawler myself once, definitely, you know, good bots will follow that file, but it's, you know, sometimes you forget about it, and definitely you start misbehaving and you might be crawling things you shouldn't. So at that point, though, if I do build my own bot and I want to make sure I'm behaving well and that Cloudflare mitigations are not necessarily just blocking me out, right, how do I think about signing up or making my bot be included in the verified list? Yeah, so there's a lot to cover here. The first thing is we have a public form, right? And you can fill it out. Technically, anyone can show up and give us information about a bot. We tend to ask that the bot operator is the person to report this information to us because it's just like if you're a user of a good service online, you tend to not have as much accurate information as the person who actually created that service. And so you always have this opportunity to fill out the form. If we get a request, we review it, and that's great. There's a bunch of criteria there, which I can get into in just a second, about why certain services are approved and certain services are not. But the other thing I should cover is the way we actually verify bots as you're filling in that form, right? So if you operate something online and you tell us, here's my bot, I want you to immediately exclude it from any sort of bot management activities online, we need a way to actually identify your bot, right? And it can't just be the name because let's say your bot's name is the word random, right? Anyone can show up and say my bot's name is random. And if we just allowed that through our network, you know better than I do, that would be a major security issue, right? Suddenly tons of folks would be able to get right through all sorts of Cloudflare protections.
And so we ask for you to identify your bot through one of really four methods. The big ones are, first of all, you can provide us with an IP list, so you can say, here's a list of different IPs that I know my bot will use as it shows up and crawls different sites online. And that IP list can change daily. As long as you give us a link where we can fetch those IPs from, that will work. And you can tell us that in the form, right? We use other things like RDNS. So if you think about typical DNS, I type in a URL in my browser and that's kind of mapped to an IP address in the end. RDNS is the opposite. So we can actually look up your service based off of the information we have as it reaches Cloudflare. And then we've got all these other methods: we can use machine learning to try and identify your bot. We can use things like user agent and AS validation. We're getting into some serious technical details here, but we've got all these different methods we can use to identify your bot as it shows up. And so those are the different things we ask for in this form. As you're signing up, we need you to give us at least one really legitimate way to identify your bot that can't necessarily be spoofed. Now, what I think you're getting at is what happens if I sign up and maybe my service is too small, or for some reason it doesn't meet the criteria? Yeah, because at that point I think those verification methods are really good. But if I were to build the bot or even if I had a small business, I don't think I necessarily have the resources to manage a dynamic IP list, update reverse DNS and all those things. So I'd probably be in the gray area there. What are the options in that case? Right. And look, there's a lot of reasons you could be in that situation. You might just be a small startup and maybe you've just launched some sort of a tool. And for whatever reason, this automated tool doesn't meet our criteria. Right?
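To make the two main identification methods described above more concrete (a published IP list and reverse DNS), here is a minimal Python sketch. It is an assumption-laden illustration, not Cloudflare's implementation: the function names, the `.crawler.example` hostname suffix, and the documentation-range IP addresses are all hypothetical. The reverse DNS check uses the widely known forward-confirmed pattern, where the IP is resolved to a hostname and the hostname is resolved back to confirm it maps to the same IP.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def ip_in_published_ranges(client_ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Check a client IP against a bot operator's published CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def _reverse_lookup(ip: str) -> str:
    # Reverse DNS: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> List[str]:
    # Forward DNS: hostname -> list of IPv4 addresses (IPv4 only).
    return socket.gethostbyname_ex(hostname)[2]


def passes_forward_confirmed_rdns(
    client_ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the hostname suffix, then confirm that
    the hostname resolves back to the same IP."""
    try:
        hostname = reverse(client_ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    # Suffixes should start with a dot so "evilcrawler.example" cannot match.
    if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return client_ip in forward(hostname)
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical published range (RFC 5737 documentation addresses).
    print(ip_in_published_ranges("198.51.100.20", ["198.51.100.0/24"]))
```

The resolver functions are passed in as parameters so the logic can be exercised with stand-in lookups instead of live DNS. Unlike a user agent string, neither signal is something the client can simply claim in a request header, which is the property the conversation is pointing at when it asks for an identifier that cannot easily be spoofed.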
It might not broadly serve the Internet in a good way. It might not necessarily be well behaved, you might not have the documentation, and so that's one of those cases where you might not see success going through the form. You might also just be a scraper like the one I described earlier where I'm trying to scrape information online about movies. And I may not have bad intentions, but who's to say that what I'm doing is really good, right? A lot of sites out there may not want to be scraped. And so we are in this difficult position where we're going to have to arbitrate and we don't want to do that. Right. So that's one of the challenges there. If you are in this position, right, where you operate some sort of smaller bot or something that doesn't meet our criteria, you need a solution, right? Because you're not necessarily going to get a response from the form, but you still want the ability to allow list your bot on some site online. And so that is the premise of today's announcement. If you look on our blog right now, there's a blog called Announcing Friendly Bots. And that's because friendly bots are in this exact position, right? They're not bad. They're not services that we can objectively say are malicious. They're also not good necessarily, right? They at least aren't verified in most cases. They're friendly. They're kind of in the middle. They want to show up on your website, and certain people are going to like them, certain people aren't. And so we've built a feature now that will use all of the validation methods I talked about to basically allow this bot just on your site. And so I can go into more details there, but that is the concept behind friendly bots. Got it. Okay. So essentially site owners, just to make sure I fully understand, site owners are able, will be able to, allow list and let those bots that are not necessarily in our verified list.
But according to my definition, I'm happy to have them crawl my application or access the content. Therefore, I would be able to set some validation methods so they can get access. And basically I'm making an exception for a friendly bot. Is that a good summary of it? That's right. And I'll walk you through the user experience there. Let's say I own a site that's behind Cloudflare. Right. And I can log into the Cloudflare dashboard. I can edit anything I want. Great. Before, if there was a particular bot that I wanted to be allow listed, I'd have to file the form. Right. I'd have to go globally and all of this stuff. Basically I'd try to get this service allow listed globally across Cloudflare. Now I just log in to the dashboard, I find the friendly bots tile and then I'll fill in all the information within the dashboard. Right. I'll add the IP addresses for this bot that I'm particularly concerned about. Maybe I'll add a description and then I'll just submit it. And instantly, since we don't require any sort of manual verification now, we'll just fetch all the data using the same engines that drive verified bots and immediately start allow listing that bot for your site. And so the beauty of this is we're now doing it at an individual level. You've said you're okay with the bot, and so we don't need to put a person in front of that to check and make sure it's an okay bot. Does that make sense? Yeah, no, absolutely. And actually, now you've reiterated we're using the same validation methods, it means it's a lot more sophisticated and effective compared to allowing a user agent or something in a custom rule. And it's really exciting actually. And just for the audience, of course, the blog announcements are live right now, but stay with us for another 15 minutes before you go read those. And I can see how this is going to evolve in the future as well.
If, you know, as we observe, you know, what friendly bots people submitted to the dashboard, maybe we can start making good use of that data. Talking about data and sticking with the theme of bots, of course. Are there any other updates in terms of what we're going to be reporting when it comes to bots? And then finally, just to make sure, what are the timelines, the exact timelines we're looking ahead in terms of friendly bots and when will customers be able to use the feature? Sure. So a couple updates there. I'm glad you asked. The first thing is, for a long time, the verified bots that we have had and allow listed across our network, there hasn't really been a good way to see what's in our database. Right. You've kind of had to reach out to us and we can tell you if a certain bot has been added, but it's rather challenging to figure that out. And so beginning today, you can go see this right now, you can go to Cloudflare Radar, which is radar.cloudflare.com, scroll all the way down and you'll see a list of the verified bots that we have. This is the first time we're making this list public. It's not exhaustive yet. And so I should note that we are adding more over time. We basically want to make sure that these services are ready for primetime before we show them on Radar. But the first 100 or so are on there right now. And so now you have a way to actually go and look and see if something has already been verified. If it's been verified, you don't need to do anything, right? You don't need to create a friendly bot for it. And that's fine. It just works for everyone. But there is a source of ground truth now, if you're curious in finding this kind of information. The second update I have for you here is on the timeline. This is a feature we're testing in early access. And so our primary concern, whenever we build anything like this, of course, is about opening holes into our network.
We want to make sure that we can do this in a responsible way that just affects individual sites and that if, for example, an IP list suddenly changed, we wouldn't compromise your site in any way. So that's one of the reasons we're continuing to test this, make sure it's ready to go. We expect to launch this at some point this year and you'll be able to use it for your site included with bot management. That's awesome. Really good to hear that we're also publishing, or at least intending to publish, the full verified list on Radar. Oh, actually, before we even move off of the friendly bots topic, I should tell you why this also is really cool. As you're adding friendly bots to your system, right, you're building up kind of a catalog of the bots that are important to you. Michael, you kind of hinted at this earlier with the data piece, but we can start to see which bots are commonly added by a lot of our customers and feed them into the verified bots database. Right. And so this was a problem before: we just couldn't keep up with all of the verified bot submissions we were getting. Or we could, but you know, we wanted to make sure that we were doing our due diligence and actually reviewing them. And now we have a steady source where we're able to see which bots are consistently added across our customers, which ones are working, and make sure that we have the right IP info, the right RDNS info, all of that stuff. And so the real benefit here is, even if you just add a bot for your site, you're going to benefit on the verified bots end as well because we'll be adding in more there. Right. It's pretty much a priority list for us, right? If a thousand customers have the same friendly bot, we know we should maybe reach out, even if we had to reach out, and consider making sure they're part of the verified bots. Thank you. These are really awesome updates and actually staying on the bot topic.
But switching gears a little bit, I have the opportunity to speak to a lot of customers, and people who are running websites, etc., and very often talking about bots. They mentioned that they have a struggle about differentiating between what is actually a program completely automated, no human intervention, versus maybe someone using a native mobile application connecting to their app, to their backend web application and differentiating between the two. Is this something that you're also hearing? And if so, mind giving some details on what the actual challenge is there? Yeah, this is a really interesting issue because anyone who has a mobile app has struggled in some way to secure it properly to make sure that the legitimate traffic comes through and the bad traffic is kept out. And so we spend a lot of time looking at this, particularly on the bot management team, to see really what we can do. Now, if you look across the bot management industry, folks have had an issue in the past which is mobile apps tend to look automated because they make API requests, right? And so if you actually deploy a bot management system in front of your site, which is receiving both desktop and mobile app traffic, it may block those legitimate mobile requests. And that's not what you want to happen, right? Because you want folks using your mobile app to be able to use your actual resources, right? And so the challenge here is, as that stuff is showing up, you want bot management to somehow know that traffic is coming from your app instead of coming from someone who is impersonating your app. The traditional approach here has been to use an SDK. And for those of you who aren't familiar with SDK, the concept here is you can add some code into your app which basically collects other information and sends it along with the request to whoever is providing your bot management service so that that request is sort of validated as it goes through, right? 
But we at Cloudflare have never used an SDK, and we're definitely an outlier in this respect. It's very common across the industry, but there's a reason we've chosen not to use one, and that's because, one, they're really bulky. They take a long time to build into your app. Plus, if you actually change your app, at some point you have to change the SDK too. Two, they are potentially a privacy concern. And so with SDKs you'll see, if I have an app on my phone that's using an SDK (my phone is blending in here), you'll see that there's a problem because that app is gathering things like your accelerometer information: the distance you traveled in a certain day or the speed at which you travel that distance. That should not be gathered for a social media app or something just for bot management purposes. Right. And so we take issue with the fact that all of these extra signals are being gathered just to prove that you're a human. The other thing is these SDKs are just kind of a pain, right? The folks we talk to, they don't want to spend time implementing SDKs when they need a bot management solution. They often come under attack and they show up at Cloudflare and say, I need bot management today. I don't have three weeks to go implement an SDK. I need to implement something now. And so at Cloudflare, we've seen this issue and we've made an active choice not to use an SDK, but rather to try and develop an SDK alternative that may help us in the long run. Got it. Yeah. And actually, now you mentioned the SDK, one more thing that comes to mind, especially as we announced Page Shield, which is client-side security, at the end of last year: if you had an SDK, potentially, you are also adding another third party that could be compromised or a library that you don't control. Right. So even from a security perspective, it might cause you some issues. So we went down the no SDK route.
It seems like you've also heard about this issue and it's pretty commonplace across the industry. What are we doing about it then? Are we doing something special at Cloudflare to make sure this is a solved problem? Yeah, we spend a lot of time thinking about it. And actually, before I even go into detail, I should mention there is a blog post going out tomorrow. This is the sneak peek that's going to go into all of this, right? It's going to explain exactly how we think about mobile apps for bot management, as well as a lot of the changes we're making. And so this is one of those rare Security Week sneak peeks where you can see this isn't something dropping today, but you'll hear about it in just a bit. We've done a lot of different things. And the number one thing we've done, which the blog post focuses on, is a new ML model that is specifically trained on mobile traffic. So for months now, I've been having these conversations with customers where they say: bot management works great, it works so well on my desktop site and in most cases the ML you're running, which is taking in all of the requests that come into Cloudflare, is doing a pretty good job for my mobile app as well, but there are some cases where I think you're scoring my mobile app too low, meaning I think every once in a while someone is making a request from my mobile app and it's getting scored as a bot and I don't want that because legitimate users are getting blocked at the end of the day. So we spent a lot of time talking to these customers, gathering their feedback. And what they've done is they've given us specific examples where their legitimate mobile app traffic was blocked. They've said this IP routinely gets blocked even though it shouldn't. Or they've said, here's the user agent of my mobile app. I want you to train around that because I know that this tends to be good. We've gathered literally thousands of examples here from a lot of really large companies.
I've talked to large food delivery companies, a crypto company, a banking company, all these different folks. And we've started to accumulate a lot of this data, which we've then spent time going back through and sort of analyzing. And we actually have a dedicated team that looks at this. Now, over the last few months, they've taken that data and built it into a brand new machine learning model, which has all of the benefits from before. You're getting all the same kind of bot detections that you've had before and that you wanted. But now it's specifically trained to deal with those mobile false positives, right? So it knows that a certain profile of traffic, the stuff that we were messing up on before, is now actually coming from legitimate mobile apps. And what this means is, as a Cloudflare bot management customer, you can just upgrade your ML version, which we can talk about how to do in a second, and you'll get all the benefits from this, right, from other customers who have reported mobile issues. You don't need to use an SDK; we just deploy the ML version. If you're signing up for bot management today, it works just like that. And so we've seen a lot of success so far. I don't want to misquote the blog, but something like 99.9% improvements. For a lot of the big customers I looked at, almost all of their false positives are gone. They've just disappeared. Wow. That is really awesome. And I guess I wasn't going to ask that, but you led me now to ask the question. So for current customers using bot management today who want to switch to the new ML model once it's available, very quickly, what does that process look like? Right. You basically want to talk to your account team. And so there's a person at Cloudflare you've been working with, chat with them. The reason we don't automatically move you, and I get questions about this all the time, why wouldn't you just upgrade me?
Is because a lot of your rules are based on the current model, right? They're tuned to the traffic we have. And so if you let us know, we'll dedicate a solutions engineer who can review this with you and then migrate you when you're ready for it. Awesome. Thank you for sharing that. And just one last question before we close this off, Ben. Today you had another great announcement, of course, and there's been a press release about our new API Gateway. And last year you announced API Shield. Does this tie in any way with what we're doing with bots? It totally does. It was actually born out of the bots use case and this is really interesting. Let me back up for a second. We're going to do a session on API Gateway later today. But the basic idea is Cloudflare has now announced we are going to build our own API gateway. Most of it is ready today, ready for you to use. This means we are, for your APIs, doing management, we're doing monitoring, we're doing security, all of these wonderful things that you've needed and traditionally you've had to go to some sort of third party for, that you can now do right at the Cloudflare edge. And so that's going to make things so much easier. But the wonderful thing is, as we talked about, mobile apps consume APIs, right? They make tons and tons of API requests. And so of course, you can use bot management and you can use this great new mobile model that's going to work really well. But you can also deploy API Gateway, and a subset of that, API Shield, for security to protect those different endpoints as they're getting different requests. Right. And so there's a beautiful opportunity here to, I think, combine the two things where you've got bot management in front and then you're custom configuring different sorts of API policies to really protect your mobile apps. There's a lot of cool stuff you can do here. Yeah. No, that's awesome. And I agree with you.
I think the two things tie in really, really well, even because APIs are by definition built for bots, right, or for non-human consumption. So I'm a strong believer of security in depth, and layering things is always better, right? There's no silver bullet to solve all security problems. But if you have bot management, API Gateway, an API Shield security component and, being biased here, the web application firewall in front, it's all going to make your overall security stance better. With that, we've got a minute to go, of course. So, Ben, thank you very much for joining us today. And just a reminder for everyone listening, you will be back on Cloudflare TV shortly to talk more in depth about API Gateway as a broader subject. We are currently at day three of Security Week. It's a Wednesday. There's a lot more coming your way, of course, tomorrow, Friday and Saturday, so stay tuned. Thank you, everyone, for listening to us and have a good day. And if you have any questions, remember livestudio@cloudflare.tv or reach out to your account team or support and we'll be able to answer. Thank you very much.
