Latest from Product and Engineering
Presented by: Jen Taylor, Sergi Isasi, Jamie Herre
Originally aired on October 3, 2022 @ 9:30 PM - 10:00 PM EDT
Join Cloudflare's Head of Product, Jen Taylor for a quick recap of everything that shipped in the last week. Covers both new features and enhancements on Cloudflare products and the technology under the hood.
English
Product
Engineering
Transcript (Beta)
Hi, I'm Jen Taylor, and welcome back to Latest from Product and Engineering. My partner in crime, Usman, is unfortunately otherwise detained, shipping compelling new things and saving the Internet.
So I am holding down the fort today with two of my favorite product and engineering partners here.
Can you guys go ahead and introduce yourselves?
Sure, I'll go first, Jamie. Hello, my name is Sergi Isasi. I'm a director of product here at Cloudflare, and I work on a number of things, one of those being our bots product.
Hi, I'm Jamie Herre. I'm a director of engineering here at Cloudflare and I work on a number of things largely related to data and lots of it, including our bot management product.
So Jamie, what's the linkage between data and bots for us here?
Well, there are always lots of ways to tell the story.
The way I tell it is that the bot management product actually grew out of the data team.
And really some engineers on that team had some ideas about how we could draw insights from the data that we had about the kinds of traffic that people were seeing.
And it turns out that worked and we wound up building a whole product around it.
That's pretty cool. That's pretty cool.
Now, Sergi, we've done a lot. I mean, we brought to life the latest incarnation of what we do with bots, what, probably two and a half years ago now?
Yeah, that's about right. The original early access beta was about two and a half years ago.
And we went to enterprise GA right around this time, two years ago.
Yeah. So it's been some time now.
It's been, wow. And I feel like the sun never sets on the work that we're doing on bots.
There's a ton of stuff coming out from this team. What's the latest and greatest? What are you most excited about these days?
So I think the quiet thing that we've done over the last year, and this actually goes back to what Jamie said, is that we've significantly reduced the number of CAPTCHAs that we show.
And that's not specifically a bot management thing, but the bot management team works on that.
The reason it's related to what Jamie mentioned is that this project originally started as a question: can we predict, based on our data, whether a user will solve a CAPTCHA?
And that grew into, well, that means we have a bot management product.
That's kind of what was built out of it.
But we went all the way back to that. About two years ago now, we looked at the number of CAPTCHAs that we served and why we served them.
And this is across Cloudflare.
So this isn't just strictly bot management. We basically give any Cloudflare customer the ability to challenge a user.
And we found, over the 10-plus years that we've been protecting websites, that our customers write rules for lots of fun reasons.
Sometimes they're just unsure about traffic from a certain region or their origins are particularly weak and they just want to make sure that the traffic is getting to them in an orderly fashion.
But what that means for an actual human on the Internet is that they occasionally saw CAPTCHAs at very high rates and had to solve them.
And that's frustrating. So in early 2019 we started a project called NoCAPTCHA, with the goal of eventually serving none.
And this year, we've made a lot of strides on that. We measured that it takes a user on the Internet roughly half a minute to solve one of our CAPTCHAs.
That's a lot of time. And we don't want to do that when we are sure that it is a human, even though our customer has said, I would like to issue a challenge.
So earlier this year, we started slowly rolling out something we call Managed Challenge.
And you can actually see it in our documentation.
What Managed Challenge does is allow us to decide whether or not we will show the visual CAPTCHA.
We originally rolled it out at a larger scale for a product that we can talk about in a little bit, called Super Bot Fight Mode.
We then rolled it out fully to all of our free customers.
So we converted every single free challenge from a CAPTCHA to a Managed Challenge.
So take the roughly 30 seconds it takes to solve a CAPTCHA:
we've cut the time people spend on challenges by one third overall. And that's despite us showing one third more challenges over the last year, from organic traffic and people adding sites to Cloudflare (these numbers are weirdly symmetrical).
So we've made a significant reduction in that kind of annoyance on the Internet when you get challenged.
Most of our challenges, somewhere around three quarters of them, are now solved without the user needing to do anything.
The user just gets a little wait, and then they go right through to the site that they're trying to look at.
So we're pretty proud of that, extremely happy about it.
We've made it available to all customers now, so our enterprise customers can select it if they'd like a more user-friendly option for challenging users.
And we're going to continue to drive that number down.
That's fantastic. But okay, hold on a second. The dark art of bot detection is first reliably detecting that something is a bot, and then determining whether it is potentially a malicious bot.
And then, historically, it has been figuring out how to serve it a challenge.
But what I'm hearing you say is that we've taken it one step further: we do some additional detection and analysis to figure out whether we even need to serve the challenge.
Yeah, it's kind of twofold.
So many of our challenges are not served because we, Cloudflare, think that the connection is a bot.
Like I mentioned, a customer could say, "I would like to challenge." Let's say I am an e-commerce site in Canada, and 99% of my customers are in Canada.
Maybe I want to challenge every user that's coming from an IP from the rest of the world.
That doesn't mean that the user is a bot. We could be very sure that it's a human, but the customer just wants to put that challenge in place just in case, right?
Maybe they don't use our bot products. And so those are the users we absolutely want to fast pass through the challenge.
They're a human.
We're extremely confident. So that's one set. The second set is bot management customers.
There, we're unsure. We're not confident that you are a human.
We believe you are likely to be a bot, but we are not sure. The ability for us to inject or put more challenges into the browser and into the user using the browser gives us more signal.
So we don't want to do that with every request because some of the challenges can be heavy and take a bit of time.
But when we are less sure, we want to take that second chance to get more information from the connection.
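The two-stage decision described above can be sketched roughly as follows. This is a minimal illustration, not Cloudflare's implementation: it assumes a hypothetical 0 to 1 "human confidence" score, a made-up threshold of 0.9, and an invented `managed_challenge` function. The real signals and thresholds aren't described in this conversation.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    PASS = "pass"                     # confident it's a human: no visible challenge
    BACKGROUND_CHECKS = "background"  # unsure: run non-interactive browser checks for more signal
    VISUAL_CAPTCHA = "captcha"        # still unsure after the extra signal: interactive fallback


# Hypothetical threshold, for illustration only.
CONFIDENT_HUMAN = 0.9


def managed_challenge(initial_score: float,
                      background_score: Optional[float] = None) -> Action:
    """Decide how to handle a request that a customer rule says to challenge.

    initial_score: hypothetical confidence (0..1) that the visitor is human,
        computed from request data before any challenge runs.
    background_score: the same confidence after non-interactive checks,
        or None if those checks have not run yet.
    """
    if initial_score >= CONFIDENT_HUMAN:
        return Action.PASS               # rule fired, but we're sure it's a person
    if background_score is None:
        return Action.BACKGROUND_CHECKS  # take the "second chance" to gather signal
    if background_score >= CONFIDENT_HUMAN:
        return Action.PASS
    return Action.VISUAL_CAPTCHA


print(managed_challenge(0.97))       # Action.PASS
print(managed_challenge(0.6))        # Action.BACKGROUND_CHECKS
print(managed_challenge(0.6, 0.95))  # Action.PASS
print(managed_challenge(0.6, 0.3))   # Action.VISUAL_CAPTCHA
```

The point of the structure is that the heavier checks only run when the cheap signal is inconclusive, and the visual CAPTCHA is the last resort rather than the default.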
Got it. So all this means is that I now have the opportunity to spend less time looking at grainy pictures, trying to figure out which ones have a fire hydrant in them.
That's right.
Okay. Jamie, this sounds hard. How'd we do it?
We made things more complicated.
For engineers, the preferred answer is usually that we want things to be simple.
But this system is getting more complicated because we've replaced a very simple thing,
"Okay, here's a CAPTCHA," with a whole multi-layered set of challenges, protections, and complicated inferences.
It's challenges on top of challenges.
And the reason for this, I think, is that, well, it's kind of interesting that over time, the whole question has gotten fuzzier, and I think in a good way.
In other words, in CAPTCHA, part of those letters stands for "Turing test," which is this famous way to determine whether you can tell the difference between a human and a computer.
And those tests themselves are increasingly in question.
Like, do we care? Does it matter?
And, you know, what's actually happening? An example might be that one way to bypass the CAPTCHA is to actually sign up for a service where there's human beings somewhere solving the CAPTCHA for you.
This is still undesirable from the point of view of our customer, who is trying to block this traffic.
But from the Turing test point of view, that's just totally fine.
So the problem here was actually with the definitions.
Like, we need the definitions to be fuzzy and to be more adaptive.
And that's definitely the direction we're going. So the complexity of the system increases, but it gets smarter.
I mean, think about getting ourselves to a place where we can challenge behind the scenes and not rely on that human signal.
What are some of the investments or innovations we had to focus on to get to a place where we're confident we've got it right,
where we don't need to flash you a picture?
A lot of it has to do with data.
So, you know, we had to build platforms that would enable us to essentially do experiments and then iterate on those experiments in order to improve.
And it's a great privilege that I get to be here and talk about these things, but really there's a great team of people behind this who have great ideas and have been working for a long time to get to the stage where we can consistently iterate on this and continue to improve.
And the way you see the improvement is less annoyance and fewer make-work challenges, and maybe more challenges for your browser or your mobile app to do instead.
I think one thing is that we have two competing goals there, to some extent, right?
One is that we want to make it as easy as possible for a human to access a site that's being challenged.
But we also have adversaries on the other side that are trying to defeat these challenges to gain access to the site at scale with automated traffic.
So, Jamie's team has to spend a lot of time also detecting when one of our challenges is possibly being defeated and then pulling it out of rotation, modifying it.
So it's a very interesting problem space. It's a bit of cat and mouse, where we actually end up having to do things like lead an attacker down a path to make them think they've solved something, and then pull the rug out from underneath them later.
And the platform is really what allows us to do that.
You can't do that if you're just a bunch of people randomly pushing PRs to change something.
The telemetry and the metrics behind the system are actually really impressive.
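As a rough illustration of pulling a defeated challenge out of rotation, here is a minimal sketch. Everything in it is an assumption made for the example: the hypothetical `ChallengeRotation` class, the idea that some attempts are later labeled as automated by other signals, and the 5% pass-rate and minimum-sample thresholds. It deliberately leaves out the "let them think they've solved it" tactic.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ChallengeRotation:
    """Hypothetical tracker that retires challenge types automated traffic appears to be solving."""
    active: set = field(default_factory=set)  # challenge types currently being served
    max_bot_pass_rate: float = 0.05           # illustrative threshold
    min_samples: int = 1000                   # don't act on tiny samples
    attempts: dict = field(default_factory=lambda: defaultdict(int))
    passes: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, challenge: str, later_flagged_as_bot: bool, passed: bool) -> None:
        # Assumption: some attempts are later attributed to automation by other signals.
        # Only those attempts tell us whether bots are getting through this challenge.
        if not later_flagged_as_bot:
            return
        self.attempts[challenge] += 1
        if passed:
            self.passes[challenge] += 1
        self._maybe_retire(challenge)

    def _maybe_retire(self, challenge: str) -> None:
        n = self.attempts[challenge]
        if n >= self.min_samples and self.passes[challenge] / n > self.max_bot_pass_rate:
            self.active.discard(challenge)  # pull it out of rotation for rework


rotation = ChallengeRotation(active={"challenge-a", "challenge-b"}, min_samples=100)
for i in range(100):
    rotation.record("challenge-a", later_flagged_as_bot=True, passed=(i < 20))  # bots pass 20%
    rotation.record("challenge-b", later_flagged_as_bot=True, passed=(i < 2))   # bots pass 2%
print(sorted(rotation.active))  # ['challenge-b']
```

The minimum-sample guard is there so that a handful of lucky solves doesn't retire a challenge that is actually still working.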
I think our goal is to someday make it completely invisible that, you know, people will see a CAPTCHA and say, what is that?
I've never seen one of those before. Like, you know, ask your grandfather, what does this mean?
Because I've never seen anything like this. Our goal is to make it all just work.
There's an answering machine, there's a VCR, and there's a CAPTCHA.
It's like the wreckage of technologies of the past. Is that what we're looking for here?
Yeah, it's like the rotary phone of two years from now.
But, you know, you touched on something that I think is a theme this team is tackling in a bunch of different facets.
You talked about the team sweating it out so that customers and end users don't have to sweat the complexities of challenges and CAPTCHAs.
I think about the investments we've made there with data and detection, but then, all the way on the other end of the spectrum, there's ease of use and ease of deployment for users at scale.
I'm thinking specifically about the work the team's done on Super Bot Fight Mode, making bot management a really simple, easy-to-use, and easy-to-deploy solution for super basic users.
Can you talk a little bit about the vision behind what we've been trying to do there and where we're at?
Sure. So we have lots of customers who are not particularly technical, but they have a site that they need to protect.
And typically, if you're looking at bot management, you're looking at something that is pretty heavy, right?
From if you wanted to implement it, you'd integrate it with code, you'd have to write policies.
And in some cases, for our large enterprise customers, that makes a lot of sense.
You have very specific endpoints you want to protect and certain types of traffic you want to opt in and opt out and all of this.
But for lots of our customers, they just want automated traffic, you know, non-good automated traffic off of their site.
And they don't want to have to embed something into their login page, for example, or tune a policy.
And so we always want to bring enterprise-level security intelligence down market to users who can just say, "I want my site protected."
And so I think it was in 2018, if I'm remembering correctly. That sounds about right.
And actually, I think at the beginning I might have been off by a year on bot management.
We launched Bot Fight Mode, which was a really simple version of our base detection model, on all of our sites. Any customer on Cloudflare could press the button and aggressively challenge obvious automated traffic from cloud data centers.
So, a good first step. As we got more and more comfortable with how well our product worked at scale, we launched Super Bot Fight Mode earlier this year.
I'm trying to remember if my years are correct there. It allows our pay-as-you-go customers to take advantage of the rest of our detection stack,
including our machine learning, which is powered by all of the data that Jamie mentioned, and turn it on for their entire site.
So not just the very obvious bots on cloud ASNs, but, you know, "block automated traffic across my site."
And that has been awesome. We have hundreds of thousands of customers using our Bot Fight Mode products, and we were pretty blown away by the reception.
So it's great for our customers.
It also allows us to get more data and experiment a bit more. Before, we would see traffic and say, "We're pretty sure that would have been a bot," but we didn't do anything about it because the customer didn't ask us to. Now we have the ability to see traffic, test out some of the new challenges Jamie mentioned, and really have a platform of experimentation with direct outcomes, right?
So rather than guessing and not knowing whether we were correct, we can immediately challenge, see what happened, and iterate faster on the ML models and the rest of our product.
Sounds easy, right Jamie?
It is easy as long as you can keep moving.
So this product is special in the sense that it will never be done or complete.
We say that about all products, but one of the reasons here is that the world out there is changing so quickly. We're always adapting: we have to build new models for new things, the fundamental protocols of the Internet change, and so forth.
We're always going to be doing this. I think of Super Bot Fight Mode as a simple, easy-to-use camera.
If you're a professional photographer, you want a really good camera with lots of controls and adjustments, but a lot of folks just want an Instamatic where you press one button, and that's what we have in this.
Yeah. But you know, it's interesting. I think the camera analogy is an apt one, because historically, the way we build and ship our products is that we build them and release them to our pay-as-you-go customers.
And then they get more complicated, sophisticated, and fine-grained over time,
as we make them available to the enterprise customers who need the ability to twiddle and tweak.
But here, we found ourselves in a very different position, where we almost had to build the camera, the system, and the controls first, and then figure out how to get them to scale to a place where the point-and-shoot automatic mode would reliably yield a satisfactory result.
How did the, how did the team go about approaching that challenge and how did you get to a place where you felt really kind of confident that you had kind of the right technologies to be able to offer that simple kind of point-and-shoot solution?
Wow. I guess this is a good tip for people out there who are interested in building scalable things on the Internet: from day one, we built the bot management product with the goal that it would be for all Cloudflare users, all customers, every request, all traffic.
And so the, that assumption has always been there.
And so scaling is built in. We were ready for that.
Yeah. And we also have all these other toolboxes that we can use from other parts of Cloudflare.
The firewall, the rules engine, these sorts of things already exist.
So in some ways it's actually easier than it might sound, because we had a lot of the pieces already, and then it's just a matter of tuning them in a way that doesn't require a lot of intervention from the customer.
Got it. So you built from the ground up there.
Sorry. Well, if you go with the camera analogy, right, we saw how eight out of 10 professionals were using the product, and we built the product to look like that 80%.
Right. So it worked out really well. Yeah. Yeah. It's kind of a great classic.
I mean, there are two things we've touched on here that to me are the hallmarks of how we build product at Cloudflare.
The first is the one you articulated, Jamie: building for scale, with a desire to serve the entire base out of the gate. That's daunting if you haven't done it before, but at Cloudflare
it's where we start. And then the other is certainly your point about shipping something, observing and learning, and using that to inform the iteration that we do.
Yeah. So for folks out there who might be Super Bot Fight Mode customers, we're not done.
It's going to get better. It keeps getting better all the time.
Yeah. That's also the third pillar of the Cloudflare product strategy:
we're never done. But, speaking of never being done, one of the challenges I think we consistently hear about, see, and wrestle with as an organization is that all of these methodologies and strategies, the whole corpus of how the industry has historically thought about tackling bots, was really designed for the configuration we're in now, right?
We're on our desktop computers, we're in a browser-based experience, we're connected to the Internet and, and we have all of these very specific technologies and tools to, to do the detection and to do the mitigation.
You know, as soon as I start picking up things like this and thinking about how to protect my web application on mobile, or even my mobile browser experience, a lot of those tools are no longer part of the toolkit.
So the holy grail right now in the bot world is really the mobile experience.
Kind of where are we at on that and how are we thinking about that?
So it's a really interesting question, and I think you framed it appropriately. If you think about a desktop, you have maybe three operating systems generally and four browsers, and most of those are very highly concentrated.
So traffic on a desktop looks pretty normal. You know what normal desktop traffic looks like, and we have a nice engine that picks up all of that traffic and effectively tries to decide, when a request comes in, how different it looks from every other request we have seen.
And that works really well for desktop. It works pretty well for one of the mobile platforms as well, because that mobile platform basically has one OS and one browser.
But if we talk about Android, where we have, I couldn't tell you how many different versions of it are being run throughout the world through different manufacturers and different kind of flavors of it.
And then there are the different browsers that those manufacturers install on those devices as well.
So it's not always the Chrome that comes with it. It could be anything.
And we just have a large number of those types of combinations.
And so when you say, how does this given browser on this version of Android look, or this app on this version of Android look compared to the rest of the world?
Very different, because each one is just such a small piece of the traffic that we see.
So we've adjusted, and are in the midst of rolling out a change: rather than asking how you look relative to the rest of the world,
we ask how you look relative to everything that is claiming to be what you are.
And that has been a pretty big change in the way that we approach mobile traffic.
And we think it is going to be the way forward. We're pretty good at identifying automated traffic on mobile devices, but there is a higher false positive rate for those kinds of long-tail systems, and this is how we can really drive that down.
And this is doing a much better job at detecting that. We've always had the data, but cutting it into that specific slice and looking at it in a more specific way has really made a difference.
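A toy sketch of the change Sergi describes, contrasting one global baseline with per-cohort baselines. It assumes each request can be reduced to a hypothetical "claimed client" label and an observed "fingerprint" string, with made-up counts and a made-up 5% threshold; the `looks_unusual` function is invented for illustration, and Cloudflare's actual models aren't described in this conversation.

```python
from collections import Counter, defaultdict

UNUSUAL_SHARE = 0.05  # hypothetical threshold


def looks_unusual(fingerprint: str, baseline: Counter, threshold: float = UNUSUAL_SHARE) -> bool:
    """True if this fingerprint makes up less than `threshold` of the baseline traffic."""
    total = sum(baseline.values())
    return total == 0 or baseline[fingerprint] / total < threshold


# Hypothetical traffic sample: (what the client claims to be, what it actually looks like).
traffic = (
    [("desktop-chrome", "fp-desktop")] * 9000
    + [("android-vendor-browser", "fp-vendor")] * 50
    # Two requests claim to be the vendor browser but look like something else entirely.
    + [("android-vendor-browser", "fp-desktop")] * 2
)

# Baseline 1: compare against the rest of the world.
global_baseline = Counter(fp for _, fp in traffic)

# Baseline 2: compare against everything claiming to be the same client.
cohorts = defaultdict(Counter)
for claimed, fp in traffic:
    cohorts[claimed][fp] += 1

print(looks_unusual("fp-vendor", global_baseline))                    # True: long-tail client flagged
print(looks_unusual("fp-vendor", cohorts["android-vendor-browser"]))   # False: normal within its cohort
print(looks_unusual("fp-desktop", global_baseline))                   # False: blends in globally
print(looks_unusual("fp-desktop", cohorts["android-vendor-browser"]))  # True: odd for what it claims to be
```

In this toy data, the global view flags the legitimate long-tail browser (a false positive) and misses the impostor, while the per-cohort view does the reverse.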
Yeah. I think on mobile... If you go back to the desktop for a minute, you actually know that you're using a browser, and perhaps you chose it.
And it's clearly a browser, whereas on mobile devices there are apps that might actually just be a webpage, but as a user that may be disguised from you. How much time do you spend in something that calls itself a browser, versus something that actually loads webpages but doesn't appear to?
So there's a lot more variance, and even from the user's point of view, it's not always clear.
And then we have additional problems, which go back to our previous conversation: some of those environments, where you're loading a webpage that doesn't look like a webpage, may not allow you to interact with a challenge like a CAPTCHA.
And so maybe we need to get better at that as well, to make the experience correct for the user.
Right. Well, and it's also interesting to think then about sites and applications that have experiences that span kind of desktop browser to mobile app experiences.
And they're trying to think about how to implement a consistent strategy for bot protection across their portfolio,
when in truth the goals may be the same, but the way you achieve them needs to be vastly different from a technology and approach point of view.
Yeah, that's totally true. I mean, as a, as a Cloudflare customer, I just want it to work.
And as someone out there on the Internet, I just want it to work.
I don't want to have to know all these little distinctions.
It's our job, as the people building the product, to solve all that and make it disappear, but it's complicated.
Yeah. The consistent theme here is that Cloudflare engineering is fighting the hard problem so that customers and end users don't have to.
And you have that, that sort of simplicity and that ease of use.
Yeah. As engineers, we might make a distinction between something that's a webpage and something that's calling an API.
When you're using the app, there's no distinction, and our customers obviously want to protect all of their presence from these same threats.
Yeah. So I'm going to leave us on a bit of a cliffhanger here on where we go with mobile bots.
I think we're going to have to have you both back on to dig in and spend a session specifically on mobile bots, because we've actually run up against time.
As is always the case when I'm talking to you all, I could talk to you forever.
So I really appreciate the time, and really appreciate you talking with us about where we are with bots and where we're going.
And I look forward to talking to you all again soon. Thank you again to all of you watching Latest from Product and Engineering, and we will see you next time.