Two new ways to get better visibility and control of AI crawlers
Presented by: Martin Sanchez, Will Allen, David Belson
Originally aired on August 28 @ 12:00 PM - 12:30 PM EDT
Welcome to Cloudflare AI Week 2025!
There's barely a company or a startup not focused on AI right now. Companies' entire strategies are shifting because of this incredible technology.
From August 25 to 29, Cloudflare is hosting AI Week, dedicated to empowering every organization to innovate with AI without compromising security.
Tune in all week for more news, announcements, and thought-provoking discussions!
Read the blog posts:
- The next step for content creators in working with AI bots: Introducing AI Crawl Control
- A deeper look at AI crawlers: breaking down traffic by purpose and industry
Visit the AI Week Hub for every announcement and CFTV episode — check back all week for more!
Transcript (Beta)
Hey, everyone. Thanks for joining us for the latest in this parade of awesome announcements we've got coming out around Cloudflare's AI Week.
I'm Martin Sanchez. I'm a member of the product marketing team here, and I'm really, really excited to be joined by David Belson, who leads our Radar team, and Will Allen, who's one of our product leaders, to talk about a couple of those announcements.
Just to start out, the overall theme for these announcements is around AI bots and AI crawlers and the impact they have on an organization's website.
So, David, you spend a lot of time looking at traffic patterns on the Internet.
Just to start us off, could you tell us a little bit about some of the high-level trends that you've been seeing and that some of our customers have been seeing recently around AI crawling and AI bots?
Sure. Yep. And we've been tracking the AI crawler trends for almost a year now.
I think we launched it initially during Birthday Week in 2024. I think looking back over the last few weeks, just over that time period, we see a lot of traffic from GPTBot, so from OpenAI, and ClaudeBot.
In fact, looking at Radar over the last four weeks, it looks like they're almost neck and neck when you sort of summarize their traffic volume.
So, you've got Meta and Amazon and ByteDance's ByteSpider also accounting for a significant amount of traffic or crawling traffic.
The other thing we've been tracking over the last few months as well has been what we call the crawl-to-refer ratio.
So, Matthew, in a lot of his conversations, talks about how it used to be that Google would send you a...
They would crawl your page twice effectively for every visitor they sent you.
And then with the AI crawlers and the AI platforms, that ratio has gotten significantly out of whack.
We see now with some of the providers in the tens of thousands to one, so they're crawling your page thousands or tens of thousands of times before they refer a visit back to you.
Or some of the other AI crawlers are in the hundreds range.
That's in contrast to a lot of the search platforms, which we're seeing in more of the tens or single digits.
So, they're crawling your site... They're sending much more traffic for the amount of crawling that they're doing, effectively.
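As a rough illustration of the crawl-to-refer ratio described above, the sketch below simply divides crawl requests by referred visits. The platform names and counts are hypothetical, and the plain division is an assumption made for illustration; it is not presented as how Radar calculates the metric.

```python
def crawl_to_refer_ratio(crawl_requests: int, referred_visits: int) -> float:
    """Return crawl requests per referred visit (the 'N' in 'N to 1')."""
    if referred_visits <= 0:
        # No referrals at all: the ratio is unbounded.
        return float("inf")
    return crawl_requests / referred_visits


# Hypothetical counts, chosen only to show the orders of magnitude
# mentioned in the conversation (single digits, hundreds, tens of thousands).
EXAMPLES = {
    "search_platform": (2_000, 1_000),
    "ai_platform_a": (50_000, 100),
    "ai_platform_b": (300_000, 10),
}

for name, (crawls, referrals) in EXAMPLES.items():
    print(f"{name}: {crawl_to_refer_ratio(crawls, referrals):,.0f} to 1")
```

A higher number means more crawling for each visitor sent back to the site.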
Okay. So, I mean, I don't think Cloudflare is the first to point this out, but certainly we have some pretty incredible visibility into some of those trends as we've seen.
Yep. Absolutely. So, that's a lot of AI crawlers that are consistently crawling pretty much every organization's content for things like training or retrieval-augmented generation, things like that.
Will, I guess, what are you hearing from our customers? Why do they care, or why should people listening care about this?
What can they do about it?
We've started from a very simple philosophy, and that's that anyone who puts their content on the web, on a website, so a news organization, a content creator, a publisher, an e-commerce site, you should get to decide how your content is being used for commercial purposes by others.
Some folks, when they put their content online, they want to write for the super intelligence.
They want their content to be fully ingested by every AI company and sort of fully in the training data that's out there, and that's great.
Some folks want to write for other humans and have it be much more limited and sort of restrict all the other bots, and that's great as well.
The key aspect there, we think, is control, and the ability to decide, you as the content creator who puts your stuff out in the world, you should decide and be in the driver's seat to make that decision.
The thing that I hear and we hear consistently, overwhelmingly from our customers on both sides of that equation, reinforces that belief, that they just want to have that ability to control access to their content.
They want to sort of publish amazing, high-quality content, again, whether you're a local news organization, a traditional publisher, sort of a new interesting publisher, a UGC site, and be back in the driver's seat to decide what happens to that content.
So many of the products that we're launching, the features that we think about, the data that we analyze, comes through that lens of enabling and empowering that level of control, and that control always starts with, first and foremost, the auditing, the understanding, sort of getting the lay of the land of what's actually going on with your content.
Awesome. Thank you. Well, I think that brings us neatly to the two announcements that we're going to talk about, because I think, David, you and the Radar team have been doing work to help people get more of that visibility.
So can you just tell us a little bit about what you're announcing this week?
Yeah, absolutely.
So a couple of major announcements this week. One is, Will and his team have done a great job of rolling out AI Audit insights in Cloudflare's dash.
So for my content, I can go in there and I can look at it and say, okay, here's what OpenAI is doing, here's what Perplexity is doing, here's whatever, and see how those sites or those platforms are crawling my content.
But what's important to me as a content owner also is, how does what I'm seeing compare to my cohort of peers, my industry?
So one of the things we're rolling out is an industry perspective on the crawling activity.
So if you are in the computer and electronics industry, if you're in the healthcare industry, if you're in finance, you'll be able to go to the AI Insights page on Radar, looking at the traffic graph, and select an industry set from the dropdown.
And then that will update all the appropriate graphs and the crawl-to-refer ratio table on Radar to reflect the calculations for that industry.
And then what we're also doing or announcing this week, Martin, is something based, not based on what you said, but you touched on earlier, with regards to crawl purpose.
So I think, you know, initially, the training aspect of the AI bot crawling really got all the headlines, because it's very aggressive, too.
You know, that's where people are seeing most of the traffic from.
But in addition to training, there's also search. So now we're starting to see these platforms where Will goes in and says, hey, what are the cheapest flights from San Francisco to New York in mid-November? And then the AI bot, the chatbot, will go out and search for that particular information, or, what was the latest thing that happened in, you know, Seattle, or whatever.
And so we've got training, search, and then the third is user action.
So that is you are asking the chat bot to do something specific, it goes out and accesses a site in response to your request.
So what we're basically rolling out, in short, is the ability to filter the graphs on the AI Insights page by crawl purpose, and also by industry set.
So you can do those either separately, or you can combine them and say, Hey, what does search crawling traffic look like for the health vertical, excuse me, the health industry, or so on.
And then we'll also bring all of that in a regular fashion to the Data Explorer as well.
So if you really want to drill down on a specific sub industry, you can do that.
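The separate or combined filtering by crawl purpose and industry can be sketched locally as follows. The records, field names, and crawler names are all hypothetical and made up for illustration; this is not the Radar API or its data model, only a minimal picture of what combining the two filters means.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical aggregated records; the three purposes mirror the ones
# named in the conversation: training, search, and user action.
RECORDS: List[Dict] = [
    {"crawler": "bot-a", "purpose": "training", "industry": "health", "requests": 900},
    {"crawler": "bot-a", "purpose": "search", "industry": "health", "requests": 50},
    {"crawler": "bot-b", "purpose": "search", "industry": "health", "requests": 120},
    {"crawler": "bot-b", "purpose": "user_action", "industry": "finance", "requests": 30},
    {"crawler": "bot-c", "purpose": "training", "industry": "finance", "requests": 400},
    {"crawler": "bot-c", "purpose": "search", "industry": "finance", "requests": 10},
]


def filter_requests(
    records: List[Dict],
    purpose: Optional[str] = None,
    industry: Optional[str] = None,
) -> List[Dict]:
    """Keep records matching the given purpose and/or industry (None = no filter)."""
    return [
        r
        for r in records
        if (purpose is None or r["purpose"] == purpose)
        and (industry is None or r["industry"] == industry)
    ]


def top_crawlers(records: List[Dict]) -> List[Tuple[str, int]]:
    """Sum requests per crawler and return them from most to least active."""
    totals: Counter = Counter()
    for r in records:
        totals[r["crawler"]] += r["requests"]
    return totals.most_common()


# "What does search crawling traffic look like for the health industry?"
print(top_crawlers(filter_requests(RECORDS, purpose="search", industry="health")))
```

Passing only `purpose` or only `industry` applies a single filter, which matches the "separately or combined" behavior described above.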
Awesome. I mean, that's really cool to hear. And I mean, I'm excited just to play with this personally, but also, I've been poking at the dev versions, and it's very cool.
Yeah, but I can really see how you just with those kind of few additional categories, you can start making some really granular, getting some really granular insights and making some really granular.
Yeah, it's really interesting to see how the top crawlers change from industry to industry.
Really, really very interesting to see how the crawl-to-refer ratio shifts also from industry to industry.
Yeah. Okay, awesome. Yeah. Well, I think I'm probably not the only person who's really excited to dig into that.
But okay, so that's kind of covering the visibility side of things. But then obviously, with all those insights, you've got to start making some decisions.
And then you need the ability to kind of enforce those decisions, and even the ability to make sure that, you know, as you enforce some of those decisions that, you know, your counterparts on the AI crawler side are actually going to be, you know, hearing the message you want them to.
So, you know, Will, I think this is when we turn to you, to hear about this other announcement that you've been cooking up.
Yeah, when I talk to content creators and publishers and news organizations, I sort of say you have, you know, kind of three steps of the process to go through here, when you're trying to decide what your strategy is and what to do.
And it's audit, define, enforce, right? So the audit side, as David sort of captured, is: what's happening to my particular content, who's coming to my content, what agents are coming, what crawlers are coming, what bots are coming? Just sort of getting the lay of the land, and how does that compare to the industry standards and industry benchmarks that you're seeing at an aggregate level that's available on Radar?
So that's the first aspect. Then take that information, and you need to decide for yourself as a news organization or a publisher what to do with it. That's the define side.
And that's, again, that's sort of really up to you, you're in the driver's seat, if you want to write for the super intelligence, or if you want to write for human eyes only, both are fine, but you make those decisions.
Once you've made those decisions, though, then you need to go to the last step, which is the enforce side.
And that's where all of our tools sort of come into play.
We're doing a couple things this week that I'm excited about. One is we are renaming AI Audit.
Audits are, it's an important part of tax season and a lot of other things.
But the section does so much more, so we're renaming it sort of crawl control, AI Crawl Control.
And the idea being there that it goes beyond just the information about what's happening to the ability to sort of really have much more granular control over which crawlers and which agents are coming to interact with your properties.
So first and foremost, rename it to AI Crawl Control, long overdue rebranding, but I'm excited about that one.
The second one is you're sort of thinking about this enforce method, how do you go beyond sort of this binary choice that a lot of content creators and websites have today, which is either I block outright this particular crawler or this particular bot, or I allow them through for free.
And those are the only two options that you have there sort of block or allow.
We think it's too limited. What we are hearing from our customers, again, on sort of both sides of that both sort of the agent developers and AI companies, and the content creators and the publishers, they want to make it much more dynamic, they don't want to block forever.
And they also don't want to let them through for free.
They actually want a transactional relationship, they want to sort of build a business together.
So we're introducing the ability to send customizable 402 messages.
And what this does is when a particular crawler comes to your site, instead of just saying no, you're blocked, or yes, you're allowed, you know, for free, you can send them a personalized message, with a 402 HTTP response code, that says, hey, you can get this content, but you have to call me first, right?
Call our partnerships team, call our BD team, call our sales team, call the CEO, let's actually talk about terms, let's make a deal here.
It's not a no, it's a maybe, or it's a yes, if.
And we think that transition from a no to a sort of a maybe or yes, if is a really important part of this, allowing people to sort of move beyond this binary option to a much more sort of robust, scalable method of interacting at global scale across the board.
So any customer, as a paying Cloudflare customer using AI Crawl Control, this week can go in and set a customizable 402 message with, you know, your email address or the email address of your partnerships team or their phone number, whatever you want to do, that sends a particular agent or bot or crawler that message.
So then when they try to access your content, they can know that they can call you directly.
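To make the "yes, if" option concrete, here is a minimal sketch of a per-crawler decision that returns HTTP 402 (Payment Required) with a custom contact message, next to the usual allow and block outcomes. The crawler names, policy table, and contact address are hypothetical, and this is not how AI Crawl Control is configured or implemented; it only illustrates the three possible outcomes.

```python
from http import HTTPStatus
from typing import Dict, Tuple

# Hypothetical per-crawler policy, keyed by a user-agent substring.
POLICY: Dict[str, str] = {
    "ExampleTrainingBot": "charge",  # answer with 402 and a contact message
    "ExampleSearchBot": "allow",     # let through
    "ExampleScraper": "block",       # refuse outright
}

# Hypothetical customizable message; the address is a placeholder.
CONTACT_MESSAGE = (
    "Access to this content requires a licensing agreement. "
    "Contact partnerships@example.com to discuss terms."
)


def decide(user_agent: str) -> Tuple[int, str]:
    """Return (status code, body) for a request with the given user agent."""
    lowered = user_agent.lower()
    for token, action in POLICY.items():
        if token.lower() in lowered:
            if action == "charge":
                return HTTPStatus.PAYMENT_REQUIRED.value, CONTACT_MESSAGE
            if action == "block":
                return HTTPStatus.FORBIDDEN.value, "Access denied."
            break  # "allow": fall through to the normal response
    return HTTPStatus.OK.value, "<page content>"


status, body = decide("Mozilla/5.0 (compatible; ExampleTrainingBot/1.0)")
print(status, body)
```

The 402 branch is the "maybe": the crawler is told who to contact instead of being refused or served for free.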
Already, we're seeing an enormous amount of sort of feedback from people whose, you know, websites are protected by Cloudflare, wanting to sort of move beyond this binary option to this new marketplace.
Every day, hundreds of millions of 402 responses are being sent out by Cloudflare customers, hundreds of millions.
And what that's saying is the same message.
It's not that, no, don't access my content ever. It's, hey, there's a deal to be had here.
There is a way that we can sort of build a marketplace together.
Let's actually get together and collaborate, find the right way to make this mechanism work.
So we're really excited to extend this out and expand it and give much more granular control and customizable options for your responses that are out there.
Awesome. Well, thank you. I mean, that is super cool.
And I guess, you know, totally makes sense. So I guess, you know, farewell AI Audit.
Hello, AI Crawl Control. Definitely a more fitting name. And yeah, you know, I think just in the same way that, you know, the beginning of any AI crawler strategy starts with visibility, understanding what's happening.
It also totally makes sense that it's, you know, that in this kind of evolving ecosystem we find ourselves in, it's not just a question of allow or block.
You know, there needs to be more granular options. And then that often starts with communication to have, you know, these different parties not see each other as enemies, but just as like partners in this kind of new version of the Internet that's evolving.
So that's really cool. Seems like a really impactful step already.
Okay, well, thank you so much for sharing those announcements, both of you.
For everyone listening, you know, this is absolutely not the end of AI Week. We've got a ton of other stuff coming out.
So stay tuned, keep an eye on the Cloudflare blog, you know, check us out on social media.
And then, you know, keep tuning into Cloudflare TV as well to hear directly from the people who are driving all of this work.
But yeah, thanks so much for joining us. David and Will, great to talk to you as always.
And everyone else, keep your eyes peeled. More good stuff coming.
Thanks, Martin.