Anthropic’s ClaudeBot crawled 38,065 pages for every single referral visit it sent back to publishers in July 2025. Six months earlier, that ratio was 286,930:1. The July figure is an improvement. It is still the worst among major AI platforms by a wide margin.
Cloudflare Radar has turned AI crawler activity into a measurable problem, with named actors, per-platform data, and a public dashboard. The crawl-to-refer ratio is the metric that makes this visible: it tells you whether an AI platform is extracting value from the web or actually sending traffic back, and for the first time there are numbers to back that up.
This article walks through what that data shows: what the ratio measures, how Cloudflare classifies crawler intent, which platforms are the worst offenders, and what the 400% growth in robots.txt bypass actually means in practice. For a broader look at what this means for your site, see our AI crawler governance strategy guide.
What is a crawl-to-refer ratio and why should you care about it?
The crawl-to-refer ratio tells you how many pages an AI platform crawls compared with how often it drives users back to your site. A ratio of 38,065:1 means ClaudeBot fetched 38,065 pages for every one visitor it referred back. Cloudflare calls this the “crawl-to-click gap” — same idea, slightly more informal framing.
Why does it matter? It’s the first metric that makes the economic exchange between AI companies and web publishers legible. Before this, publishers had a gut feeling that AI companies were scraping their content. Now you can compare platforms and track changes over time.
Traditional search crawlers give you a useful benchmark. Bingbot sits at approximately 40:1 — it crawls to build a search index, and Microsoft sends referral traffic back when users click results. When Anthropic’s training crawler sits at 38,065:1, that comparison makes the asymmetry pretty concrete.
Cloudflare is well placed to measure both sides of this equation. Its network proxies approximately 20% of all web traffic, so it sees both the crawler hitting a page and the referral click that may or may not follow. Despite rapid growth in AI crawler activity, AI platforms are still driving only about 1% of overall web traffic. That gap between how much they take and how little they send back is the story.
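The arithmetic behind the metric is trivial, which is part of its appeal. A minimal sketch, using a function name of my own invention and the July 2025 figures from above (the Bingbot counts are hypothetical round numbers that reproduce the ~40:1 ratio):

```python
def crawl_to_refer_ratio(crawled_pages: int, referral_visits: int) -> float:
    """Pages crawled per referral visit sent back; higher is worse for publishers."""
    if referral_visits == 0:
        return float("inf")  # pure extraction: no traffic returned at all
    return crawled_pages / referral_visits

# ClaudeBot, July 2025: 38,065 pages crawled per referral visit
print(crawl_to_refer_ratio(38_065, 1))           # 38065.0
# A search-style crawler near Bingbot's benchmark
print(crawl_to_refer_ratio(4_000_000, 100_000))  # 40.0
```

The zero-referral branch is not hypothetical: before March 2025, a training-only crawler like ClaudeBot had no referral mechanism at all.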
How does Cloudflare Radar classify AI crawler intent?
Cloudflare puts all AI crawler traffic into four buckets based on what the bot is actually doing with the content it fetches.
Training (~80% of AI bot traffic): Bots building training datasets for large language models. There’s zero structural incentive to send traffic back — they extract and store. This category is dominated by GPTBot (28.1% of AI-only bot traffic) and ClaudeBot (23.3%).
Search (~18%): Bots indexing content for AI-powered search results. There’s a stronger referral incentive here because the product depends on delivering results linked to sources. Includes OAI-SearchBot (2.2%) and PerplexityBot.
User action (~3%, grew 15x in 2025): Bots fetching content in real time in response to a user’s chatbot prompt. This has the highest referral incentive — the user may need to see where the answer came from. Also called agentic crawling. Includes ChatGPT-User (2.4%). For the deeper story on this category, see our piece on agentic AI browsing and the Search Explosion.
Undeclared: Crawlers that don’t identify their purpose. A growing compliance concern and one to watch.
The key insight is simple: crawl purpose determines crawl-to-refer ratio. Training crawlers have no mechanism to send traffic back. Search and user-action crawlers have structurally better ratios because referrals are part of what makes the product work.
Which AI platforms are taking the most and giving the least back?
Anthropic’s ClaudeBot is worst at 38,065:1 (July 2025). Perplexity is best among pure AI companies at approximately 195:1. OpenAI sits in between at approximately 1,091:1. And Microsoft’s Bingbot holds steady at approximately 40:1.
Here’s the breakdown by platform:
Anthropic — ClaudeBot. Purpose: Training. Traffic share: 23.3%. Crawl-to-refer ratio: 38,065:1 (down from 286,930:1 in January 2025). robots.txt trend: improving, but WebBotAuth adoption is lagging.
OpenAI — GPTBot. Purpose: Training. Traffic share: 28.1%. Crawl-to-refer ratio: ~887–1,091:1. robots.txt trend: strong; adopted WebBotAuth.
OpenAI — ChatGPT-User. Purpose: User action. Traffic share: 2.4%. Crawl-to-refer ratio: lower than GPTBot’s because retrieval drives referrals. robots.txt trend: adopted WebBotAuth.
OpenAI — OAI-SearchBot. Purpose: Search. Traffic share: 2.2%. Crawl-to-refer ratio: not separately disclosed. robots.txt trend: adopted WebBotAuth.
Perplexity — PerplexityBot. Purpose: Search. Traffic share: 0.4%. Crawl-to-refer ratio: ~195:1 (worsening from 54.6:1 in January 2025). robots.txt trend: previously caught bypassing; policies since updated.
ByteDance — Bytespider. Purpose: Training. Traffic share: 5.8% (down sharply from 37.3% in July 2024). Crawl-to-refer ratio: 0.9:1 (down from 18:1 in January 2025). robots.txt trend: sharp reduction in overall activity.
Meta — Meta-ExternalAgent. Purpose: Mixed. Traffic share: 7.5% (up from 0.9% in July 2024). Crawl-to-refer ratio: not disclosed. robots.txt trend: single-purpose model; compliant.
Microsoft — Bingbot. Purpose: Search + AI. Traffic share: stable. Crawl-to-refer ratio: ~40:1. robots.txt trend: stable.
A few things worth pulling out. Anthropic’s 86.7% improvement is large — and still leaves them last by a wide margin. ByteDance dropped from 37.3% to 5.8% of AI-only bot traffic in a single year; no reason has been disclosed. Meta grew from 0.9% to 7.5% in the same period.
Googlebot remains the largest single crawler across the combined AI and search bot landscape — but the relationship between Googlebot, Google AI Overviews, and referral traffic is its own story. Full treatment of the Google problem is in the next article in this series.
Why do crawl-to-refer ratios vary so dramatically between platforms?
The ratio reflects the structural incentives of each platform’s business model. Training-only crawlers have no product reason to send traffic back. Search and retrieval crawlers must refer traffic because that’s how their product works.
OpenAI’s three-crawler architecture illustrates the principle nicely. GPTBot handles training at approximately 1,091:1. OAI-SearchBot handles AI-powered search. ChatGPT-User handles real-time retrieval. Same company, three mandates, three ratio profiles. Cloudflare cites OpenAI as a positive compliance reference because the separation of purpose is explicit.
Anthropic tells the same story from the other direction. Before March 2025, ClaudeBot was training-only — no retrieval product, no mechanism to send visits back. When Anthropic launched Claude web search, it added citations with clickable URLs and the ratio dropped 86.7% in six months.
Perplexity is the most interesting case. Its ratio (~195:1) is the best among pure AI companies because its entire product is real-time retrieval. But the ratio has been worsening — from 54.6:1 in January to 195:1 in July. And Digiday quotes publishing executives describing its crawler as “one of the most badly-behaved.” A good ratio and good compliance are not the same thing.
The takeaway: you can predict a platform’s ratio from its business model. Retrieval products refer. Training-only products do not.
What does the 400% growth in robots.txt bypass actually mean?
Between Q2 and Q4 2025, AI bots ignoring robots.txt grew by 400%, according to TollBit’s “State of the Bots” report. By Q4 2025, 1 in every 31 site visits came from an AI scraping bot — up from 1 in 200 in Q1 2025. In the same period, 336% more websites started trying to block AI bots.
robots.txt (formalised as RFC 9309) is how you tell bots which parts of your site to avoid. The catch is that compliance is entirely voluntary — there’s no technical mechanism that forces a bot to honour it. TollBit’s data shows more than 13% of AI bot requests were bypassing it in Q4 2025.
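In practice, opting out looks like this: a minimal robots.txt using the user-agent tokens named in this article. The directives are requests, not enforcement — a compliant bot honours them, a bypassing bot simply ignores the file.

```
# Ask the two largest AI training crawlers to stay out entirely
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# All other crawlers: no restrictions
User-agent: *
Allow: /
```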
Connect this back to the taxonomy and the pattern is clear: the bots most likely to bypass are training bots — the same category with the worst crawl-to-refer ratios. The least-compliant bots are also extracting the most value and returning the least.
WebBotAuth is Cloudflare’s structural response to this problem. Rather than relying on user-agent strings (which any bot can fake), it uses cryptographic signatures to confirm a request actually comes from the declared crawler. OpenAI has adopted it. Anthropic had not as of August 2025.
For a practical breakdown of the tools available — from robots.txt through to IP blocking and bot management platforms — see our overview of tools available to publishers.
What does 48% non-human documentation traffic mean for your site?
Mintlify, a developer documentation platform, publicly reported that 48% of its documentation traffic is non-human. Nearly half of all page visits are bots, not developers.
This hits differently if your site is documentation-first. Documentation is product infrastructure — it’s the technical reference your customers use to integrate your API. When AI bots crawl it at scale, three things happen.
Analytics distortion: If half your visitors are bots, your page view data is lying to you about how developers actually use your docs.
Capacity costs: Server load from non-human traffic is real and it’s growing.
Content investment ROI: Some fraction of your documentation effort is serving AI training datasets and chatbot queries — not your actual customers.
The user action (agentic) category is the driver here. A developer asks ChatGPT how to use your API. ChatGPT-User fetches your documentation page and delivers the answer inside the chatbot. The developer never visits your site. The 15x growth in this category in 2025 means it’s getting more common, not less.
The deeper analysis of agentic crawling and its implications for developer tools is in our piece on agentic AI browsing and the Search Explosion.
Where do you find this data for your own site?
Cloudflare Radar’s AI Insights page publishes aggregate crawl-to-refer ratios, traffic share data by platform, crawl purpose breakdown, and trend lines going back through 2024–2025. It’s publicly available without a Cloudflare account.
The key distinction: Cloudflare Radar shows figures across its entire network. For your domain specifically, you need a Cloudflare account with bot analytics enabled. If you’re not on Cloudflare, your server logs can identify AI crawlers by user-agent string — GPTBot, ClaudeBot, PerplexityBot, and others declare their identity in HTTP headers when they comply with norms.
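As a sketch of that log-grepping approach: assuming the common Apache/nginx combined log format, where the user-agent is the final quoted field, you can tally declared AI crawlers with a few lines of Python. The bot tokens are the ones named in this article (not an exhaustive list), and the sample lines are fabricated for illustration.

```python
import re
from collections import Counter

# Combined Log Format: the user-agent is the last double-quoted field on the line
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

# Declared AI crawler tokens from this article — real deployments should
# maintain a fuller, regularly updated list
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider",
           "ChatGPT-User", "OAI-SearchBot", "Meta-ExternalAgent")

def count_ai_crawlers(log_lines):
    """Tally requests per declared AI crawler across access-log lines."""
    hits = Counter()
    for line in log_lines:
        m = UA_FIELD.search(line)
        if not m:
            continue  # malformed line; skip
        ua = m.group(1)
        for bot in AI_BOTS:
            if bot in ua:
                hits[bot] += 1
                break
    return hits

sample = [
    '1.2.3.4 - - [01/Jul/2025:00:00:01 +0000] "GET /docs HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '5.6.7.8 - - [01/Jul/2025:00:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Macintosh) Safari/605.1.15"',
]
print(count_ai_crawlers(sample))  # Counter({'GPTBot': 1})
```

Note the caveat from the robots.txt discussion applies here too: this only counts bots that declare themselves. A crawler faking a browser user-agent is invisible to this method, which is exactly the gap WebBotAuth is designed to close.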
TollBit’s quarterly “State of the Bots” reports cover bypass rate and blocking statistics.
The data is public, it’s per-platform, and it’s getting worse. Bot policy is no longer something you can defer. Building a practical bot policy framework for your site is where we go next.
Frequently Asked Questions
What is the difference between an AI crawler and a regular search engine bot?
A regular search engine bot (like Googlebot or Bingbot) crawls pages to index them for search results and sends referral traffic back when users click those results. AI crawlers fetch content for model training, AI-powered search, or real-time retrieval — and most send far less traffic back. Bingbot sits at approximately 40:1 in July 2025; ClaudeBot sits at 38,065:1.
How often does Cloudflare Radar update its AI crawler data?
Cloudflare Radar AI Insights provides near-real-time data with trend lines on monthly and quarterly time horizons.
Can I block specific AI crawlers from my website?
Yes — add Disallow directives in your robots.txt targeting specific user-agent strings (GPTBot, ClaudeBot, PerplexityBot, Bytespider, and others). But compliance is voluntary. TollBit data shows a 400% increase in bots bypassing robots.txt between Q2 and Q4 2025, with more than 13% of AI bot requests ignoring it in Q4. More robust options include IP-based blocking and Cloudflare’s bot management tools.
Why did Anthropic’s crawl-to-refer ratio improve so dramatically between January and July 2025?
Anthropic launched Claude web search in March 2025. Before that, ClaudeBot was training-only with no mechanism to refer visits back. Adding a retrieval product created citations with clickable URLs. The ratio improved from 286,930:1 to 38,065:1 — an 86.7% improvement that still leaves Anthropic in last place.
What does “user action” crawling mean in Cloudflare’s taxonomy?
User action (also called agentic crawling) is when an AI bot fetches web content in real time in response to a user prompt — for example, when a ChatGPT user asks a question and ChatGPT-User retrieves a page to help answer it. This category grew 15x in 2025 and accounts for approximately 3% of AI bot traffic.
Is Bingbot considered an AI crawler?
Only partly. Bingbot primarily indexes content for traditional Bing search results, and it also feeds Microsoft’s AI features, including Microsoft Copilot. Its crawl-to-refer ratio (~40:1 in July 2025) is significantly better than pure AI training crawlers’ because search clicks generate referral traffic.
What is WebBotAuth and how does it help with AI crawler identification?
WebBotAuth is Cloudflare’s cryptographic verification protocol for confirming the identity of AI bots. Unlike user-agent strings — which any bot can claim — it uses cryptographic signatures to verify that a request actually comes from the declared crawler. OpenAI adopted it; Anthropic had not as of August 2025.
How does zero-click search relate to AI crawling?
Zero-click search is when an AI feature (like Google AI Overviews or ChatGPT search) answers a query directly without generating a click to the source site. Content was crawled, the answer was served, no referral traffic returned. Google referrals to news sites fell 9% in March 2025 and 15% in April, coinciding with AI Overviews expansion.
What happened to ByteDance’s Bytespider crawler?
Bytespider’s share of AI-only bot traffic dropped from 37.3% in July 2024 to 5.8% in July 2025. Its crawl-to-refer ratio collapsed from 18:1 to 0.9:1 as activity fell. The specific reason has not been publicly disclosed by ByteDance.
Why does Perplexity have the best crawl-to-refer ratio despite past compliance issues?
Perplexity’s product is real-time web retrieval — PerplexityBot fetches pages and presents results with source links. Referral traffic is a natural byproduct, producing a better ratio (~195:1 versus Anthropic’s 38,065:1). But Perplexity’s ratio has been worsening — it was 54.6:1 in January 2025 — and Digiday quotes publishing executives describing its crawler as “one of the most badly-behaved.” A better ratio does not mean better compliance.
Where can I find Cloudflare Radar’s AI crawler traffic data?
The primary resource is radar.cloudflare.com/ai-insights — aggregate crawl-to-refer ratios, traffic share by crawler, and trend data available without a Cloudflare account. For per-domain data (your specific site), you need a Cloudflare account with bot analytics enabled. TollBit’s quarterly “State of the Bots” reports cover bypass rate and blocking statistics.
Sources: Cloudflare blog — “The crawl-to-click gap” (August 2025); Cloudflare Radar 2025 Year in Review (December 2025); Cloudflare theNET Year in Review (January 2026); WIRED — “AI Bots Are Now a Significant Source of Web Traffic” (February 2026); TollBit Q2 2025 State of the Bots report; Digiday — “In graphic detail: the state of AI referral traffic in 2025” (December 2025); InfoQ — Cloudflare 2025 AI Bots Report summary; Simon Willison — Cloudflare Radar AI Insights writeup (September 2025).