AI-powered search startup Perplexity is under fire following serious accusations from Cloudflare, a major web infrastructure firm. In a recent blog post, Cloudflare claimed that Perplexity’s systems bypassed explicit site restrictions in order to extract data, a practice that has intensified debate over the ethics of AI web crawling.
Allegations of Unauthorized Crawling
The controversy comes just days after Perplexity launched its new Comet browser, a tool designed for AI-driven, agent-based web exploration. Shortly after this release, Cloudflare published an investigation alleging that Perplexity had ignored standard protocols—such as robots.txt restrictions—that website owners use to block bots from accessing content.
To test its claims, Cloudflare created test domains and denied access to automated bots. The company then queried Perplexity’s platform for information on these domains and found that answers were still being generated, suggesting that the system had retrieved the blocked data.
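For readers unfamiliar with robots.txt, the following is a minimal sketch of how a well-behaved crawler might consult such a file before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `ExampleBot` name, and the `example.test` URL are hypothetical stand-ins, not details from Cloudflare's test setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test domain that denies every automated agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are illustrative assumptions.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.test/page"))
```

Note that robots.txt is advisory: the check runs on the crawler's side, so it only restricts bots that choose to perform it.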
According to Cloudflare, this behavior points to the use of “user agent spoofing,” a method in which bots disguise themselves as regular browsers by sending a browser-like identifier with their requests. This tactic allows them to slip past restrictions meant to limit automated crawling. The company said that this pattern of behavior appeared across tens of thousands of domains and included millions of requests daily, detected using machine learning models and network activity signals.
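To illustrate why a spoofed user agent can defeat simple blocking, here is a minimal sketch of a server-side filter that relies only on the self-reported `User-Agent` header. The blocked tokens, header values, and function name are hypothetical and are not taken from Cloudflare's report or from any real crawler.

```python
# Hypothetical bot names a site owner might want to block.
BLOCKED_TOKENS = ("examplebot", "samplecrawler")


def is_blocked_by_user_agent(headers: dict[str, str]) -> bool:
    """Return True if the self-reported User-Agent contains a blocked token."""
    user_agent = headers.get("User-Agent", "").lower()
    return any(token in user_agent for token in BLOCKED_TOKENS)


# A crawler that identifies itself honestly is caught by the filter.
declared = {"User-Agent": "ExampleBot/1.0 (+https://example.test/bot)"}

# The same client sending a browser-style string passes the filter,
# because the header is supplied by the client and is never verified.
browser_like = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    )
}

if __name__ == "__main__":
    print(is_blocked_by_user_agent(declared))
    print(is_blocked_by_user_agent(browser_like))
```

Because the header is client-supplied, a filter like this only catches clients that identify themselves truthfully, which is presumably why detection of the kind Cloudflare describes leans on additional signals such as network activity.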
Perplexity Pushes Back
Perplexity, however, has firmly denied the allegations. Spokesperson Jesse Dwyer dismissed Cloudflare’s post as a publicity move, stating that it lacked solid proof. “The screenshots in the blog show that no content was actually accessed,” Dwyer said. Dwyer added that the crawler named in the report did not belong to Perplexity.
While the startup maintains its innocence, the debate adds to the growing scrutiny over how AI platforms gather data across the internet.
The Larger Battle Over AI and Web Content
At the heart of this dispute lies a broader issue confronting the AI industry: how artificial intelligence tools harvest online content to train models and generate responses. Leading models from companies like OpenAI, Anthropic, and Perplexity rely on vast datasets, ranging from blog posts to discussion forums, to improve accuracy and relevance.
However, this practice has increasingly come under question. In response to mounting criticism, some firms have implemented opt-out systems, allowing websites to prevent their content from being used by AI scrapers. But critics argue that these measures are not always honored, particularly when tools operate through evasive methods.
Cloudflare’s post reflects growing frustration in the digital ecosystem. The company contends that unregulated data scraping by AI firms could jeopardize the internet’s economic structure. “What some AI companies are doing is effectively breaking the business model of the internet,” Cloudflare wrote, vowing to clamp down on such unauthorized activity.