Perplexity rejects Cloudflare claim of secret site scraping

AI-powered search startup Perplexity is under fire following serious accusations from Cloudflare, a major web infrastructure firm. In a recent blog post, Cloudflare claimed that Perplexity’s systems bypassed explicit site restrictions in order to extract data, a practice that has intensified debate over the ethics of AI web crawling.

Allegations of Unauthorized Crawling

The controversy comes just days after Perplexity launched its new Comet browser, a tool designed for AI-driven, agent-based web exploration. Shortly after this release, Cloudflare published an investigation alleging that Perplexity had ignored standard protocols—such as robots.txt restrictions—that website owners use to block bots from accessing content.

To verify its claims, Cloudflare created test domains and configured them to deny access to automated bots. It then queried Perplexity’s platform for information on these domains and found that answers were still being generated, suggesting that the system had somehow retrieved the blocked data.
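As a rough illustration of the kind of rule involved, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt file that denies all bots. The file contents, the `ExampleBot` name, and the `example.test` URL are assumptions for illustration, not Cloudflare's actual test setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler would run a check like this and skip the page.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.test/page"))
```

robots.txt is a voluntary convention: the file only states the site owner's wishes, and it is up to each crawler to run a check like this and honor the result.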

According to Cloudflare, this behavior points to the use of “user agent spoofing,” a method in which bots disguise themselves as regular browsers. This tactic allows them to slip past restrictions meant to limit automated crawling. The company said that this pattern appeared across tens of thousands of domains and involved millions of requests per day, which it detected using machine learning models and network activity signals.
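To see why a disguised user agent can slip past name-based restrictions, consider a minimal sketch of a filter that trusts the self-declared User-Agent header. The `is_blocked` function, the `examplebot` token, and the sample header strings are all hypothetical; this is not how Cloudflare or Perplexity implement anything.

```python
# Hypothetical blocklist of crawler name tokens a site owner wants to refuse.
BLOCKED_TOKENS = ("examplebot",)


def is_blocked(user_agent: str, blocked_tokens=BLOCKED_TOKENS) -> bool:
    """Naive filter: block a request only if its declared User-Agent
    contains one of the blocked tokens (case-insensitive)."""
    declared = user_agent.lower()
    return any(token in declared for token in blocked_tokens)


if __name__ == "__main__":
    honest = "ExampleBot/1.0 (+https://example.test/bot)"
    browser_like = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    print(is_blocked(honest))        # the bot that identifies itself is refused
    print(is_blocked(browser_like))  # the same client with a browser-like string is not
```

Because the header is supplied by the client, a filter like this can be sidestepped by changing one string, which is why detection of the sort Cloudflare describes relies on additional signals rather than the declared name alone.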

Perplexity Pushes Back

Perplexity, however, has firmly denied the allegations. Spokesperson Jesse Dwyer dismissed Cloudflare’s post as a publicity move, stating that it lacked solid proof. “The screenshots in the blog show that no content was actually accessed,” Dwyer said. Dwyer added that the crawler named in the report did not belong to Perplexity.

While the startup maintains its innocence, the debate adds to the growing scrutiny over how AI platforms gather data across the internet.

The Larger Battle Over AI and Web Content

At the heart of this dispute lies a broader issue confronting the AI industry: how artificial intelligence tools harvest online content to train and generate responses. Leading models from companies like OpenAI, Anthropic, and Perplexity rely on vast datasets—ranging from blog posts to discussion forums—to improve accuracy and relevance.

However, this practice has increasingly come under question. In response to mounting criticism, some firms have implemented opt-out systems, allowing websites to prevent their content from being used by AI scrapers. But critics argue that these measures are not always honored, particularly when tools operate through evasive methods.

Cloudflare’s post reflects growing frustration in the digital ecosystem. The company contends that unregulated data scraping by AI firms could jeopardize the internet’s economic structure. “What some AI companies are doing is effectively breaking the business model of the internet,” Cloudflare wrote, vowing to clamp down on such unauthorized activity.

The Daily Star Lebanon
