AI Firms Accused of Ignoring Robots.txt and Scraping Content

Lilu Anderson

AI Companies Bypass Rules to Scrape Content, Spark Disputes with Publishers

Several AI companies are ignoring the Robots Exclusion Protocol (robots.txt) and scraping content from websites without permission, according to TollBit, a content-licensing startup. The practice has put AI firms and publishers in conflict; Forbes, for instance, has accused the AI company Perplexity of copying its content.

What is Robots.txt?

The Robots Exclusion Protocol, created in the mid-1990s to keep web crawlers from overloading websites, lets site owners declare which parts of a site crawlers should stay out of. It works like a "Do Not Enter" sign: the rules live in a plain-text robots.txt file at a site's root, and although they are not legally binding, they have generally been respected. Publishers now rely on the protocol to block unauthorized use of their content by AI systems that scrape data to train models and generate summaries.
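A compliant crawler checks these rules before fetching each URL, and Python's standard library ships a parser for them. A minimal sketch (the crawler names and paths here are illustrative, not taken from any real site):

```python
from urllib import robotparser

# An illustrative robots.txt: one named crawler is barred from /private/,
# everyone else may fetch anything.
rules = [
    "User-agent: ExampleAIBot",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Disallow:",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

# A well-behaved crawler consults can_fetch() before each request.
print(parser.can_fetch("ExampleAIBot", "https://example.com/private/report"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/private/report"))      # True
```

Nothing technically stops a crawler from skipping this check, which is precisely the behavior TollBit describes: the file is a convention, not an enforcement mechanism.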

Problem with AI Companies

TollBit reports that many AI crawlers ignore robots.txt and retrieve content from sites in violation of its rules. Its analytics show various AI firms using that data for training without first obtaining permission. Perplexity, for example, has been accused by Forbes of using its investigative stories in AI-generated summaries without credit or permission. Perplexity did not comment on the claims.

How This Affects Publishers

AI-generated news summaries are growing in popularity, which deepens publishers' worries. Google's AI products now generate summaries for search queries, escalating those concerns: blocking Google's crawlers with robots.txt would also remove a publisher's content from search results, hurting its online visibility. So if AI firms ignore robots.txt anyway, publishers ask why they should keep using it and forfeit web traffic on top of everything else.
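The dilemma above comes down to a few lines of configuration. A publisher who wants to opt out of AI training while staying indexed for search might add per-agent rules like the following sketch (GPTBot and Google-Extended are the publicly documented tokens for OpenAI's crawler and Google's AI-training control; the exact list any publisher uses will vary, and as the article notes, some AI features are served by the same crawler as search, so the separation is imperfect):

```
# Opt out of AI-training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers, including ordinary search, remain allowed
User-agent: *
Disallow:
```

The catch the article describes is that these rules only work if the crawler on the other end chooses to honor them.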

What Publishers Are Doing

Some publishers, such as the New York Times, have sued AI companies for copyright infringement; others prefer to negotiate licensing deals. The legality and value of using content for AI training remain contested: many AI developers argue that scraping freely accessible content breaks no law so long as it is not behind a paywall.

TollBit’s Role

TollBit positions itself as a middleman in this dispute, brokering licensing agreements between AI companies and publishers. The startup tracks AI traffic on publisher websites and supplies analytics used to negotiate fees for different content types, including premium content. As of May, TollBit says more than 50 websites use its services, though it has not named them.

Conclusion

Content scraping by AI firms without respecting robots.txt protocols is creating friction between AI companies and content publishers. With AI technology advancing, the need for clear rules and fair agreements is more important than ever to protect the rights of content creators.

Bottom Line: Transparent, mutually agreed-upon arrangements are essential to balance AI innovation with publishers' rights.


Lilu Anderson is a technology writer and analyst with over 12 years of experience in the tech industry. A graduate of Stanford University with a degree in Computer Science, Lilu specializes in emerging technologies, software development, and cybersecurity. Her work has been published in renowned tech publications such as Wired, TechCrunch, and Ars Technica. Lilu’s articles are known for their detailed research, clear articulation, and insightful analysis, making them valuable to readers seeking reliable and up-to-date information on technology trends. She actively stays abreast of the latest advancements and regularly participates in industry conferences and tech meetups. With a strong reputation for expertise, authoritativeness, and trustworthiness, Lilu Anderson continues to deliver high-quality content that helps readers understand and navigate the fast-paced world of technology.