AI Companies Bypass Rules to Scrape Content, Spark Disputes with Publishers
Several AI companies are ignoring the Robots Exclusion Protocol (robots.txt) to scrape content from websites without permission, according to TollBit, a content licensing startup. The practice has put these AI firms in conflict with publishers; Forbes, for instance, has accused the AI company Perplexity of copying its content.
What Is robots.txt?
The robots.txt protocol, created in the mid-1990s, was designed to keep web crawlers from overloading websites. A site publishes its crawling rules in a plain text file at its root, acting as a "Do Not Enter" sign for parts of the site, and well-behaved crawlers are expected to check those rules before fetching pages. Although the protocol is not legally binding, it has generally been respected. Publishers now rely on it to block unauthorized use of their content by AI systems that scrape data to train models and generate summaries.
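As a minimal sketch of how the protocol works, the rules below are a hypothetical robots.txt file that permits crawling of a news section but not an archive; the paths are illustrative, not taken from any real site.

    User-agent: *
    Allow: /news/
    Disallow: /archive/

A compliant crawler checks these rules before requesting a page. The Python snippet below performs that check with the standard library's urllib.robotparser; the bot name "ExampleAIBot" and the URLs are assumptions for the example.

    import urllib.robotparser

    # The same hypothetical rules shown above, parsed directly so the
    # example is self-contained (a real crawler would fetch /robots.txt
    # from the site before crawling it).
    RULES = [
        "User-agent: *",
        "Allow: /news/",
        "Disallow: /archive/",
    ]
    USER_AGENT = "ExampleAIBot"  # illustrative crawler name

    parser = urllib.robotparser.RobotFileParser()
    parser.parse(RULES)

    for page in ("https://example.com/news/story.html",
                 "https://example.com/archive/2019/report.html"):
        if parser.can_fetch(USER_AGENT, page):
            print(f"allowed: {page}")
        else:
            print(f"disallowed by robots.txt: {page}")

A crawler that skips this check, or checks and fetches anyway, is the behavior TollBit and publishers are objecting to.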
Problem with AI Companies
TollBit reports that many AI agents ignore robots.txt, retrieving content from sites in violation of their stated rules. Its analytics indicate that various AI firms are using publishers' data for training without first obtaining permission. Perplexity, for example, has been accused by Forbes of using its investigative stories in AI-generated summaries without credit or permission. Perplexity did not comment on the claims.
How This Affects Publishers
AI-generated news summaries are growing more popular, deepening publishers' concerns. Google's AI products now produce summaries in response to search queries, which has escalated the dispute. Publishers can use robots.txt to keep Google's crawler away from their content, but doing so also pulls their pages out of search results and hurts their online visibility. If AI companies ignore robots.txt anyway, publishers question why they should keep using it and sacrifice web traffic in the process.
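To make that trade-off concrete, the illustrative robots.txt rule below is the kind of blanket block a publisher would need to keep its pages away from Google's crawler entirely; because search indexing relies on that same crawler, the rule also removes the pages from ordinary crawling for search.

    User-agent: Googlebot
    Disallow: /

A publisher that adds a rule like this shields its content at the cost of its search visibility, which is exactly the dilemma described above.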
What Publishers Are Doing
Some publishers, such as the New York Times, have sued AI companies for copyright infringement; others prefer to negotiate licensing deals. The debate over the legality and value of using content to train AI continues. Many AI developers argue that accessing freely available content breaks no laws, as long as it is not behind a paywall.
TollBit’s Role
TollBit has positioned itself as an intermediary in this dispute, brokering licensing agreements between AI companies and publishers for content usage. The startup tracks AI traffic to publisher websites and uses that analytics data to help negotiate fees for different types of content, including premium content. As of May, TollBit said more than 50 websites were using its services, though it did not name them.
Conclusion
Content scraping by AI firms that disregard robots.txt is creating friction between AI companies and content publishers. As AI technology advances, clear rules and fair agreements are more important than ever to protect the rights of content creators.
Bottom Line: Transparent, mutually agreed-upon solutions are essential to balancing AI innovation with publishers' rights.