
Social media platform Reddit said on Tuesday it will update a web standard used by the platform to block automated data scraping from its website, following reports that AI startups were bypassing the rule to gather content for their systems.
The move comes at a time when artificial intelligence firms have been accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking for permission.
Reddit said that it would update the Robots Exclusion Protocol, or “robots.txt,” a widely accepted standard meant to determine which parts of a site are allowed to be crawled.
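A robots.txt file is simply a plain-text file served at a site's root. A minimal sketch of what one can look like is below; the crawler name and rules are illustrative, not Reddit's actual configuration:

    # Block one hypothetical crawler from the entire site
    User-agent: ExampleBot
    Disallow: /

    # Allow all other crawlers everywhere
    User-agent: *
    Disallow:

Compliance with these directives is voluntary, which is why a crawler can simply ignore them – the behavior publishers have been complaining about.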
The company also said it will maintain rate-limiting, a technique used to control the number of requests from one particular entity, and will block unknown bots and crawlers from data scraping – gathering and saving raw information – on its website.
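Rate-limiting is a generic technique rather than anything Reddit-specific. A minimal sketch of one common approach, a fixed-window counter per client, is shown below; the names and thresholds are illustrative assumptions, not Reddit's implementation:

    import time
    from collections import defaultdict

    WINDOW_SECONDS = 60   # length of each counting window (illustrative)
    MAX_REQUESTS = 100    # allowed requests per client per window (illustrative)

    # client_id -> [window_start_timestamp, request_count]
    _counters = defaultdict(lambda: [0.0, 0])

    def allow_request(client_id: str) -> bool:
        """Return True if this client is still under its per-window quota."""
        now = time.time()
        window_start, count = _counters[client_id]
        if now - window_start >= WINDOW_SECONDS:
            # The previous window expired: start a fresh one for this client
            _counters[client_id] = [now, 1]
            return True
        if count < MAX_REQUESTS:
            _counters[client_id][1] = count + 1
            return True
        return False  # over quota: the request is rejected or delayed

Unlike robots.txt, rate-limiting is enforced server-side, so a scraper cannot simply opt out of it.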
More recently, robots.txt has become a key tool that publishers employ to prevent tech companies from using their content free of charge to train AI algorithms and create summaries in response to some search queries.
Last week, a letter to publishers by content licensing startup TollBit said that several AI firms were circumventing the web standard to scrape publisher sites.
This follows a Wired investigation which found that AI search startup Perplexity likely bypassed efforts to block its web crawler via robots.txt.
Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without giving credit.
Reddit said on Tuesday that researchers and organizations such as the Internet Archive will continue to have access to its content for non-commercial use.
© Thomson Reuters 2024