Reddit escalates its fight against AI bots

With AI eating the public web, Reddit is going on the offensive against data scraping. 

Reddit’s logo. Illustration by William Joel / The Verge

In the coming weeks, Reddit will start blocking most automated bots from accessing its public data. You’ll need to make a licensing deal, like Google and OpenAI have done, to use Reddit content for model training and other commercial purposes. 

While this has technically been Reddit’s policy all along, the company is now enforcing it by updating its robots.txt file, a core part of the web that tells crawlers how they’re allowed to access a site. “It’s a signal to those who don’t have an agreement with us that they shouldn’t be accessing Reddit data,” the company’s chief legal officer, Ben Lee, tells me. “It’s also a signal to bad actors that the word ‘allow’ in robots.txt doesn’t mean, and has never meant, that they can use the data however they want.”
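
For context on what those “allow” rules look like: robots.txt is just a plain-text list of per-crawler directives, and well-behaved bots check it before fetching anything. Here is a minimal sketch of that check, using Python’s standard urllib.robotparser and an illustrative blanket-disallow rule set — not Reddit’s actual file:

    # How a compliant crawler consults robots.txt before fetching pages,
    # using Python's standard library. The rules below are illustrative,
    # not Reddit's real file.
    from urllib.robotparser import RobotFileParser

    # A hypothetical blanket-disallow policy of the kind described above:
    rules = [
        "User-agent: *",
        "Disallow: /",
    ]

    parser = RobotFileParser()
    parser.parse(rules)

    # With no Allow carve-out, every path is off-limits to every bot.
    print(parser.can_fetch("SomeAIBot", "https://www.reddit.com/r/all/"))  # False

The catch, and part of why Lee frames the change as a “signal,” is that robots.txt is purely advisory: nothing in the protocol technically stops a scraper from ignoring the answer.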

My colleague David Pierce recently called robots.txt “the text file that runs the internet.” Since it was conceptualized in the early days of the web, the file has primarily governed whether search engines like Google can crawl a website to index it for results. For the last 20 years or so, the give-and-take — Google sending traffic in exchange for the ability to crawl — mostly made sense for everyone involved. Then, AI companies started ingesting all the data they could find online to train their models. 
