
This article is OpenAI training data

AI is holding the internet hostage — and media is no exception.

OpenAI Releases New Artificial Intelligence Model GPT-4o
Photo illustration by VCG/VCG via Getty Images
Bryan Walsh
Bryan Walsh is an editorial director at Vox overseeing the climate, tech, and world teams, and is the editor of Vox’s Future Perfect section. He worked at Time magazine for 15 years as a foreign correspondent in Asia, a climate writer, and an international editor, and he wrote a book on existential risk.

You can’t read much about the risks of advanced AI without soon coming across the paperclip maximizer thought experiment.

First put forward by the Swedish philosopher Nick Bostrom in his 2003 paper “Ethical Issues in Advanced Artificial Intelligence,” the thought experiment goes like this: Imagine an artificial general intelligence (AGI), one essentially limitless in its power and its intelligence. This AGI is programmed by its creators with the goal of producing paperclips. (Why would someone program a powerful AI to create paperclips? Don’t worry about it — the absurdity is the point.)

Because the AGI is superintelligent, it quickly learns how to make paperclips out of anything. And because the AGI is superintelligent, it can anticipate and foil any attempt to stop it — and will do so because its one directive is to make more paperclips. Should we attempt to turn the AGI off, it will fight back because it can’t make more paperclips if it is turned off — and it will win, because it is superintelligent.

The final result? The entire galaxy, including you, me, and everyone we know, has either been destroyed or been transformed into paperclips. (As AI arch-doomer Eliezer Yudkowsky has written: “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”) End thought experiment.

The point of the paperclip maximizer experiment is twofold. One, we can expect AIs to be optimizers and maximizers. Given a goal, they will seek out the strategy that most fully achieves it, without worrying about the side effects (which in this case involve the galaxy being turned into paperclips).


Two, it’s therefore very important to carefully align the objectives of the AI with what we truly value (which in this case probably does not involve the galaxy more or less being transformed into paperclips). As ChatGPT told me when I asked about the thought experiment, “It underscores the need for ethical considerations and control measures in the development of advanced AI systems.”

Clever as the paperclip maximizer experiment is as an analogy for the problems of AI alignment, it’s always struck me as a little beside the point. Could you really create an AI so superintelligent that it can figure out how to turn every atom in existence into paperclips, but somehow not smart enough to realize that such a result is not something we, its creators, would intend? Is there really no point at which this hypothetical artificial brain would stop along the way — perhaps after it had turned Jupiter into 2.29 x 10^30 paperclips (thank you, ChatGPT, for the calculations) — and think, “Perhaps there are downsides to a universe composed only of paperclips”?

Maybe. Or maybe not.
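For the curious, that Jupiter figure roughly checks out on the back of an envelope. Here is a minimal sketch in Python, assuming a Jupiter mass of about 1.898 x 10^27 kg and a paperclip mass of roughly 0.83 grams; both values are assumptions for illustration, not part of ChatGPT’s actual calculation.

```python
# Back-of-the-envelope check on the Jupiter-to-paperclips figure.
# Assumed inputs (not taken from ChatGPT's answer): Jupiter's mass and
# the mass of a single standard paperclip.
JUPITER_MASS_KG = 1.898e27    # approximate mass of Jupiter
PAPERCLIP_MASS_KG = 0.83e-3   # roughly 0.83 grams per paperclip

paperclips = JUPITER_MASS_KG / PAPERCLIP_MASS_KG
print(f"{paperclips:.2e} paperclips")  # ~2.29e+30
```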

Let’s make a deal — or else

I’ve been thinking about the paperclip maximizer thought experiment ever since I found out on Thursday morning that Vox Media, the company to which Future Perfect and Vox belong, had signed a licensing deal with OpenAI allowing our published material to be used to train OpenAI’s models and to be surfaced within ChatGPT.

The precise details of the deal — including how much Vox Media will be making for licensing its content, how often the deal can be renewed, and what kinds of protections might exist for specific kinds of content — are not yet fully clear. In a press release, Vox Media co-founder, CEO, and chair Jim Bankoff said that the deal “aligns with our goals of leveraging generative AI to innovate for our audience and customers, protect and grow the value of our work and intellectual property, and boost productivity and discoverability to elevate the talent and creativity of our exceptional journalists and creators.”

Vox Media is hardly alone in striking such a deal with OpenAI. The Atlantic announced a similar agreement the same day. (Check out Atlantic editor Damon Beres’s great take on it.) Over the past several months, publishing companies representing more than 70 newspapers, websites, and magazines have licensed their content to OpenAI, including Wall Street Journal owner News Corp, Politico owner Axel Springer, and the Financial Times.

The motivations for OpenAI in such agreements are clear. For one thing, it is in constant need of fresh training data for its large language models, and news websites like Vox happen to possess millions of professionally written, fact-checked, and copy-edited words (like these!). And as OpenAI works to ensure its chatbots can answer questions accurately, news articles are a more valuable source of up-to-date factual information than you’re likely to find on the web as a whole. (While I can’t say I’ve read every word Vox has ever published, I’m pretty sure you won’t find anything in our archives recommending that you add glue to keep cheese on pizza, as Google’s new generative AI search feature, AI Overview, apparently did.)

Signing a licensing deal also protects OpenAI from the pesky threat of lawsuits from media companies that believe the AI startup has already been using their content to train its models (as has likely been the case). That’s precisely the argument being made by the New York Times, which in December sued OpenAI and its major funder Microsoft for copyright infringement. A number of other newspapers and news websites have launched similar lawsuits.

Vox Media chose to go a different route, and it’s not hard to see why. Should the company refuse to license its content, there’s a decent chance such data scraping would continue, without compensation. The route of litigation is long, expensive, and uncertain, and it presents a classic collective action problem: Unless the media industry as a whole bands together and refuses to license its content, individual rebellions by individual companies will only mean so much. And journalists are a querulous lot — we couldn’t collude on something that big to save our lives, even if that’s precisely what it might do.

I’m not a media executive, but I’m pretty sure that on a balance sheet, getting something looks better than getting nothing — even if such a deal feels more like a hostage negotiation than a business one.

But while I’m not a media executive, I have been working in this business for more than 20 years. In that time, I’ve seen our industry pin its hopes on search engine optimization; on the pivot to video (and back again); on Facebook and social media traffic. I can remember Apple coming to my offices at Time magazine in 2010, promising us that the iPad would save the magazine business. (It did not.)

Each time, we were promised a fruitful collaboration with tech platforms that would benefit both sides. And each time, it ultimately didn’t work out, because the interests of those tech platforms do not align, and have never fully aligned, with those of the media. But sure — maybe this time Lucy won’t pull the football away.

Reporting on OpenAI

For Future Perfect specifically, there’s no getting around the fact that our parent company striking a deal with OpenAI to license all of our content presents certain optics problems. Over the past two weeks, Future Perfect reporters and editors led by Kelsey Piper and Sigal Samuel have published a series of investigative reports that cast serious doubts on the trustworthiness of OpenAI as a company and its CEO Sam Altman specifically. You should read them — as should anyone else thinking of signing a similar deal with the company.

Stories like that won’t change. I can promise you, our readers, that Vox Media’s agreement with OpenAI will have no effect on how we at Future Perfect or the rest of Vox report on the company. In the same way that we would never give favorable treatment to a company that is advertising on the Vox website, our coverage of OpenAI won’t change because of a licensing deal it signed with our parent company. That’s our pledge, and it’s one that everyone I work with here, both above and below me, takes very seriously.

That said, Future Perfect is a mission-driven section, one that was specifically created to write about subjects that truly matter for the world, to explore ways to do good better, to contribute ideas that can make the future a more perfect place. It’s why we are chiefly funded by philanthropic sources, rather than advertising or sponsorships. And I can’t say it feels good to know that every word we’ve written and will write for the foreseeable future will end up as training data, however tiny a part, for an AI company that has repeatedly shown, its mission statement aside, that it does not appear to be acting for the benefit of all humanity.

But my greater worries have less to do with what this deal and others like it mean for Future Perfect, or even for the media business more broadly, than with what they mean for the platform that both media companies and AI giants share: the internet. Which brings me back to maximizing paperclips.

Playing out the paperclip scenario

AIs aren’t the only maximizers; so are companies that make AIs.

From OpenAI to Microsoft to Google to Meta, companies in the AI business are engaged in a brutal race: for data, for compute power, for human talent, for market share, and, ultimately, for profits. Those goals are their paperclips, and what they are doing now, as hundreds of billions of dollars flow into the AI industry, is everything they can do to maximize them.

The problem is that maximization, as the paperclip scenario shows, leaves very little room for anyone else. What these companies ultimately want to produce is the ultimate answer: AI products capable of responding to any question and fulfilling any task their users can imagine. Whether it’s Google’s AI Overview function aiming to eliminate the need to actually click on a link on the web — “let Google do the Googling for you,” as the motto went at the company’s recent developer event — or a souped-up ChatGPT with access to all the latest news, the desired end result is an all-knowing oracle. Question in, answer out — no pesky stops on writers or websites in between.

This is obviously not good for those of us who make our living writing on the web, or podcasting, or producing videos. As Jessica Lessin, the founder of the tech news site the Information, wrote recently, excoriating media companies signing deals with OpenAI: “It’s hard to see how any AI product built by a tech company would create meaningful new distribution and revenue for news.”

Already there are predictions that the growth of AI chatbots and generative AI search products like Google’s AI Overview could cause search engine traffic to publishers to fall by as much as 25 percent by 2026. And arguably the better these bots get, thanks in part to deals with media companies like this one, the faster that shift could happen.

Like I said, bad for us. But a world where AI increasingly acts as the one and only answer, as Judith Donath and Bruce Schneier recently wrote, is one that “threatens to destroy the complex online ecosystem that allows writers, artists and other creators to reach human audiences.” And if you can’t even connect to an audience with your content — let alone get paid for it — the imperative for producing more work dissolves. It won’t just be news — the endless web itself could stop growing.

So, bad for all of us, including the AI companies. What happens if, while relentlessly trying to hoover up every possible bit of data that could be used to train their models, AI companies destroy the very reasons for humans to make more data? Surely they can foresee that possibility? Surely they wouldn’t be so single-minded as to destroy the raw material they depend on?

Yet just as the AI in Bostrom’s thought experiment relentlessly pursues its single goal, so do the AI companies of today. Until they’ve reduced the news, the web, and everyone who was once a part of it to little more than paperclips.