Welcome to Part I of Edition No. 65 of my weekly newsletter, providing practical analysis in the world of digital content strategy.
Contents
Tip: Column Sorting in GA4
Analysis: How OpenAI Can Court News Outlets
Tip: Column Sorting in GA4
Dashboards in the “Reports” section of GA4 are, by default, sorted from greatest to least based on the primary metric.
But what if you want to sort from least to greatest? Or sort by another metric?
Let me show you how.
1. Click on any of the reports in your Reports tab. I have selected “Pages and screens.”
2. Click in the white space of any metric’s column header. The table will now sort by that column from greatest to least (as indicated by the downward-pointing arrow).
3. If the column is already sorted from greatest to least, click it again and it will sort from least to greatest (as indicated by the upward-pointing arrow).
(I’m glad no one’s wasting too much time reading the disclaimer or privacy policy on my website!)
That’s it! You can do this on any of the columns.
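Bonus for anyone who pulls GA4 data programmatically: the official Data API supports the same sorting through its order_bys field. Here’s a minimal sketch using Google’s Python client; the property ID, dimension and metric are placeholders, so swap in your own.

```python
# Minimal sketch with the official GA4 Data API client
# (pip install google-analytics-data); assumes Application
# Default Credentials are configured.
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Metric,
    OrderBy,
    RunReportRequest,
)

client = BetaAnalyticsDataClient()

request = RunReportRequest(
    property="properties/123456789",  # placeholder property ID
    dimensions=[Dimension(name="pagePath")],
    metrics=[Metric(name="screenPageViews")],
    date_ranges=[DateRange(start_date="28daysAgo", end_date="today")],
    # desc=False sorts least to greatest, mirroring the second click
    # in the UI; desc=True reproduces the default view.
    order_bys=[
        OrderBy(
            metric=OrderBy.MetricOrderBy(metric_name="screenPageViews"),
            desc=False,
        )
    ],
)

for row in client.run_report(request).rows:
    print(row.dimension_values[0].value, row.metric_values[0].value)
```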
Did you find this tip useful? Share it to help spread the word.
Analysis: How OpenAI Can Court News Outlets
(With Things Other Than Money)
(Though Money is Also Nice)
OpenAI recently struck a deal with Axel Springer to license its content. In exchange for payment and exposure in ChatGPT, the publisher will allow the platform to train on its content.
Two weeks later, the New York Times sued OpenAI for copyright infringement.
“The lawsuit, filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information,” according to the Times’ own coverage of the suit.
The Times is claiming “billions of dollars in statutory and actual damages” as a result of OpenAI’s “unlawful copying and use” of its work.
These are two solutions – on extreme ends of the spectrum – to a problem that’s new to this era of content creation: How can journalists and large language models (LLMs) coexist?
I have three recommendations for ChatGPT and its peers regarding how they could not only mend fences with journalists, but court them:
Issue Corrections
More Citing + Linking
Badges
But first, some flaws in the agreement OpenAI struck with Axel Springer.
Many news outlets have blocked OpenAI and other LLMs from crawling their websites’ content. Unfortunately, the damage is done.
LLMs no longer have access to many news websites’ content, which hurts their “real-time” capabilities. They did, however, already make off with enough loot to benefit their models infinitely more than anything news outlets are receiving in return.
The Axel Springer agreement may seem like a win, but from the perspective of the industry at large, it’s flawed.
One-Time Fee, Forever Use
Money can only be spent once, but ChatGPT will benefit from the content it licenses from Axel Springer in perpetuity.
Even if the “tens of millions” of euros were a good deal for Axel Springer, it’s hardly scalable as an industry-wide solution. A Spotify-like revenue-share model wouldn’t be appealing either: I doubt OpenAI could offer attractive pricing to more than the small percentage of the market whose content would be cited most often.
Not that there’s a clear way (at least from the outside) to tie financial value to a single citation to begin with.
Facebook and Google, in a roundabout way, tried something similar with Instant Articles and AMP. Both long ago fell out of favor with news outlets, as parent companies Meta and Alphabet, respectively, have mostly kicked journalists to the curb for greener (read: more valuable) pastures.
What proof do we have that OpenAI can do any better, especially after getting off on the wrong foot so badly with an entire industry? (Rhetorical question.)
Either Axel Springer has been swindled (if that were possible for such a high price tag), or it’s the lone winner (so far) among a global industry that’s becoming more precarious with each technological advancement.
The Times is arguably the most well-known news source in the world. It has resources to take on OpenAI that hundreds of other news organizations, especially local ones, can’t afford.
Of course, whatever concessions the Times may extract from its legal battle are likely to create a rising-tide/all-ships situation, so hats off to them for taking the lead.
Regardless of the outcome, there are features ChatGPT, Gemini and the like could add that would not only benefit newsrooms, but begin to restore trust.
There are many LLMs out there, but the most popular one is ChatGPT. While I’ll most often direct suggestions at OpenAI, these are meant for any AI-driven platform that wants to use news content to power its answers.
This isn’t an attack on ChatGPT, which I love and use daily, but rather a short list of ideas for how it could collaborate with the news industry.
I. Issue Corrections
ChatGPT retains your chat history unless you turn it off. (And even if you turn it off, who knows?) It has a record of what you have previously asked, and the answers it has provided.
Before you ever type an inquiry, these chatbots tell you they could get things wrong.
So what happens when its responses are inaccurate? I’m not talking about an Excel formula error. I’m talking about people asking for the latest updates on, say, the 2024 election.
(Don’t fret, OpenAI. Even your arch nemesis, the NY Times, makes mistakes! Every news outlet does. The key is to correct them as soon as possible.)
By implementing a notification bar, ChatGPT could notify users, every time they log on, of the following (a rough sketch of such a record follows this list):
Any erroneous information that was previously provided
What the correct information is
What the source of the misinformation was (whether an LLM mistake or a faulty source)
What the source of the corrected information is
Assurance this correction will be part of the system going forward should other users ask about the same topic
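To make that concrete, here’s a purely hypothetical sketch of the fields such a correction record might carry. It’s just the list above translated into code; nothing here reflects an actual OpenAI data structure.

```python
# Hypothetical correction record; field names mirror the list above.
# Not an actual OpenAI data structure.
from dataclasses import dataclass
from datetime import date

@dataclass
class CorrectionNotice:
    erroneous_claim: str    # what the model previously told the user
    corrected_claim: str    # what it should have said
    error_origin: str       # "model mistake" or the faulty source's URL
    correction_source: str  # citation for the corrected information
    applied_to_model: bool  # True once future answers reflect the fix
    issued_on: date

notice = CorrectionNotice(
    erroneous_claim="Candidate X won the primary.",
    corrected_claim="Candidate Y won the primary.",
    error_origin="model mistake",
    correction_source="https://example.com/primary-results",  # placeholder
    applied_to_model=True,
    issued_on=date(2024, 1, 15),
)
```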
By holding itself to the same standards as journalists, ChatGPT can earn their respect. If OpenAI can create technology capable of writing novels from scratch, surely it can implement a product – corrections – that’s as old as the newspaper business.
II. More Citing + Linking
While we’re focusing on news outlets, they’re not the only sources that should get cited and linked – and not only for their own benefit, but for users’, too.
It’s important to understand that ChatGPT by nature doesn’t simply regurgitate existing content. Without getting in over my head, my understanding of the artificial intelligence in LLMs is that they take their existing “knowledge” and iterate on it. (Which is why ingesting thousands of books, news articles, documents and other wide-ranging pieces of content is so useful to their models.)
So technically, LLMs can produce an “original” response based on unoriginal information. But that doesn’t mean they shouldn’t credit their “inspiration.”
Source Levels
Think of it this way: If someone asked you the result of a baseball game, you could tell them, “The Dodgers won 6-5 on a walk-off home run. It was amazing.” But how would you know?
Perhaps you were at the game and witnessed it firsthand. Or perhaps you watched it live on TV. Or listened on the radio. In all three of those cases, you would be considered a first-person (original) source because you saw it (or heard it) with your own eyes (ears).
Step back a layer, and you might watch replay highlights on TV or online. Or maybe you read about it in an article recap. Now you’re getting information from a first-person source – still pretty reliable – but that first-person source is not you. This would be the equivalent of the LLMs citing a firsthand source.
But what if you got your information from someone in that second layer? They didn’t see or hear the game live, but they got information about it from someone who did, and now you’re getting information from them.
Whatever they tell you about the game makes you a third-level source. If they didn’t witness the game live in some capacity, how can you trust that what they say is accurate? In the example of a baseball game, this isn’t tremendously important. But for other topics it is.
The more layers you are removed from the original event, the murkier (and more susceptible to mistakes) the details.
What if you were getting secondhand updates from someone about any of the wars going on around the globe? You would want to know where they got their information so you could verify and/or go deeper. But what if they don’t tell you how or where they heard the update? How much would you trust them?
That’s the situation we’re in when LLMs don’t tell us where they get their information.
So while there’s nothing wrong, per se, with being a third-level (or beyond) source, you – that is, ChatGPT, Gemini, et al. – should still be able to tie your information back to an original one.
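If it helps to see those layers spelled out, here’s a toy sketch of the idea. The chains are invented, and real provenance tracking would be far more involved.

```python
# Toy illustration of "source levels": each hop away from the original
# event adds a layer, and trust should degrade with distance.
def source_level(chain: list[str]) -> int:
    """How many layers the last link is removed from the event itself.

    chain[0] is the original event; each later entry got its
    information from the entry before it.
    """
    return len(chain) - 1

eyewitness = ["Dodgers game", "fan at the stadium"]
recap_reader = ["Dodgers game", "beat reporter", "recap reader"]
rumor_mill = ["Dodgers game", "beat reporter", "recap reader", "their friend"]

for chain in (eyewitness, recap_reader, rumor_mill):
    print(f"{chain[-1]} is a level-{source_level(chain)} source")
```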
A Broader Solution
ChatGPT creates “original” content because it’s an AI-driven LLM? Great! But the root of that information came from somewhere, and users deserve to know the source.
As part of the Axel Springer agreement, ChatGPT is already doing exactly this.
“ChatGPT’s answers to user queries will include attribution and links to the full articles for transparency and further information.”
But that’s of little benefit to other news outlets whose content has already been pillaged.
Fox Corporation has taken a shot at leading the way on this very issue. It recently announced “Verify”:
“A protocol for media companies to register content and grant usage rights to AI platforms, while also allowing end consumers to verify the origin of content.”
According to the Axios Media Trends newsletter, “The company is also in active discussions with other media companies to use its protocol, suggesting that the tool will give them leverage in their negotiations with AI companies.”
OpenAI should partner with Fox – and anyone else – who wants to create a similar product. May the one that’s best for the news industry win.
III. Badges
When we think of badges, we think of verification. The OG in this arena was the “blue checkmark” from Twitter, which is now about as coveted as catching the flu. (At least the latter is still free.)
While LLMs don’t have users in the same sense that a social network does, we can still find utility in a verification-like feature.
How would this work?
When ChatGPT provides an answer based on information from a news outlet, it could go a step beyond citing and linking. It could also tack on a badge that shows the content is from a trusted source.
Ideally, this badge would indicate that OpenAI is in partnership with the news outlet. (The burden of how to scale this to thousands of news sites should fall on OpenAI, not the journalists.) The badge-holders could be centrally listed on official OpenAI documentation, along with standards for acceptance, with which I’m sure a diverse committee of journalists would be happy to assist.
This wouldn’t mean ChatGPT couldn’t cite the conspiracy theorists and basement bloggers, just that they shouldn’t be trusted with badges. (Their edges are sharp, you know.)
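To illustrate, here’s one hypothetical shape a badged citation could take; this isn’t any real ChatGPT response format, just a sketch of the idea.

```python
# Hypothetical citation payloads: every cited source carries a link,
# and trusted partners get a badge. Not a real ChatGPT format.
partner_citation = {
    "outlet": "Example Daily News",
    "url": "https://example.com/story",  # placeholder URL
    "badge": {
        "verified_partner": True,  # outlet appears on OpenAI's public list
        "standards": "https://example.com/badge-standards",  # placeholder
    },
}

basement_blog = {
    "outlet": "Some Basement Blog",
    "url": "https://example.org/hot-take",  # placeholder URL
    "badge": None,  # still citable, just not badge-worthy
}
```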
These recommendations may not compensate news outlets for the slow trickle of traffic they’re losing to more approachable, interactive tools like LLMs. They would, however, help newsrooms become part of the revolution instead of getting rolled by it.