AI & law

Federal Judge Affirms: OpenAI Must Hand Over 20 Million ChatGPT Logs to NYT-Led Plaintiffs



The single largest discovery order in AI copyright litigation has been affirmed. US District Judge Sidney H. Stein of the Southern District of New York rejected OpenAI's appeal against an earlier magistrate-judge order requiring the company to produce 20 million anonymised ChatGPT conversation logs to plaintiffs in the consolidated multidistrict litigation led by the New York Times.

What was ordered

The order covers 20 million anonymised logs, representing 0.5% of OpenAI's preserved conversation data. Plaintiffs originally sought 120 million logs; OpenAI counter-offered 20 million, which plaintiffs accepted. OpenAI subsequently appealed the magistrate's order, arguing that it "insufficiently weighed privacy concerns" and pointing to a securities case involving illegal wiretaps in which stronger privacy protections applied. Judge Stein was unpersuaded.

Stein distinguished the wiretap analogy directly. "Unlike that case," he wrote, "ChatGPT's legal ownership of the logs, however, is uncontested, and users voluntarily submitted their communications." That sentence may well prove to be the year's most consequential legal statement for AI copyright cases.

Who is suing

The MDL consolidates 16 copyright suits, with the New York Times Company as lead plaintiff. Other plaintiffs include the Chicago Tribune Company and three additional news organisations, plus parallel author-led class actions originally filed in California. The cases were consolidated by the US Judicial Panel on Multidistrict Litigation in April 2025 to handle pretrial proceedings together.

What the plaintiffs are looking for

Evidence that OpenAI's training data and runtime outputs reproduce, in commercially substantial form, content from plaintiff publications without licence. Discovery scope this large allows plaintiffs to search systematically for verbatim passages, near-paraphrases and summarisations that arguably substitute for the original. The 20 million logs are anonymised, but the content of the model's outputs is what matters for infringement analysis, not the user identities.

Why this is bigger than NYT v. OpenAI

Three reasons. First, the precedent on discovery scope: AI defendants will struggle to argue against multi-million-log production after this. Second, the privacy framing: Stein's distinction between wiretap-protected communications and voluntarily submitted user inputs is portable across the entire generative-AI industry. Third, the Anthropic comparison: Anthropic settled its own author class action for $1.5 billion in late 2025 — one of the largest copyright settlements in US history. If OpenAI does not follow suit, the discovery road ahead is going to be expensive.

What it means for European publishers

Most directly: the European publisher coalitions weighing whether to sue under the InfoSoc Directive or under member-state copyright frameworks now have a concrete US precedent on discoverability. Indirectly: the negotiating leverage on AI training-data licensing has shifted toward rights-holders. Several European publishers, including some with Luxembourg-based holding structures, are in active commercial discussions with frontier-model labs in 2026; that is no coincidence.

Who is the lead plaintiff?
The New York Times Company, leading a consolidated MDL of 16 copyright suits.
Can OpenAI still appeal?
Higher-court appeal options remain narrow given the procedural posture; substantive litigation continues.
What is the comparison case?
Anthropic's $1.5 billion settlement with authors in 2025 — one of the largest US copyright settlements ever.

