OpenAI desperate to avoid explaining why it deleted pirated book datasets

OpenAI is under increasing pressure to explain its decision to delete two contentious datasets made up of pirated books, a move that could significantly affect a class-action lawsuit brought by authors who claim their works were used without permission to train ChatGPT. The datasets, referred to as "Books 1" and "Books 2," were deleted before ChatGPT launched in 2022. They were compiled in 2021 by former OpenAI staff, using data sourced from the open internet, primarily from a controversial digital library known as Library Genesis (LibGen).

OpenAI has stated that the datasets fell out of use in the same year they were created and were removed as a result. The authors involved in the lawsuit are skeptical of this explanation. They point to inconsistencies in OpenAI's account, noting that the company appeared to retract its claim that the datasets' "non-use" was the justification for deleting them. That shift came after a court allowed the authors to examine OpenAI's internal communications concerning the "non-use" of the datasets.

The dispute escalated last week when U.S. District Judge Ona Wang ordered OpenAI to hand over all correspondence with its legal team regarding the deletion of the datasets, as well as any internal discussions about LibGen that were previously withheld under claims of attorney-client privilege. According to Judge Wang, OpenAI's position that "non-use" was not a reason for the deletion contradicted its assertion that the matter should remain privileged. The authors are now waiting to see those internal discussions, which could shed light on the true reasons the datasets were deleted.

Source: Ars Technica

Published On: Dec 01, 2025, 22:25
