Here's the list of websites gig workers used to fine-tune Anthropic's AI models. Its contractor left it wide open.

A leaked internal document sheds light on the guidelines gig workers at Surge AI followed when refining Anthropic's AI models. The spreadsheet lists which websites workers were allowed to consult and which were off-limits, as part of an effort to make the AI's responses more "helpful, honest, and harmless." Permitted sources include Bloomberg, Harvard University, and the New England Journal of Medicine. The New York Times and Reddit, by contrast, are explicitly banned.

Anthropic has distanced itself from the document, saying Surge AI created it without Anthropic's knowledge or involvement. An Anthropic representative said, "We were unaware of its existence until today and cannot validate the contents of the specific document since we had no role in its creation."

Using external websites to refine AI models is common in the industry, where companies often work with data-labeling startups like Surge. Internal project documents show that Surge's work focused on making Anthropic's AI more relatable while reducing the risk of offensive output. Many of the approved sources carry copyright restrictions, and institutions including the Mayo Clinic and Cornell University have confirmed that they have no agreements with Anthropic covering the use of their content for AI training.

The spreadsheet was initially accessible via Google Drive, but Surge restricted access shortly after inquiries were made about its contents. A Surge spokesman said, "We take data security seriously, and documents are restricted by project and access level where possible," adding that the company is investigating the incident.

This is another instance of a data-labeling startup inadvertently exposing sensitive AI training materials. Surge's competitor, Scale AI, faced a similar situation, which led to tighter document security protocols. A Google Cloud spokesperson said that sharing settings are typically restricted by default and that any changes are at the customer's discretion.

Surge reportedly brought in $1 billion in revenue last year and is raising funds at a valuation of $15 billion, while Anthropic's latest valuation stands at $61.5 billion. Anthropic's Claude chatbot is regarded as a leading competitor to ChatGPT.

The spreadsheet, created in November 2024, lists more than 120 authorized sources across academia, healthcare, law, and finance, including prestigious universities and respected medical journals. It also contains a blacklist of more than 50 disallowed sources, mostly prominent media outlets. The reasons for including or excluding specific sources remain unclear. Legal experts suggest the blacklist could reflect websites that have actively tried to keep AI companies from using their content, either through direct requests or automated methods. Recent lawsuits have highlighted these tensions, among them Reddit's legal action against Anthropic over alleged unauthorized access to its site.

Surge contractors used the list during reinforcement learning from human feedback (RLHF), a critical phase of AI model training. In this process, human evaluators assess and improve chatbot responses. The task does not directly feed web data into the model, but it is still essential to the model's development. The legal questions around copyright in AI training continue to evolve, with significant implications for the future of AI development.

Source: Business Insider

Published on: Jul 23, 2025, 09:15
