DeepSeek may have used Google’s Gemini to train its latest model

Recently, the Chinese AI lab DeepSeek released an updated version of its R1 reasoning model that performs well on a number of math and coding benchmarks. However, the company has not disclosed where it sourced the data used to train the model, and some AI researchers suspect that at least some of it came from Google's Gemini family of models.

Sam Paech, a Melbourne-based developer who builds "emotional intelligence" evaluations for AI, claims to have found evidence that DeepSeek's R1-0528 model was trained on Gemini outputs. In a post on X, Paech said the DeepSeek model favors words and expressions similar to those preferred by Gemini 2.5 Pro.

That is not conclusive proof. But the pseudonymous creator of SpeechMap, a "free speech" evaluation for AI, noted that the DeepSeek model's traces (the "thoughts" a model generates as it works toward an answer) read like Gemini traces.

DeepSeek has been accused of training on data from rival AI models before. In December, developers observed that DeepSeek's V3 model often identified itself as ChatGPT, suggesting it may have been trained on ChatGPT chat logs. Earlier this year, OpenAI told the Financial Times it had found evidence linking DeepSeek to distillation, a technique for training AI models by extracting knowledge from larger, more capable ones. According to Bloomberg, Microsoft, a close OpenAI partner and investor, detected large amounts of data being exfiltrated in late 2024 through OpenAI developer accounts believed to be associated with DeepSeek.

Distillation is not an uncommon practice, but OpenAI's terms of service prohibit customers from using its model outputs to build competing AI systems.

To be clear, many models misidentify themselves and converge on the same words and turns of phrase. That is partly because the open web, where AI companies source much of their training data, is increasingly saturated with AI-generated content. Content farms use AI to churn out clickbait, and bots are flooding platforms like Reddit and X, which makes it increasingly difficult to filter AI outputs out of training datasets.

Still, Nathan Lambert, a researcher at the nonprofit AI research institute AI2, said that if he were in DeepSeek's position, he would certainly generate synthetic data from the best API model available, since the company is short on GPUs but has plenty of cash.

Partly in an effort to prevent distillation, AI companies have been tightening their security measures. In April, OpenAI began requiring organizations to complete an ID verification process to access certain advanced models. The process requires a government-issued ID from one of the countries supported by OpenAI's API, and China is not among them. Google, for its part, recently began summarizing the traces generated by models available on its AI Studio developer platform, a step that makes it harder to train competing models on Gemini traces. In May, Anthropic said it would start summarizing its own models' traces as well, citing the need to protect its competitive advantages. We have reached out to Google for comment and will update this story if we hear back.

Source: TechCrunch

Published on: Jun 03, 2025, 16:35
