The Chinese research lab DeepSeek recently unveiled an upgraded version of its R1 reasoning model, which has posted impressive results on a range of mathematical and coding benchmarks. The company has not disclosed the sources of the data used to train the model, however, leading some AI researchers to suspect that a portion of it may have come from Google's Gemini family of models.

Sam Paech, a Melbourne-based developer who builds emotional intelligence evaluations for AI, claims to have found indications that DeepSeek's R1-0528 model was trained on Gemini outputs. In a post on X, he noted that the words and expressions the model favors closely resemble those of Gemini 2.5 Pro. That is not definitive proof, but the pseudonymous developer behind SpeechMap, an AI evaluation project, has likewise observed that the reasoning traces DeepSeek's model generates read like those produced by Gemini.

DeepSeek has faced accusations of leveraging rival models' data before. In December, observers noticed that its V3 model often identified itself as ChatGPT, suggesting it may have been trained on logs from OpenAI's chatbot. Earlier this year, OpenAI told the Financial Times it had found evidence linking DeepSeek to distillation, a technique for training AI models by extracting knowledge from larger, more capable ones. Bloomberg reported that in late 2024, Microsoft, a key OpenAI partner and investor, detected large amounts of data being exfiltrated through OpenAI developer accounts believed to be associated with DeepSeek.

While distillation is a common practice, OpenAI's terms of service prohibit using its model outputs to build competing AI systems. It is worth noting that many models misidentify themselves and converge on similar language, a consequence of the sheer volume of AI-generated content now flooding the open web.
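For readers unfamiliar with the technique, distillation can be sketched in a few lines. The toy example below is a minimal illustration, not a representation of any lab's actual pipeline: a fixed "teacher" distribution (here, softened probabilities over three classes) supervises a "student" logit vector, which is updated by gradient descent to minimize the KL divergence between the two. The specific logits, temperature, and learning rate are arbitrary choices for the demo.

```python
import math

def softmax(logits, T=1.0):
    """Convert logits to probabilities, softened by temperature T."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 0.5, -1.0]   # fixed "teacher" model outputs
student_logits = [0.0, 0.0, 0.0]    # "student" starts uninformed
T = 2.0                              # temperature softens the targets
lr = 1.0

p = softmax(teacher_logits, T)       # soft targets from the teacher
for _ in range(300):
    q = softmax(student_logits, T)
    # Analytic gradient of KL(p || q) w.r.t. the student logits: (q - p) / T
    student_logits = [z - lr * (qi - pi) / T
                      for z, qi, pi in zip(student_logits, q, p)]

print(kl(p, softmax(student_logits, T)))  # approaches 0 as the student matches
```

In practice the same idea is applied at scale: the student is trained on the teacher's output distributions (or sampled text) rather than a hand-picked vector, but the objective, matching the teacher's soft outputs, is the same.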
With AI content farms churning out clickbait and bots inundating platforms like Reddit and X, filtering AI outputs out of training datasets has become increasingly difficult. Nathan Lambert, a researcher at the nonprofit AI research institute AI2, speculated that if he were in DeepSeek's position, he would certainly generate synthetic data from the best available API model, given the lab's shortage of GPUs and ample funds.

In response to distillation concerns, AI companies have been tightening their security measures. In April, OpenAI began requiring organizations to complete an ID verification process to access certain advanced models, a process that excludes users in China. Google, meanwhile, has started summarizing the reasoning traces produced by models on its AI Studio platform, making it harder to train competitive models on Gemini data. In May, Anthropic announced it would take similar steps to protect its competitive edge.

We have reached out to Google for comment and will update this story if we hear back.