The Chinese research lab DeepSeek recently unveiled an upgraded version of its R1 reasoning model, which has posted impressive results on a range of mathematical and coding benchmarks. The company has not disclosed the sources of the data used to train the model, however, leading some AI researchers to suspect that a portion of it may have come from Google's Gemini family of models.

Sam Paech, a Melbourne-based developer who builds emotional intelligence evaluations for AI, claims to have found indications that DeepSeek's R1-0528 model was trained on Gemini outputs. In a post on X, he noted that the words and expressions the model favors closely resemble those of Gemini 2.5 Pro. That is not definitive proof, but the pseudonymous developer behind SpeechMap, an AI evaluation project, has likewise observed that the reasoning traces DeepSeek's model generates read like those produced by Gemini.

DeepSeek has faced accusations of leveraging rival models' data before. In December, observers noticed that its V3 model often identified itself as ChatGPT, suggesting it may have been trained on logs from OpenAI's chatbot. Earlier this year, OpenAI told the Financial Times it had found evidence linking DeepSeek to distillation, a technique for training AI models by extracting knowledge from larger, more capable ones. Bloomberg reported that in late 2024, Microsoft, a key OpenAI partner and investor, detected large amounts of data being exfiltrated through OpenAI developer accounts believed to be associated with DeepSeek.

While distillation is a common practice, OpenAI's terms of service prohibit using its model outputs to build competing AI systems. It is worth noting that many models misidentify themselves and converge on similar language, a consequence of the sheer volume of AI-generated content now flooding the open web.
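For readers unfamiliar with the technique, distillation can be sketched in a few lines. The toy example below is a minimal illustration, not a representation of any lab's actual pipeline: a fixed "teacher" distribution (here, softened probabilities over three classes) supervises a "student" logit vector, which is updated by gradient descent to minimize the KL divergence between the two. The specific logits, temperature, and learning rate are arbitrary choices for the demo.

```python
import math

def softmax(logits, T=1.0):
    """Convert logits to probabilities, softened by temperature T."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 0.5, -1.0]   # fixed "teacher" model outputs
student_logits = [0.0, 0.0, 0.0]    # "student" starts uninformed
T = 2.0                              # temperature softens the targets
lr = 1.0

p = softmax(teacher_logits, T)       # soft targets from the teacher
for _ in range(300):
    q = softmax(student_logits, T)
    # Analytic gradient of KL(p || q) w.r.t. the student logits: (q - p) / T
    student_logits = [z - lr * (qi - pi) / T
                      for z, qi, pi in zip(student_logits, q, p)]

print(kl(p, softmax(student_logits, T)))  # approaches 0 as the student matches
```

In practice the same idea is applied at scale: the student is trained on the teacher's output distributions (or sampled text) rather than a hand-picked vector, but the objective, matching the teacher's soft outputs, is the same.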
With AI content farms churning out clickbait and bots inundating platforms like Reddit and X, filtering AI outputs out of training datasets has become increasingly difficult. Nathan Lambert, a researcher at the nonprofit AI research institute AI2, speculated that if he were in DeepSeek's position, he would certainly generate synthetic data from the best available API model, given the lab's shortage of GPUs and ample funds.

In response to distillation concerns, AI companies have been tightening their security measures. In April, OpenAI began requiring organizations to complete an ID verification process to access certain advanced models, a process that excludes users in China. Google, meanwhile, has started summarizing the reasoning traces produced by models on its AI Studio platform, making it harder to train competitive models on Gemini data. In May, Anthropic announced it would take similar steps to protect its competitive edge.

We have reached out to Google for comment and will update this story if we hear back.