
On Monday, DeepSeek released an experimental model, V3.2-exp, designed to sharply reduce inference costs in long-context operations. The company announced the model in a post on Hugging Face and linked to an accompanying academic paper hosted on GitHub.

The model's defining feature is DeepSeek Sparse Attention, a system built around a component called a 'lightning indexer' that prioritizes specific excerpts from the context window. A separate 'fine-grained token selection system' then picks individual tokens from within those excerpts to load into the model's limited attention window. Together, the two stages let Sparse Attention models operate over long stretches of context with comparatively small server loads. (A toy sketch of this two-stage idea appears at the end of this piece.)

For long-context operations, the benefits are significant. DeepSeek's preliminary testing found that the price of a simple API call could be cut by as much as half in long-context situations. Further testing will be needed for a fuller assessment, but because the model is open-weight and freely available on Hugging Face, third-party evaluations should arrive soon.

The new model is the latest in a string of recent efforts to tackle inference costs: the server expense of running a pre-trained AI model, as distinct from the cost of training one. Here, DeepSeek's researchers looked for ways to make the fundamental transformer architecture run more efficiently, and found that there are significant improvements to be made.

Based in China, DeepSeek occupies an unusual position in the AI landscape, particularly given the widespread framing of AI research as a national contest between the U.S. and China. The company made waves earlier this year with its R1 model, trained primarily with reinforcement learning at a fraction of the cost of its American competitors. But R1 did not set off the wholesale shift in AI training some predicted, and the company has kept a lower profile in the months since. The new 'sparse attention' approach is unlikely to cause the same stir as R1, but it could still teach U.S. providers some useful tricks for holding inference costs down.
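DeepSeek's paper spells out the actual mechanism; the NumPy sketch below is purely illustrative, and every name and parameter in it (block size, number of kept blocks, token budget) is invented for the example rather than taken from the model. It shows the general shape of a two-stage scheme: a cheap per-block score stands in for the 'lightning indexer,' and a per-token top-k within the surviving blocks stands in for fine-grained token selection.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(query, keys, values, block_size=64, top_blocks=4, token_budget=128):
    """Toy two-stage sparse attention (single query vector).

    Stage 1 ("indexer"): score fixed-size blocks of the context cheaply
    and keep only the top-scoring blocks.
    Stage 2 ("token selection"): within the surviving blocks, keep the
    individual tokens with the highest query-key scores, up to a fixed
    budget, then run ordinary softmax attention over just those tokens.
    """
    n, d = keys.shape
    n_blocks = n // block_size  # any trailing partial block is ignored here

    # Stage 1: one cheap score per block, using the mean key as a block summary.
    block_keys = keys[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    block_summary = block_keys.mean(axis=1)               # (n_blocks, d)
    block_scores = block_summary @ query                  # (n_blocks,)
    kept_blocks = np.argsort(block_scores)[-top_blocks:]

    # Token indices belonging to the kept blocks.
    candidate_idx = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in kept_blocks]
    )

    # Stage 2: exact scores for the candidates only, then keep the budget.
    cand_scores = keys[candidate_idx] @ query / np.sqrt(d)
    keep = np.argsort(cand_scores)[-token_budget:]
    final_idx = candidate_idx[keep]

    # Ordinary attention, but only over the selected tokens.
    weights = softmax(keys[final_idx] @ query / np.sqrt(d))
    return weights @ values[final_idx]

# Usage: a 4096-token context, but exact attention touches only 128 tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((4096, 64))
V = rng.standard_normal((4096, 64))
out = sparse_attention(q, K, V)
print(out.shape)  # (64,)
```

The cost saving in a scheme like this comes from computing exact attention scores for only a small, query-dependent subset of the context, while the full context is touched only by the much cheaper block-level pass.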