
On Monday, DeepSeek released an experimental model, V3.2-exp, designed to sharply reduce inference costs in long-context operations. The company announced the model in a post on Hugging Face and linked to an accompanying academic paper hosted on GitHub.

The model's defining feature is DeepSeek Sparse Attention, a system built around a component called a 'lightning indexer' that prioritizes specific excerpts from the context window. A separate 'fine-grained token selection system' then picks individual tokens from within those excerpts to load into the model's limited attention window. Together, the two stages let Sparse Attention models operate over long stretches of context with comparatively small server loads. (A toy sketch of this two-stage idea appears at the end of this piece.)

For long-context operations, the benefits are significant. DeepSeek's preliminary testing found that the price of a simple API call could be cut by as much as half in long-context situations. Further testing will be needed for a fuller assessment, but because the model is open-weight and freely available on Hugging Face, third-party evaluations should arrive soon.

The new model is the latest in a string of recent efforts to tackle inference costs: the server expense of running a pre-trained AI model, as distinct from the cost of training one. Here, DeepSeek's researchers looked for ways to make the fundamental transformer architecture run more efficiently, and found that there are significant improvements to be made.

Based in China, DeepSeek occupies an unusual position in the AI landscape, particularly given the widespread framing of AI research as a national contest between the U.S. and China. The company made waves earlier this year with its R1 model, trained primarily with reinforcement learning at a fraction of the cost of its American competitors. But R1 did not set off the wholesale shift in AI training some predicted, and the company has kept a lower profile in the months since. The new 'sparse attention' approach is unlikely to cause the same stir as R1, but it could still teach U.S. providers some useful tricks for holding inference costs down.
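DeepSeek's paper spells out the actual mechanism; the NumPy sketch below is purely illustrative, and every name and parameter in it (block size, number of kept blocks, token budget) is invented for the example rather than taken from the model. It shows the general shape of a two-stage scheme: a cheap per-block score stands in for the 'lightning indexer,' and a per-token top-k within the surviving blocks stands in for fine-grained token selection.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(query, keys, values, block_size=64, top_blocks=4, token_budget=128):
    """Toy two-stage sparse attention (single query vector).

    Stage 1 ("indexer"): score fixed-size blocks of the context cheaply
    and keep only the top-scoring blocks.
    Stage 2 ("token selection"): within the surviving blocks, keep the
    individual tokens with the highest query-key scores, up to a fixed
    budget, then run ordinary softmax attention over just those tokens.
    """
    n, d = keys.shape
    n_blocks = n // block_size  # any trailing partial block is ignored here

    # Stage 1: one cheap score per block, using the mean key as a block summary.
    block_keys = keys[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    block_summary = block_keys.mean(axis=1)               # (n_blocks, d)
    block_scores = block_summary @ query                  # (n_blocks,)
    kept_blocks = np.argsort(block_scores)[-top_blocks:]

    # Token indices belonging to the kept blocks.
    candidate_idx = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in kept_blocks]
    )

    # Stage 2: exact scores for the candidates only, then keep the budget.
    cand_scores = keys[candidate_idx] @ query / np.sqrt(d)
    keep = np.argsort(cand_scores)[-token_budget:]
    final_idx = candidate_idx[keep]

    # Ordinary attention, but only over the selected tokens.
    weights = softmax(keys[final_idx] @ query / np.sqrt(d))
    return weights @ values[final_idx]

# Usage: a 4096-token context, but exact attention touches only 128 tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((4096, 64))
V = rng.standard_normal((4096, 64))
out = sparse_attention(q, K, V)
print(out.shape)  # (64,)
```

The cost saving in a scheme like this comes from computing exact attention scores for only a small, query-dependent subset of the context, while the full context is touched only by the much cheaper block-level pass.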