Sarvam launches Sarvam Audio, claims to offer better accuracy than GPT-4o, Gemini 3 Flash


Sarvam AI, an Indian startup, has launched Sarvam Audio, a large language model (LLM) built for audio processing. The model targets speech recognition and transcription across India's languages, and its voice-centric architecture is tuned to understand real-world speech in the country's multilingual setting.

Unlike competitors such as ElevenLabs, which focus primarily on generating expressive voice output, Sarvam Audio is designed to interpret and transcribe everyday conversations, particularly in Indian languages. Conventional automatic speech recognition (ASR) systems often lose reliability and accuracy across India's wide range of languages, accents, and dialects. Sarvam Audio is built to close that gap by handling complex speech patterns and keeping conversations flowing.

The model is trained on 22 Indian languages, including Hindi, Tamil, Telugu, Malayalam, Marathi, Bengali, and Indian English. It is built on the three-billion-parameter Sarvam 3B model and supports several transcription formats.

Early benchmark tests suggest that it may be more accurate than leading models such as GPT-4o-Transcribe and Gemini-3-Flash across three transcription styles: unnormalised, normalised, and code-mixed. The tests used the IndicVoices dataset, which consists of authentic Indian speech. Sarvam AI says that while global models from OpenAI and Google target standard transcription tasks, Sarvam Audio is engineered specifically for Indian languages.

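The article does not say which metric the benchmark used or what its normalisation involves. As a rough illustration only, the sketch below assumes word error rate (WER), a common ASR metric, and treats "normalisation" as lowercasing and stripping punctuation before scoring. The code-mixed Hindi-English sentences and the function names are invented for the example and do not come from Sarvam or IndicVoices.

```python
import string


def normalise(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, split on whitespace (illustrative only)."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical code-mixed reference transcript and model output.
reference = "Mera order kal deliver hoga, right?"
hypothesis = "mera order kal deliver hoga right"

unnormalised_wer = word_error_rate(reference.split(), hypothesis.split())
normalised_wer = word_error_rate(normalise(reference), normalise(hypothesis))

print(unnormalised_wer)  # 0.5: casing and punctuation count as word errors
print(normalised_wer)    # 0.0: identical once both sides are normalised
```

The same model output scores very differently depending on whether casing and punctuation are scored, which is one reason a benchmark might report unnormalised and normalised results separately.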
Among its notable features is 'Diarised Speech Recognition', which handles recordings with multiple speakers and is intended to improve accuracy in natural conversations. In a recent statement, Sarvam said, “With built-in context awareness, diarization, format control, and direct speech-to-command capabilities, Sarvam Audio lays the groundwork for a new wave of voice-first applications designed for real Indian users.”

Potential applications include multilingual transcription and multi-speaker conversations in sectors such as call centers, logistics, e-commerce, banking, and fintech. The model could also be applied to long-form audio, including podcasts, meetings, and lectures.

Source: Business Today

Published On : Feb 03, 2026, 08:05
