
Sarvam AI, an Indian startup, has introduced Sarvam Audio, a large language model (LLM) designed specifically for audio processing. The model targets the linguistic patterns prevalent in India, with voice recognition and transcription across the nation's diverse languages. What sets Sarvam Audio apart is its voice-centric architecture, which is fine-tuned to comprehend real-world speech within India's multilingual landscape. Unlike competitors such as ElevenLabs, which focus primarily on generating expressive voice output, Sarvam Audio is dedicated to accurately interpreting and transcribing everyday conversations, particularly in Indian languages.
Given India's wide range of languages, accents, and dialects, traditional automatic speech recognition (ASR) systems often struggle to stay reliable and accurate. Sarvam Audio is built to close this gap by handling complex speech patterns, thereby supporting smoother conversational flow. The model is trained on 22 Indian languages, including Hindi, Tamil, Telugu, Malayalam, Marathi, and Bengali, as well as Indian English. It is built on the three-billion-parameter Sarvam 3B model and supports several transcription formats.
Early benchmark tests suggest that it may outperform leading models such as GPT-4o-Transcribe and Gemini-3-Flash in accuracy, particularly across three transcription styles: unnormalised, normalised, and code-mixed. The tests used the IndicVoices dataset, which reflects authentic Indian speech. Sarvam AI emphasizes that while global models from OpenAI and Google target standard transcription tasks, Sarvam Audio is engineered specifically for Indian languages.
Among its notable features is the 'Diarised Speech Recognition' capability, which handles scenarios involving multiple speakers and is intended to improve accuracy in natural conversations. In a recent statement, Sarvam underscored the model's potential applications, stating, “With built-in context awareness, diarization, format control, and direct speech-to-command capabilities, Sarvam Audio lays the groundwork for a new wave of voice-first applications designed for real Indian users.”
Sarvam Audio has a range of potential real-world applications, from multilingual transcription to transcribing multi-speaker discussions in sectors such as call centers, logistics, e-commerce, banking, and fintech. It also holds promise for long-form audio, including podcasts, meetings, and lectures, where it could improve accessibility and engagement.