New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks

Cohere, a Canadian AI company, has introduced Command A Vision, a vision model designed specifically for enterprise applications. The model builds on the existing Command A model and has 112 billion parameters. Cohere aims to help businesses extract insights from visual data and make informed decisions through document optical character recognition (OCR) and image analysis.

According to Cohere, Command A Vision excels at interpreting complex visual information, such as product manuals with intricate diagrams and photographs used for risk assessment. It can analyze the image types most common in enterprise settings, including graphs, charts, diagrams, scanned documents, and PDFs, which positions it as a tool for the vision-related challenges that organizations face.

One standout feature is efficiency: the model runs on two or fewer GPUs, like its text-focused predecessor, Command A. It retains Command A's text analysis capabilities and understands at least 23 languages, making it useful for multinational businesses. Cohere emphasized that Command A Vision is designed to lower total cost of ownership for enterprises and is optimized for retrieval use cases.

The model uses a LLaVA architecture, which converts visual features into soft vision tokens. These tokens are then processed by the Command A text tower, a language model with 111 billion parameters. Training involved three stages: vision-language alignment, supervised fine-tuning (SFT), and post-training with reinforcement learning from human feedback (RLHF). This approach integrates visual encoding directly into the language model, improving overall performance.

In benchmark tests against competitors such as OpenAI's GPT-4.1, Meta's Llama 4 Maverick, and Mistral's Pixtral models, Command A Vision came out ahead across tasks including ChartQA and OCRBench. It achieved an average score of 83.1%, while its rivals' averages ranged from 78.3% to 80.5%.

Demand for models that can handle unstructured data is growing with the rise of deep research features, and Cohere's open-weights release of Command A Vision aims to attract enterprises seeking alternatives to proprietary models. Initial feedback from developers has been promising, with many praising the model's accuracy, particularly in extracting information from handwritten notes and other complex visual inputs.

Source: VentureBeat

Published on: Aug 03, 2025, 22:20
