New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks

Cohere, a Canadian AI company, has introduced Command A Vision, a vision model built for enterprise applications. The model builds on the existing Command A model and has 112 billion parameters. It is meant to help businesses extract insights from visual data and act on them, using document optical character recognition (OCR) and image analysis.

According to Cohere, Command A Vision can interpret complex visual information, such as product manuals with intricate diagrams and photographs used for risk assessment. It can analyze the kinds of images common in enterprise settings, including graphs, charts, diagrams, scanned documents, and PDFs, which makes it suited to the vision-related problems organizations typically face.

A key selling point is efficiency: like its text-focused predecessor, Command A, the model runs on two or fewer GPUs. It retains Command A's text capabilities and understands over 23 languages, which makes it useful for multinational businesses. Cohere said Command A Vision is designed to reduce total cost of ownership for enterprises and is fully tailored for retrieval use cases.

The model is built on a LLaVA architecture, which converts visual features into soft vision tokens. These tokens are then processed by the Command A text tower, a language model with 111 billion parameters. Training involved three stages: vision-language alignment, supervised fine-tuning (SFT), and post-training with reinforcement learning from human feedback (RLHF). This approach integrates the vision encoder's output directly into the language model, so that images and text are handled by a single model.

In benchmark tests against competitors including OpenAI's GPT-4.1, Meta's Llama 4 Maverick, and Mistral's Pixtral models, Command A Vision came out ahead across tasks including ChartQA and OCRBench. It achieved an average score of 83.1%, while its rivals' averages ranged from 78.3% to 80.5%.

Demand for models that can handle unstructured data is growing with the rise of Deep Research features, and Cohere is releasing Command A Vision with open weights to attract enterprises looking for alternatives to proprietary models. Initial feedback from developers has been positive, with many praising the model's accuracy, particularly in extracting information from handwritten notes and other complex visual inputs.
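To make the architecture described above more concrete, here is a minimal sketch of how a LLaVA-style model can turn image features into soft vision tokens and hand them to a text model. It is an illustration only, not Cohere's implementation: the two-layer MLP projector, the random weights, and every name and dimension (`project_to_soft_tokens`, `D_VISION`, `D_TEXT`, and so on) are assumptions chosen for a toy example.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Random values standing in for learned weights or encoder outputs."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def matmul(x, w):
    """Multiply x (n x d_in) by w (d_in x d_out) using plain lists."""
    columns = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in x]


def gelu(v):
    """Exact GELU activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def project_to_soft_tokens(patch_features, w1, w2):
    """Map vision-encoder patch features into the text embedding space.

    Assumption: a two-layer MLP projector, one common choice in
    LLaVA-style models. The real projector may differ.
    """
    hidden = [[gelu(v) for v in row] for row in matmul(patch_features, w1)]
    return matmul(hidden, w2)


def build_input_sequence(soft_tokens, text_embeddings):
    """Place the soft vision tokens ahead of the text token embeddings."""
    return soft_tokens + text_embeddings


# Toy, hypothetical sizes; real models use far larger dimensions.
NUM_PATCHES, D_VISION, D_TEXT, NUM_TEXT_TOKENS = 4, 8, 16, 3

rng = random.Random(0)
patch_features = random_matrix(NUM_PATCHES, D_VISION, rng)     # stand-in for vision encoder output
text_embeddings = random_matrix(NUM_TEXT_TOKENS, D_TEXT, rng)  # stand-in for embedded prompt tokens
w1 = random_matrix(D_VISION, D_TEXT, rng)
w2 = random_matrix(D_TEXT, D_TEXT, rng)

soft_tokens = project_to_soft_tokens(patch_features, w1, w2)
sequence = build_input_sequence(soft_tokens, text_embeddings)

# Each image patch becomes one vector with the same width as a text embedding.
print(len(sequence), len(sequence[0]))  # prints: 7 16
```

The point of the sketch is that, after projection, image patches and text tokens have the same shape, so the text tower can process them as one sequence.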

Sources : VentureBeat

Published On : Aug 03, 2025, 22:20

