
Cohere, a Canadian AI company, has introduced its latest innovation, Command A Vision, a visual model designed specifically for enterprise applications. Built on the existing Command A framework with 112 billion parameters, the model aims to transform how businesses interact with visual data, enabling them to extract critical insights and make informed decisions through advanced document optical character recognition (OCR) and image analysis.

According to Cohere, Command A Vision excels at interpreting complex visual information, such as product manuals with intricate diagrams and photographs used for risk assessment. It can analyze the image formats most common in enterprise settings, including graphs, charts, diagrams, scanned documents, and PDFs, positioning it as a powerful tool for the specific vision-related challenges organizations face.

One of the standout features of Command A Vision is its efficiency: like its text-focused predecessor, Command A, it runs effectively on two or fewer GPUs. The model retains Command A's text analysis capabilities and understands more than 23 languages, making it a versatile asset for multinational businesses. Cohere emphasized that Command A Vision is designed to minimize total cost of ownership for enterprises and is fully tailored for retrieval use cases.

The model was developed using a LLaVA-style architecture, which converts visual features into manageable soft vision tokens. These tokens are then processed within the Command A text tower, a deep-learning stack with 111 billion parameters. Training involved three key stages: vision-language alignment, supervised fine-tuning (SFT), and post-training reinforcement learning from human feedback (RLHF).
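The LLaVA-style recipe described above can be illustrated with a minimal sketch: a vision encoder produces per-patch features, a learned projection maps them into the language model's embedding space as "soft vision tokens," and those tokens are simply prepended to the ordinary text-token embeddings. All dimensions, weights, and function names below are illustrative assumptions, not Cohere's actual configuration.

```python
import numpy as np

# Hypothetical dimensions -- illustrative only, not Cohere's real config.
D_VISION = 64    # vision-encoder feature size
D_MODEL = 128    # text-tower hidden size
N_PATCHES = 16   # image patches per image

rng = np.random.default_rng(0)

def encode_image(image_patches: np.ndarray, w_enc: np.ndarray) -> np.ndarray:
    """Stand-in vision encoder: maps raw patch vectors to feature vectors."""
    return image_patches @ w_enc

def project_to_soft_tokens(features: np.ndarray, w_proj: np.ndarray) -> np.ndarray:
    """LLaVA-style adapter: a linear map from vision-feature space into the
    text tower's embedding space, yielding 'soft vision tokens'."""
    return features @ w_proj

# Fake inputs and randomly initialized weights.
patches = rng.standard_normal((N_PATCHES, 32))            # 16 patches, 32 raw dims
w_enc = rng.standard_normal((32, D_VISION)) * 0.02
w_proj = rng.standard_normal((D_VISION, D_MODEL)) * 0.02

vision_features = encode_image(patches, w_enc)            # shape (16, 64)
soft_tokens = project_to_soft_tokens(vision_features, w_proj)  # shape (16, 128)

text_embeddings = rng.standard_normal((8, D_MODEL))       # 8 text tokens
# The soft vision tokens are prepended to the text sequence and processed
# by the language model exactly like ordinary token embeddings.
sequence = np.concatenate([soft_tokens, text_embeddings], axis=0)
print(sequence.shape)  # (24, 128)
```

In a real system the encoder and projection are trained jointly with the language model during the alignment and fine-tuning stages; here random weights stand in only to show how the shapes compose.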
This methodology allows for seamless integration of visual encoding into the language model's framework, enhancing the model's overall performance.

In benchmark testing against notable competitors such as OpenAI's GPT-4.1, Meta's Llama 4 Maverick, and Mistral's Pixtral models, Command A Vision demonstrated superior performance across various tasks, including ChartQA and OCRBench. It achieved an average score of 83.1%, surpassing its rivals, whose averages ranged from 78.3% to 80.5%.

As demand for models capable of managing unstructured data grows with the rise of deep-research technologies, Cohere's open-weights approach for Command A Vision aims to attract enterprises seeking alternatives to proprietary models. Initial feedback from developers has been promising, with many praising the model's accuracy, particularly in extracting information from handwritten notes and other complex visual inputs.