
On January 28, Google announced "Agentic Vision," a new capability for Gemini Flash 3 that shifts image processing from passive observation to active exploration. According to Google's blog post, the approach combines visual reasoning with automated code execution in a "Think, Act, Observe" loop for visual analysis. Google says this significantly reduces hallucinations in image processing, producing more accurate results on visual tasks. The model can now formulate step-by-step plans to zoom in on, inspect, and manipulate images, grounding its responses in concrete visual evidence.

One standout capability is real-time image annotation. Rather than merely describing a scene, the model acts as an agent, executing Python code to visualize its findings. Google claims this shift from unreliable estimation to code-based execution improves quality by 5-10%: "Standard LLMs frequently hallucinate during complex visual arithmetic, but Gemini Flash circumvents this by delegating calculations to a deterministic Python environment." The strategic move is from models that simply observe to models that actively investigate.

Google cited several real-world applications. PlanCheckSolver.com, an AI-driven platform for validating building plans, reported a 5% accuracy improvement by using Gemini Flash to iteratively inspect high-resolution images. And when asked to count fingers in the Gemini app, the model uses Python to draw bounding boxes and labels for each finger, reducing counting errors.
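To make the "Think, Act, Observe" idea concrete, here is a minimal, self-contained sketch of the pattern Google describes: instead of the model guessing a count, it delegates the counting to deterministic Python code and grounds its answer in that result. This is purely illustrative code, not the Gemini API; the grid representation, function names, and flood-fill approach are all assumptions for the sake of the example.

```python
from collections import deque

def count_regions(grid):
    """Act: deterministically count connected foreground regions (the 'fingers')
    in a binary grid using breadth-first flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and (r, c) not in seen:
                count += 1
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
    return count

def think_act_observe(grid):
    # Think: plan to count objects rather than estimate from a glance.
    plan = "Delegate the count to deterministic code instead of guessing."
    # Act: execute the deterministic counting routine.
    observation = count_regions(grid)
    # Observe: report an answer grounded in the executed result.
    return f"Plan: {plan} Observed {observation} distinct objects."

# Five separated columns stand in for five fingers in a thresholded image.
hand = [[1, 0, 1, 0, 1, 0, 1, 0, 1]]
print(think_act_observe(hand))
```

The point of the loop is the division of labor: the model's "thinking" chooses what to compute, while a deterministic runtime performs the arithmetic, which is why visual counting errors drop.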
Developers can access Agentic Vision today through the Gemini API in Google AI Studio and Vertex AI, as well as in the Gemini app. Looking ahead, Google plans to extend the feature so the model can decide autonomously when to rotate, zoom, or run visual calculations without additional prompting. The company also intends to integrate web and reverse image search and to bring Agentic Vision to more capable models beyond Flash.