
On January 28, Google announced "Agentic Vision," a new capability for Gemini Flash 3 that shifts image processing from passive observation to active exploration. According to Google's blog post, the approach combines visual reasoning with automated code execution in a "Think, Act, Observe" loop for visual analysis. Google says this significantly reduces hallucinations in image processing, producing more accurate results on visual tasks. The model can now formulate step-by-step plans to zoom in on, inspect, and manipulate images, grounding its responses in concrete visual evidence.

One standout capability is real-time image annotation. Rather than merely describing a scene, the model acts as an agent, executing Python code to visualize its findings. Google claims this shift from unreliable estimation to code-based execution improves quality by 5-10%: "Standard LLMs frequently hallucinate during complex visual arithmetic, but Gemini Flash circumvents this by delegating calculations to a deterministic Python environment." The strategic move is from models that simply observe to models that actively investigate.

Google cited several real-world applications. PlanCheckSolver.com, an AI-driven platform for validating building plans, reported a 5% accuracy improvement by using Gemini Flash to iteratively inspect high-resolution images. And when asked to count fingers in the Gemini app, the model uses Python to draw bounding boxes and labels for each finger, reducing counting errors.
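To make the "Think, Act, Observe" idea concrete, here is a minimal, self-contained sketch of the pattern Google describes: instead of the model guessing a count, it delegates the counting to deterministic Python code and grounds its answer in that result. This is purely illustrative code, not the Gemini API; the grid representation, function names, and flood-fill approach are all assumptions for the sake of the example.

```python
from collections import deque

def count_regions(grid):
    """Act: deterministically count connected foreground regions (the 'fingers')
    in a binary grid using breadth-first flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and (r, c) not in seen:
                count += 1
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
    return count

def think_act_observe(grid):
    # Think: plan to count objects rather than estimate from a glance.
    plan = "Delegate the count to deterministic code instead of guessing."
    # Act: execute the deterministic counting routine.
    observation = count_regions(grid)
    # Observe: report an answer grounded in the executed result.
    return f"Plan: {plan} Observed {observation} distinct objects."

# Five separated columns stand in for five fingers in a thresholded image.
hand = [[1, 0, 1, 0, 1, 0, 1, 0, 1]]
print(think_act_observe(hand))
```

The point of the loop is the division of labor: the model's "thinking" chooses what to compute, while a deterministic runtime performs the arithmetic, which is why visual counting errors drop.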
Developers can access Agentic Vision today through the Gemini API in Google AI Studio and Vertex AI, as well as in the Gemini app. Looking ahead, Google plans to extend the feature so the model can decide autonomously when to rotate, zoom, or run visual calculations without additional prompting. The company also intends to integrate web and reverse image search and to bring Agentic Vision to more capable models beyond Flash.