Google researchers find the best AI model is 69% right

Google DeepMind has introduced the FACTS Benchmark Suite, which evaluates how reliably AI models deliver factually correct responses. The suite assesses models in four areas: answering straightforward questions, using web search, grounding responses in lengthy documents, and analyzing images. The top performer, Google's Gemini 3 Pro, achieved an accuracy rate of 69%. Many other leading models fell significantly short of that.

To put this into perspective, if any journalist under my supervision submitted articles with a 69% accuracy rate, their position would be in jeopardy. The figure matters most for businesses that rely on AI technology. These models are impressive in speed and language fluency, but their factual accuracy remains a concern, especially for complex reasoning or niche knowledge. In critical sectors like finance, healthcare, and law, even minor inaccuracies can have serious repercussions.

A recent analysis by my colleague Melia Russell highlighted the challenges law firms face in treating AI as a credible source of legal information. In one notable case, a firm dismissed an employee who had used ChatGPT for assistance and submitted a document filled with fabricated legal cases.

The FACTS Benchmark serves as both a warning and a guide for future improvements, as Google aims to identify and address the shortcomings of these AI models. The primary takeaway is clear: AI is making strides, but even the best model is still incorrect nearly a third of the time.

Source: Business Insider

Published: Dec 12, 2025, 21:30
