
Recent research from OpenAI examines the persistent problem of hallucinations in large language models, including GPT-5 and chatbots such as ChatGPT. The paper characterizes hallucinations as “plausible but incorrect statements generated by these models.” Despite advances in the technology, OpenAI acknowledges that hallucinations remain a fundamental challenge for all large language models, and one that is unlikely to ever be fully eliminated.

To illustrate the problem, the researchers asked a popular chatbot for the title of Adam Tauman Kalai’s Ph.D. dissertation. It produced three different answers, all of them wrong. Asked for Kalai’s birthday, it again generated three incorrect dates. How can a chatbot be so confidently wrong?

The researchers trace hallucinations partly to the pretraining phase, in which models learn only to predict the next word in a stream of text, with no true-or-false labels attached to the examples. Because training optimizes for fluent continuation, models become very good at regular patterns such as spelling and grammar. But low-frequency facts, such as a specific person’s birthday or the title of an obscure dissertation, cannot be inferred from patterns alone, and that is where hallucinations arise.

Interestingly, the paper argues that the root of the problem lies not only in how models are trained but also in how they are evaluated. Current evaluation frameworks do not cause hallucinations directly, but they set up incentives that reward guessing. The researchers compare these evaluations to multiple-choice tests: a random guess might score points, while leaving the question blank guarantees a zero. Under accuracy-only scoring, a model that always guesses will outscore one that honestly admits uncertainty, so leaderboards effectively teach models to bluff.

The proposed fix is to reform the evaluations themselves. Instead of grading on accuracy alone, which pushes models toward guessing, the researchers advocate scoring that penalizes confident errors more heavily than expressions of uncertainty. This resembles standardized tests that use negative marking for wrong answers and give partial credit for questions left blank, discouraging blind guessing.

The researchers stress that adding a handful of uncertainty-aware tests is not enough. The dominant accuracy-based evaluations must be overhauled so that their scoring discourages guessing. As long as the main metrics reward lucky guesses, models will keep learning to guess rather than to say “I don’t know” when that is the honest answer.
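The incentive argument is easy to make concrete with a toy scoring rule. The sketch below is a minimal illustration, not the scheme OpenAI actually uses: it assumes the classic negative-marking setup in which a wrong answer costs t/(1−t) points for a stated confidence threshold t, so answering only has positive expected value when the model’s confidence exceeds t. The function names and the threshold value here are hypothetical.

```python
# Toy uncertainty-aware scoring rule (illustrative assumption, not OpenAI's
# exact scheme). A wrong answer costs t/(1 - t) points, so guessing breaks
# even exactly at confidence t; a model should answer only above threshold.

def score(answered: bool, correct: bool, threshold: float) -> float:
    """Score one question: +1 if correct, -t/(1-t) if wrong, 0 if abstained."""
    if not answered:
        return 0.0  # abstaining ("I don't know") is never penalized
    return 1.0 if correct else -threshold / (1.0 - threshold)

def expected_score_if_answering(confidence: float, threshold: float) -> float:
    """Expected score of answering, if the model is correct with prob. `confidence`."""
    penalty = threshold / (1.0 - threshold)
    return confidence * 1.0 - (1.0 - confidence) * penalty

if __name__ == "__main__":
    t = 0.75  # hypothetical confidence threshold announced to the model
    for p in (0.50, 0.75, 0.90):
        ev = expected_score_if_answering(p, t)
        decision = "answer" if ev > 0 else "abstain"
        print(f"confidence={p:.2f}  expected score={ev:+.2f}  -> {decision}")
```

With t = 0.75, a 50%-confident guess has an expected score of −1.0 while a 90%-confident answer scores +0.6, so the rule rewards abstaining exactly when the model is unsure. Contrast this with accuracy-only grading, where the 50%-confident guess still beats a guaranteed zero for leaving the question blank.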