OpenAI says GPT-5 stacks up to humans in a wide range of jobs

On Thursday, OpenAI unveiled a new benchmark that compares the performance of its AI models with that of human professionals across a range of industries. The benchmark, named GDPval, is an early attempt to measure how close OpenAI's systems are to outperforming humans at economically valuable work, a core part of the company's goal of developing artificial general intelligence (AGI).

The findings indicate that OpenAI's GPT-5 model, along with Anthropic's Claude Opus 4.1, is approaching the quality of work produced by industry experts. OpenAI cautions that this does not mean jobs are about to be displaced: although some CEOs predict that AI will rapidly replace workers, GDPval currently covers only a narrow slice of the tasks professionals perform.

GDPval focuses on nine industries that contribute significantly to U.S. gross domestic product, including healthcare, finance, manufacturing, and government. The benchmark evaluates AI performance across 44 occupations within these sectors, ranging from software engineers to nurses and journalists.

In the first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with reports written by their peers and choose the better one. For instance, one prompt required investment bankers to develop a competitive analysis of the last-mile delivery sector and compare their findings with those produced by AI. The model's "win rate" was then averaged across all 44 occupations.

For GPT-5-high, an enhanced version of GPT-5, OpenAI reported that the model was rated as better than or equal to industry experts 40.6% of the time. Anthropic's Claude Opus 4.1 scored higher, at 49%, a result attributed in part to its ability to produce visually appealing graphics.
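To make the averaging step concrete, the short Python sketch below computes a "better or equivalent" rate per occupation and then averages it across occupations. It is only an illustration under stated assumptions: the occupations, the verdicts, and the function names `win_or_tie_rate` and `averaged_rate` are hypothetical, and OpenAI's actual scoring procedure may differ in its details.

```python
from statistics import mean

# Hypothetical grader verdicts per occupation (not real GDPval data).
# Each entry records how an expert grader rated the model's report
# against a human-written one: "win", "tie", or "loss" for the model.
verdicts = {
    "investment banker": ["win", "loss", "tie", "loss"],
    "nurse": ["loss", "loss", "win", "loss"],
    "journalist": ["tie", "loss", "loss", "loss"],
}


def win_or_tie_rate(outcomes):
    """Fraction of comparisons where the model was rated better than or equal to the expert."""
    return sum(o in ("win", "tie") for o in outcomes) / len(outcomes)


def averaged_rate(verdicts_by_occupation):
    """Unweighted mean of the per-occupation rates (assumed averaging scheme)."""
    return mean(win_or_tie_rate(v) for v in verdicts_by_occupation.values())


overall = averaged_rate(verdicts)
print(f"Wins and ties, averaged across occupations: {overall:.1%}")
```

Averaging per occupation first, rather than pooling every comparison, keeps an occupation with many graded tasks from dominating the headline number; whether GDPval weights occupations this way is an assumption of the sketch.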
Most professionals do far more than submit research reports, which is all that GDPval-v0 currently evaluates. OpenAI acknowledges this limitation and plans to expand future tests to cover a broader range of industries and interactive workflows.

Despite these constraints, the company views the progress reflected in GDPval as significant. In an interview with TechCrunch, OpenAI's chief economist, Dr. Aaron Chatterji, said the results suggest that professionals can increasingly use AI models to free up time for more meaningful work. He noted, "As the model improves, people can utilize it to offload certain responsibilities and focus on higher-value tasks."

Tejal Patwardhan, who leads OpenAI's evaluations, expressed optimism about the rate of progress shown on GDPval. GPT-4o, a model released about 15 months ago, scored only 13.7% in wins and ties against human counterparts. With GPT-5 now scoring nearly triple that, Patwardhan expects the upward trend to continue.

Silicon Valley already relies on a wide range of benchmarks to track the progress of AI models, and GDPval could play an important role in debates about how well AI handles real-world tasks. Popular benchmarks such as AIME 2025 and GPQA Diamond are widely used, but many AI researchers have called for better evaluations. Even so, OpenAI may ultimately need a more extensive version of GDPval to convincingly demonstrate that its AI models can outperform human professionals.

Sources: TechCrunch

Published On: Sep 25, 2025, 16:41
