
On Thursday, OpenAI unveiled a new benchmark that measures how its AI models perform against human professionals across a range of industries. The benchmark, named GDPval, is an early attempt to gauge how close OpenAI's systems are to outperforming humans at economically valuable work, a central part of the company's goal of developing artificial general intelligence (AGI).
The findings indicate that OpenAI's GPT-5 model and Anthropic's Claude Opus 4.1 are both nearing the quality of work produced by industry experts. OpenAI nonetheless cautions against fears of immediate job displacement. Although some CEOs have predicted that AI will rapidly replace jobs, the company acknowledges that GDPval currently covers only a narrow slice of the tasks professionals actually perform.
GDPval focuses on nine industries that contribute significantly to U.S. gross domestic product, including healthcare, finance, manufacturing, and government. The benchmark evaluates AI performance across 44 occupations within these sectors, ranging from software engineers to nurses and journalists.
In the first iteration, GDPval-v0, OpenAI asked seasoned professionals to compare AI-generated reports with reports written by their peers and to select the better submission. For instance, one prompt required investment bankers to develop a competitive analysis of the last-mile delivery sector, which was then compared with AI-generated versions. The model's "win rate" was then averaged across all 44 occupations.
For the enhanced GPT-5-high version, OpenAI reported that the model was rated as better than or equivalent to industry experts 40.6% of the time. Anthropic's Claude Opus 4.1 achieved a higher rate of 49%, attributed in part to its ability to produce visually appealing graphics.
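As a rough illustration of the scoring described above, the sketch below counts a grader verdict of "win" or "tie" as favorable to the model, computes a rate per occupation, and then averages those rates. The occupation names and verdicts are invented placeholders, not GDPval data, and the unweighted mean across occupations is an assumption about how the averaging works rather than OpenAI's published method.

```python
from statistics import mean

# Hypothetical grader verdicts per occupation (placeholder data, not GDPval results).
# Each verdict records how the AI deliverable fared against the human expert's.
verdicts_by_occupation = {
    "occupation_a": ["win", "loss", "tie", "loss"],
    "occupation_b": ["loss", "loss", "win", "loss"],
    "occupation_c": ["tie", "win", "win", "loss"],
}


def occupation_win_rate(verdicts):
    """Share of comparisons where the model was rated better than or equal to the expert."""
    favorable = sum(1 for v in verdicts if v in ("win", "tie"))
    return favorable / len(verdicts)


def overall_win_rate(by_occupation):
    """Unweighted mean of the per-occupation rates (an assumed aggregation rule)."""
    return mean(occupation_win_rate(v) for v in by_occupation.values())


overall = overall_win_rate(verdicts_by_occupation)
print(f"Overall wins-and-ties rate: {overall:.1%}")
```

Averaging per occupation first, rather than pooling every comparison, keeps an occupation with many graded tasks from dominating the headline number.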
Most working professionals do far more than submit research reports, which is all that GDPval-v0 currently tests. OpenAI acknowledges this limitation and plans future versions that cover more industries and interactive workflows.
Despite these constraints, the company views the progress reflected in GDPval as significant. In a discussion with TechCrunch, OpenAI's chief economist, Dr. Aaron Chatterji, said the results suggest that professionals can increasingly use AI models to free up time for more meaningful work. He noted, "As the model improves, people can utilize it to offload certain responsibilities and focus on higher-value tasks."
Tejal Patwardhan, who leads OpenAI's evaluations, expressed optimism about the rate of progress on GDPval. The earlier GPT-4o model, released about 15 months ago, scored only 13.7% in wins and ties against human counterparts. GPT-5 now scores nearly triple that, and Patwardhan expects the upward trend to continue.
Silicon Valley already relies on a variety of benchmarks to track AI progress, and GDPval could play an important role in discussions about how well models handle real-world tasks. Benchmarks such as AIME 2025 and GPQA Diamond are widely used, but many AI researchers have called for better evaluations. OpenAI may ultimately need a more extensive version of GDPval to demonstrate convincingly that its AI models can outperform human professionals.