
In a landscape where artificial intelligence is rapidly evolving, doubts linger about its readiness to replace traditional knowledge workers such as lawyers, bankers, and IT professionals. Nearly two years after Microsoft CEO Satya Nadella suggested that AI could transform white-collar jobs, significant advances in foundation models have yet to translate into tangible changes in the workplace.

Recent research from Mercor, a leading name in training data, sheds light on this issue. The study introduces a benchmark called Apex-Agents, which evaluates how well top AI models perform real-world tasks typically handled by professionals in fields like consulting, investment banking, and law. The findings reveal that these systems struggle: no model exceeded roughly 25% accuracy when answering complex professional queries.

Brendan Foody, a researcher involved in the study, said the primary challenge for these models lies in navigating information across multiple domains, an essential skill for effective knowledge work. "We designed the benchmark to reflect real-world professional environments where tasks involve multiple tools like Slack and Google Drive," Foody explained.

The benchmark's scenarios were curated with working professionals, who helped formulate the questions and set the criteria for what constitutes a successful answer. For instance, one complex legal question asked whether certain actions complied with EU privacy law, a task that even knowledgeable humans might find daunting. If AI models could accurately answer such inquiries, they could significantly reshape the future of legal work.
Foody emphasized the significance of the research: "This benchmark highlights the essential tasks performed by these professionals and poses a critical question for the economy." While other benchmarks, such as OpenAI's GDPval, have sought to measure professional competence broadly, Apex-Agents takes a narrower approach, assessing the ability to carry out specific high-stakes tasks in targeted professions.

Among the models tested, Gemini 3 Flash led with 24% accuracy, followed closely by GPT-5.2 at 23%; other models, including Opus 4.5 and GPT-5, registered around 18%. Although the initial results leave ample room for improvement, the AI sector is known for rapid gains in capability. Foody remains optimistic, pointing to the progress made over the past year: "We're currently seeing performance akin to an intern with a 25% success rate, compared to just 5-10% last year. Such rapid improvement can lead to substantial changes in a short time," he added, leaving the door open for further advances in AI's workplace capabilities.