Recent assessments show that AI agents still fall short of human consultants on a range of real-world tasks. The research, conducted by Mercor, a prominent player in AI training, evaluated leading AI models on consulting, banking, and legal scenarios. Despite notable advances, the agents completed fewer than 25% of the assigned tasks on their first attempt, and their overall success rate reached only 40% after multiple tries. Brendan Foody, Mercor's CEO, emphasizes that these preliminary results are only part of the broader picture.
The benchmark, known as APEX-Agents, was designed to mirror actual management consulting tasks, drawing on input from experts at major firms such as McKinsey and Deloitte. On it, OpenAI's GPT 5.2 led the pack by completing nearly 23% of tasks on its initial attempt, while Anthropic's newly released Opus 4.6 improved to nearly 33%. The pace of improvement is significant: GPT 3, for instance, had a mere 3% success rate on similar tasks just months ago. Foody anticipates that with continued improvements, success rates could approach 50% by year-end. He notes that these models are making strides on complex tasks that consulting firms typically value at millions of dollars.
AI is already reshaping the consulting landscape. McKinsey, for example, counts 25,000 AI agents among its workforce of 60,000, allowing for unprecedented growth without increasing headcount. However, the research indicates that while AI agents excel at research and data analysis, they falter when tasks become more complex or time-consuming. They struggle to navigate file systems and manage multi-step processes, which leads to inaccuracies in their outputs. Foody likens the current performance of AI agents to that of interns, suggesting they achieve a 50% pass rate but still require substantial oversight.
Frank Jones, a former consultant who now consults for Mercor, stresses that AI models need precise prompts to meet consulting standards, because they often miss nuanced expectations. Looking ahead, Foody believes that improving AI models hinges not on groundbreaking innovations but on better training methodologies. Mercor, which has attracted significant investment, is positioned as a major player in the AI landscape and aims to refine these agents further. The next iteration of the benchmark will assess the entire ecosystem of professional services, potentially revealing even more alarming implications for traditional consulting roles. Foody predicts that in the near future, AI chatbots could rival the capabilities of leading consulting firms.