xAI hired gig workers to boost Grok on a key AI leaderboard and 'beat' Anthropic's Claude in coding

In the fiercely competitive market for AI coding tools, xAI, the company founded by Elon Musk, is stepping up its efforts to overtake its primary rival, Anthropic. Recent documents reveal that xAI employed gig workers through Scale AI's Outlier platform and instructed them to improve Grok's performance on a prominent AI leaderboard. The specific aim was for Grok to outperform Anthropic's Claude 3.7 Sonnet.

The contractors were assigned to a project known as 'hillclimbing', intended to raise Grok's position on WebDev Arena, a well-regarded leaderboard that ranks AI models on user-generated web development challenges. An onboarding document from Scale AI, dated early July, stated the goal explicitly: to make Grok the top-rated model on LMArena. Contractors generated and refined user interface code with the intent of surpassing the performance of Anthropic's model. xAI has not responded to requests for comment on the project.

Leaderboard rankings have become the informal scoreboard of the AI industry, and top positions attract significant funding, new customers, and media attention. Anthropic's Claude models are recognized as leaders in AI coding and frequently occupy top slots in various rankings, often alongside models from competitors such as Google and OpenAI. Anthropic co-founder Ben Mann recently discussed the intense competition on the 'No Priors' podcast, suggesting that some companies have declared 'code reds' in an attempt to match Claude's capabilities.

The Scale AI project did not specify which Grok version was being trained, but it was underway shortly before the release of Grok 4 on July 9. Following the launch, Grok 4 was positioned 12th on the LMArena leaderboard, while Anthropic's models held joint first, third, and fourth places. Elon Musk touted Grok 4's capabilities on social media, claiming it outperformed other AI-assisted coding tools like Cursor.

Scale AI defended its methods, asserting that it does not overfit models or reuse public benchmark data, and described the project as standard model-training practice aimed at improving performance. Anastasios Angelopoulos, CEO of LMArena, acknowledged that hiring contractors to improve leaderboard standings is common in the industry.

The focus on AI leaderboards has nonetheless been criticized for encouraging potentially unfair competition. Researchers such as Sara Hooker have highlighted how the weight placed on leaderboards can lead to gaming the system, citing past incidents involving models like Meta's Llama 4.

Despite xAI's efforts to raise Grok's ranking, evidence suggests that leaderboard success does not always correlate with real-world performance. Although Grok 4 has achieved top-three rankings in core categories on LMArena, early data from a competing leaderboard placed it 66th out of more than 100 models, underscoring the discrepancies between different rankings. AI strategist Nate Jones also noted that Grok's actual performance often fell short of its leaderboard hype, cautioning that prioritizing leaderboard dominance can produce models that excel at trivial tasks but struggle in practical applications.

Source: Business Insider

Published On : Jul 17, 2025, 13:22
