The Ars Technica AI coding agent test: Minesweeper edition

The use of artificial intelligence in programming remains controversial. Some developers are skeptical because AI coding agents can make significant errors that require extensive human intervention, while others champion the tools for their potential to change how code is written. Advocates argue that advanced models are improving rapidly and address many of the issues seen in earlier iterations.

To evaluate the capabilities of these modern AI coding tools, a test was conducted to recreate the classic Windows game Minesweeper. Because recreating a well-known game is a relatively simple coding task, the challenge included a twist: each AI was instructed to build a full-featured web version with sound effects, mobile touchscreen compatibility, and an unexpected gameplay element. The task was given to four prominent AI coding agents: OpenAI’s Codex powered by GPT-5, Anthropic’s Claude Code with Opus 4.5, Google’s Gemini CLI, and Mistral Vibe. Each agent directly edited HTML and scripting files on a local machine, with a supervising AI model parsing the prompt and delegating coding tasks to language models equipped to carry out the necessary coding actions. The development process was kept private, with no insider access granted to the companies involved, to keep the evaluation unbiased.

After the projects were completed, Ars Senior Gaming Editor Kyle Orland, who is well-versed in Minesweeper, assessed each entry without knowing which AI had created which version. The evaluation focused on the raw output of the AI models, using a "single shot" approach to gauge their performance without any human debugging. In practice, however, complex AI-generated code typically undergoes some scrutiny and refinement by a human programmer to fix issues and improve efficiency.

Source: Ars Technica

Published On: Dec 19, 2025, 17:30
