The Ars Technica AI coding agent test: Minesweeper edition

Whether artificial intelligence belongs in programming remains a contested question. Some developers are skeptical, pointing to AI coding agents that make significant errors requiring extensive human intervention. Others champion the tools for their potential to change how code is written, and advocates argue that newer models are improving rapidly and address many of the problems seen in earlier iterations.

To evaluate the current tools, Ars Technica ran a test: recreate the classic Windows game Minesweeper. Because coding a well-known game is relatively simple, the challenge added a twist. Each AI was instructed to build a full-featured web version with sound effects, mobile touchscreen compatibility, and an unexpected gameplay element.

The task went to four prominent AI coding agents: OpenAI’s Codex powered by GPT-5, Anthropic’s Claude Code with Opus 4.5, Google’s Gemini CLI, and Mistral Vibe. Each agent directly edited HTML and scripting files on a local machine, while a supervising AI model parsed the prompt and delegated coding tasks to language models equipped to execute them. The development process was kept private, with no insider access granted to the companies involved, to keep the evaluation unbiased.

Once the projects were complete, Ars Senior Gaming Editor Kyle Orland, who is well-versed in Minesweeper, assessed each entry without knowing which AI had produced it. The evaluation looked at the raw output of the models, using a "single shot" approach to gauge performance without any human debugging. In practice, however, complex AI-generated code typically gets some scrutiny and refinement from a human programmer to fix issues and improve efficiency.

Source: Ars Technica

Published on: Dec 19, 2025, 17:30
