In the race for supremacy in AI, companies such as Google and Anthropic are not only competing on innovation but also engaging in a more light-hearted contest: playing the classic Pokémon games. A recent report from Google DeepMind says that its model, Gemini 2.5 Pro, shows something resembling 'panic' when its Pokémon are close to defeat, and that this state comes with a noticeable drop in the model's reasoning ability.
Benchmarking AI is often subjective and gives limited insight into what different models can actually do. Some researchers argue, however, that watching AI systems play video games is both entertaining and a source of useful data. Over recent months, developers have launched Twitch streams titled "Gemini Plays Pokémon" and "Claude Plays Pokémon," where viewers can watch in real time as the models work through a game that has captivated players for more than 25 years. Each stream displays the model's reasoning in natural language as it plays, offering a window into how it approaches each problem.
For all their advances, these models still play inefficiently. Gemini, for instance, needs hundreds of hours to get through parts of the game that a child could finish in a fraction of that time. The interesting part is not how quickly they finish, but how they respond to setbacks. According to the report, certain situations push Gemini 2.5 Pro into this simulated 'panic,' and its performance declines; for example, it may stop making effective use of the tools available to it. The model is not actually thinking or feeling anything, but the behavior resembles the way people make rash decisions under pressure, which is both fascinating and slightly unsettling. Viewers have noticed the pattern as well, and the Twitch chat regularly comments on the model's performance as it plays.
Claude, Anthropic's model, has shown its own peculiar behaviors. At one point it learned that when all of the player's Pokémon faint, the player character is sent back to the last Pokémon Center visited. Later, while trapped in the Mt. Moon cave, Claude wrongly concluded that deliberately letting its Pokémon faint would transport it to the Pokémon Center in the next town. Because the game actually returns the player to the last Pokémon Center visited, not one further ahead, the tactic could not have carried it out of the cave, and it produced a dramatic and unintended outcome instead.
Flaws aside, the models show real strengths in some areas. Gemini 2.5 Pro is notably good at puzzle-solving when given human guidance. With only minimal prompting about how boulders behave, it developed dedicated tools for the game's complex boulder puzzles and solved them with impressive accuracy. Google suggests that the model might eventually be able to build such tools on its own, hinting at a future in which AI could learn to manage its own 'panic' responses as well.