AI models are starting to crack high-level math problems

Over the weekend, Neel Somani, a software engineer and former quantitative researcher, made a surprising discovery while testing OpenAI's latest model. After entering a high-level math problem into ChatGPT and letting it work for 15 minutes, he came back to find a complete solution. When he checked the proof with a tool from Harmonic, he was pleasantly surprised to find that it held up.

Somani had set out to determine when large language models (LLMs) can solve open mathematical problems and where they struggle. To his surprise, the new model pushed that boundary further than he had anticipated.

ChatGPT's reasoning drew on established results such as Legendre's formula, Bertrand's postulate, and even the Star of David theorem. Along the way, the model found a 2013 MathOverflow post by Harvard mathematician Noam Elkies that offered an elegant solution to a similar problem. ChatGPT's proof, however, differed significantly from Elkies' work and gave a more comprehensive resolution to a question posed by Paul Erdős, the acclaimed mathematician whose extensive collection of unsolved problems has become a testing ground for AI.

For anyone skeptical of machine intelligence, the result is striking. AI tools are spreading through mathematics, ranging from formalization-focused LLMs like Harmonic's Aristotle to literature review systems like OpenAI's deep research. Since the release of GPT 5.2, which Somani describes as “anecdotally superior in mathematical reasoning,” the number of solved problems has surged, raising new questions about whether LLMs can expand the frontiers of human knowledge.

Somani focused on the Erdős problems, a collection of more than a thousand conjectures available online that makes a compelling target for AI-driven mathematics.
The first significant autonomous solutions came last November from AlphaEvolve, a Gemini-powered model. More recently, though, it is GPT 5.2 that has shown exceptional proficiency with high-level math. Since Christmas, 15 Erdős problems have moved from “open” to “solved,” and AI models are credited in 11 of those solutions.

Prominent mathematician Terence Tao gives a detailed account of this progress on his GitHub page, noting eight problems on which AI models made substantial autonomous progress and six more where progress came from building on previous research. AI cannot yet do mathematics without human involvement, but large models clearly have a significant role to play.

On Mastodon, Tao suggested that the scalable nature of AI makes it well suited to the more obscure Erdős problems, many of which may have straightforward solutions. Those simpler problems, he said, are now more likely to be solved by purely AI-driven methods than by human or hybrid ones.

Another important factor is the growing emphasis on formalization, a meticulous process that makes mathematical reasoning easier to verify. Formalization does not require AI or even computers, but recent automated tools have made it much easier. The open-source proof assistant Lean, developed at Microsoft Research in 2013, has gained traction for formalizing proofs, and AI tools like Harmonic's Aristotle aim to automate much of the formalization work.

For Tudor Achim, founder of Harmonic, the rise in solved Erdős problems matters less than the growing acceptance of these tools in the mathematics community.

“What matters more is that respected math and computer science professors are utilizing AI tools,” Achim said. “These individuals have their reputations at stake; their endorsement of tools like Aristotle or ChatGPT serves as significant validation.”
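
To give a rough sense of what formalization looks like in practice, here is a minimal Lean 4 sketch. It is only an illustration and is not taken from any of the Erdős proofs discussed here; the theorem name `add_comm_sketch` is made up for this example. The idea is that a statement is written in a precise language and the proof is checked mechanically rather than by a human reader.

```lean
-- Illustrative only: a statement and a proof that Lean checks mechanically.
-- `add_comm_sketch` is a hypothetical name chosen for this example.
theorem add_comm_sketch (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete instance, obtained by specializing the theorem above.
example : 3 + 5 = 5 + 3 :=
  add_comm_sketch 3 5
```

Real formalizations of research-level proofs are far longer, which is the workload that tools like Aristotle aim to reduce.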

Source: TechCrunch

Published on: Jan 14, 2026, 19:30
