AI models are starting to crack high-level math problems

Last weekend, Neel Somani, a software engineer and former quantitative researcher, made a surprising discovery while testing OpenAI's latest model. He entered a high-level math problem into ChatGPT, let it work for 15 minutes, and came back to a complete solution. When he checked the proof with a tool called Harmonic, he was pleased to find that it held up.

Somani wanted to find out where large language models (LLMs) can make progress on open mathematical problems and where they still struggle. To his surprise, the new model went further than he had expected. ChatGPT laid out its reasoning clearly, drawing on established results such as Legendre's formula, Bertrand's postulate, and even the Star of David theorem. Along the way, the model turned up a 2013 MathOverflow post by Harvard mathematician Noam Elkies that offered an elegant solution to a similar problem. ChatGPT's proof, however, differed substantially from Elkies' work and gave a more complete resolution of a question posed by the renowned mathematician Paul Erdős, whose large collection of unsolved problems has become a testing ground for AI.

For those skeptical of machine intelligence, these results are striking. AI tools for mathematics are multiplying, ranging from formalization-focused LLMs like Harmonic's Aristotle to literature-review systems like OpenAI's deep research. Since the release of GPT-5.2, which Somani describes as "anecdotally superior in mathematical reasoning," the number of solved problems has risen sharply, prompting new discussion about whether LLMs can extend the frontier of human knowledge. Somani concentrated on the Erdős problems, a collection of more than a thousand conjectures available online, which make a compelling challenge for AI-driven mathematics.
The first significant autonomous solutions came last November from AlphaEvolve, a Gemini-powered model. In recent evaluations, however, GPT-5.2 has proven especially strong at high-level math. Since Christmas, 15 Erdős problems have moved from "open" to "solved," and AI models are credited with 11 of those solutions.

Prominent mathematician Terence Tao has documented this progress in detail on his GitHub page, noting eight cases where AI models made substantial autonomous progress on Erdős problems and six more where they built on previous research. AI is not yet doing mathematics on its own, but large models clearly have a significant role to play. Tao suggested on Mastodon that AI's scalability makes it well suited to the more obscure Erdős problems, many of which may have straightforward solutions. He added that such easier problems are now more likely to be solved by purely AI-driven approaches than by humans or hybrid methods.

Another important factor is the growing emphasis on formalization, a meticulous process that makes mathematical reasoning easier to verify and extend. Formalization does not strictly require AI or even computers, but recent automated tools have made it much easier. The open-source proof assistant Lean, launched by Microsoft Research in 2013, has become widely used for formalizing proofs, while AI tools such as Harmonic's Aristotle aim to automate much of the formalization workload.

According to Harmonic founder Tudor Achim, the rise in solved Erdős problems matters less than the growing acceptance of these tools in the mathematics community. "What matters more is that respected math and computer science professors are utilizing AI tools," Achim said. "These individuals have their reputations at stake; their endorsement of tools like Aristotle or ChatGPT serves as significant validation."

Source: TechCrunch

Published on: Jan 14, 2026, 19:30
