
On December 16, 2025, OpenAI announced the launch of FrontierScience, a benchmark for assessing how well artificial intelligence systems handle expert-level scientific reasoning in physics, chemistry, and biology. The initiative comes as AI models increasingly show potential to support genuine scientific research. OpenAI emphasizes that reasoning is fundamental to scientific inquiry, extending beyond factual recall to hypothesis generation, testing, refinement, and interdisciplinary synthesis. As AI technologies evolve, the pivotal question remains: how deeply can these systems reason, and how much can they contribute to scientific discovery?

Over the past year, OpenAI has reported significant achievements with its models, including strong performances at the International Mathematical Olympiad and the International Olympiad in Informatics. Advanced models such as GPT-5 are already being used by researchers to speed up scientific workflows, handling tasks such as cross-disciplinary literature searches and complex mathematical proofs in a fraction of the usual time. A paper published by OpenAI in November 2025 detailed early findings from studies involving GPT-5, which suggest that the model can notably accelerate scientific work.

As AI's reasoning and knowledge capabilities expand, OpenAI argues that traditional scientific benchmarks no longer suffice. Many existing assessments rely on multiple-choice questions and have reached saturation, so they no longer probe authentic scientific reasoning. For instance, when the GPQA “Google-Proof” benchmark was introduced in November 2023, GPT-4 scored 39%, well below the expert baseline of 70%. Two years later, GPT-5.2 scored 92%, which underscores the need for more rigorous evaluations.

FrontierScience aims to bridge this gap by measuring expert-level scientific skills through challenging, original, and meaningful questions written and verified by domain experts. The benchmark comprises over 700 text-based questions, with a gold-standard set of 160 questions covering various subfields of physics, chemistry, and biology. It is structured into two distinct tracks:

- **FrontierScience-Olympiad:**
  - Consists of 100 short-answer questions designed by international science Olympiad medalists.
  - Focuses on constrained, theoretical scientific reasoning with difficulty comparable to international competitions.
- **FrontierScience-Research:**
  - Comprises 60 original research subtasks authored by PhD-level scientists.
  - Reflects real-world, multi-step research challenges and is graded using a detailed 10-point rubric.

Each question was crafted and validated by subject-matter experts. The Olympiad questions are graded on concise answers, such as numerical values and expressions, which allows clear verification. For the Research tasks, OpenAI uses a rubric-based grading system that evaluates both the final answer and the intermediate reasoning.

In initial evaluations of several advanced AI models on FrontierScience, including GPT-5.2, Gemini 3 Pro, and others, the results indicate notable progress in expert-level reasoning but also reveal substantial room for improvement.

While FrontierScience represents a significant advance in the evaluation of scientific reasoning, OpenAI acknowledges its limitations. The benchmark focuses primarily on constrained, expert-written problems and does not fully capture the nuanced processes of real scientific research, such as generating novel hypotheses or working with multimodal data.

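
To make the two grading styles described above more concrete, here is a minimal sketch in Python. It is not OpenAI's grader: the names `RubricItem`, `grade_short_answer`, and `grade_with_rubric`, the 1% tolerance, and the example rubric are all hypothetical, and the sketch assumes that a human or model judge has already decided which rubric criteria a response meets. It also covers only numeric short answers, not symbolic expressions.

```python
import math
from dataclasses import dataclass


@dataclass
class RubricItem:
    """One rubric criterion (hypothetical structure, not from the benchmark)."""
    description: str
    points: int  # points awarded when the criterion is met


def grade_short_answer(submitted: float, reference: float,
                       rel_tol: float = 1e-2) -> bool:
    """Olympiad-style check: is a numeric answer within a relative tolerance?

    The 1% default tolerance is an assumption chosen for illustration.
    """
    return math.isclose(submitted, reference, rel_tol=rel_tol)


def grade_with_rubric(rubric: list[RubricItem], met: list[bool]) -> int:
    """Research-style check: add up the points of every criterion judged as met."""
    if len(rubric) != len(met):
        raise ValueError("need exactly one judgment per rubric item")
    return sum(item.points for item, ok in zip(rubric, met) if ok)


# Illustrative rubric whose points total 10, mirroring the 10-point scale.
EXAMPLE_RUBRIC = [
    RubricItem("States the governing assumptions", 3),
    RubricItem("Carries out the intermediate derivation correctly", 3),
    RubricItem("Reaches the correct final result", 2),
    RubricItem("Discusses limitations of the approach", 2),
]

# A judge (human or model) supplies one True/False verdict per criterion.
example_score = grade_with_rubric(EXAMPLE_RUBRIC, [True, False, True, True])
```

Keeping the per-criterion judgments separate from the arithmetic means intermediate reasoning can earn credit even when the final answer is wrong, which is the property the article attributes to the Research track's rubric.
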
Looking forward, OpenAI expects that advances in scientific reasoning will come from both stronger general-purpose reasoning systems and targeted improvements in scientific methodologies. FrontierScience is one of many tools in that effort, with plans to expand into new domains and to integrate with real-world evaluations. Ultimately, OpenAI believes the most important measure of AI's contribution to science will be the new discoveries it enables, and it positions FrontierScience as an early indicator of that potential.