GEPA optimizes LLMs without costly reinforcement learning

Researchers from the University of California, Berkeley, Stanford University, and Databricks have unveiled a new AI optimization technique called GEPA that adapts large language models (LLMs) to specialized tasks with remarkable efficiency. Unlike conventional reinforcement learning (RL) approaches, which depend on extensive trial and error, GEPA leverages the models' own language understanding to self-evaluate, diagnose errors, and refine their instructions over time. The method not only surpasses existing techniques in accuracy but also cuts the number of trial runs required by as much as 35 times. For businesses building complex AI solutions, this translates into faster development cycles, lower computational costs, and more reliable applications.

Today's enterprise AI systems frequently involve intricate workflows that chain together multiple LLM modules, databases, code interpreters, and custom logic to tackle advanced tasks such as multi-step research and data analysis. Optimizing these systems has traditionally relied on RL methods such as Group Relative Policy Optimization (GRPO), which take a black-box approach: execute the task, receive a bare success metric (a scalar reward), and incrementally adjust the model's parameters based on that number. The main drawback of RL is its sample inefficiency; it often demands tens of thousands of trial runs to extract meaningful insight, which is impractical for real-world applications that involve costly tool calls.

Lakshya A. Agrawal, a co-author of the study and a doctoral student at UC Berkeley, described the complexity of RL as a significant barrier for many organizations, noting that many teams have resorted to manual prompt engineering because of the cost and difficulty of RL.
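To make the sample-inefficiency point concrete, here is a minimal, hypothetical sketch of the black-box loop described above: the optimizer sees only one scalar reward per rollout and must hill-climb blindly. The `run_task` function and its numbers are invented stand-ins for illustration, not GRPO itself (which updates model weights via policy gradients rather than searching a single parameter).

```python
import random

random.seed(0)

def run_task(params: float, example: str) -> float:
    """Toy stand-in for an expensive rollout that returns a scalar reward.
    In a real RL setup this would be many costly LLM and tool calls."""
    target = 0.7  # hypothetical optimum the optimizer must discover blindly
    return 1.0 - abs(params - target)

def scalar_reward_search(n_rollouts: int) -> float:
    """Black-box search guided only by a scalar reward: each rollout
    yields one number and nothing else, so progress is slow."""
    best_params = 0.0
    best_reward = run_task(best_params, "example")
    for _ in range(n_rollouts):
        # Propose a small random perturbation, clipped to [0, 1].
        candidate = min(1.0, max(0.0, best_params + random.uniform(-0.1, 0.1)))
        reward = run_task(candidate, "example")
        if reward > best_reward:  # the only learning signal available
            best_params, best_reward = candidate, reward
    return best_reward
```

Even on this one-dimensional toy problem, hundreds of rollouts are needed to close in on the optimum; a textual trace explaining *why* a rollout failed would let an optimizer skip most of that search, which is the gap GEPA targets.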
GEPA is designed for optimizing systems built on top-tier models that cannot easily be fine-tuned, letting teams improve performance without managing custom GPU clusters. The central challenge the researchers posed was how to extract the maximum learning signal from every expensive rollout, so that complex AI systems can adapt effectively even in data-limited or budget-constrained settings.

GEPA, short for Genetic-Pareto, addresses this by replacing sparse scalar rewards with rich, natural-language feedback. It captures the entire execution of an AI system, including reasoning steps and tool calls, and serializes it into text that an LLM can read and analyze. GEPA's methodology rests on three pillars: genetic prompt evolution, reflection with natural-language feedback, and Pareto-based selection. The first treats a population of prompts like a gene pool, systematically "mutating" prompts to generate potentially improved versions. The second has the LLM reflect on full execution traces and outcomes after a handful of rollouts, diagnosing problems and rewriting the prompt accordingly. The third, Pareto-based selection, maintains a diverse roster of specialist prompts rather than a single front-runner; this keeps the search from stagnating in suboptimal solutions and raises the odds of finding a prompt that performs well across varied inputs. The researchers also stress the importance of "feedback engineering": surfacing the rich textual details that systems typically discard.

In evaluations across diverse tasks, including multi-hop question answering and privacy-preserving queries, GEPA consistently outperformed GRPO. In one case, optimizing a QA system took roughly three hours with GEPA versus 24 hours with GRPO, while also delivering better performance at significantly lower cost.
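As a rough illustration of the three pillars, the toy sketch below evolves a population of prompts, "reflects" by appending the instruction that a failing example needs, and keeps a Pareto front of specialists. The example tasks, keyword checks, and helper names are all invented for illustration; real GEPA uses an LLM to read full execution traces and rewrite prompts in natural language rather than keyword matching.

```python
import random

random.seed(1)

# Toy tasks: each "example" is solved when the prompt contains its instruction.
EXAMPLES = {"math": "show your work", "cite": "cite sources", "brief": "be concise"}

def evaluate(prompt: str) -> dict:
    """Per-example scores instead of one collapsed scalar reward."""
    return {ex: 1.0 if hint in prompt else 0.0 for ex, hint in EXAMPLES.items()}

def reflect_and_mutate(prompt: str, scores: dict) -> str:
    """'Reflection' in miniature: inspect which examples failed and append
    the instruction that would fix one of them (a stand-in for an LLM
    reading a textual trace and rewriting the prompt)."""
    failed = [ex for ex, s in scores.items() if s < 1.0]
    if not failed:
        return prompt
    return prompt + " " + EXAMPLES[random.choice(failed)]

def pareto_front(population: list) -> list:
    """Keep every prompt that matches the best score on at least one
    example: a diverse pool of specialists, not a single winner."""
    front = set()
    for ex in EXAMPLES:
        best = max(evaluate(p)[ex] for p in population)
        front.update(p for p in population if evaluate(p)[ex] == best)
    return list(front)

def gepa_sketch(generations: int = 5) -> str:
    population = ["You are a helpful assistant."]
    for _ in range(generations):
        children = [reflect_and_mutate(p, evaluate(p)) for p in population]
        population = pareto_front(population + children)
    # Return the prompt with the best aggregate score across all examples.
    return max(population, key=lambda p: sum(evaluate(p).values()))
```

The design choice worth noting is the Pareto front: because a prompt survives by being best on *any* one example, partial specialists are preserved and recombined instead of being discarded for a mediocre average, which is the article's point about avoiding stagnation in suboptimal solutions.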
The study also found that systems optimized with GEPA are more reliable on unseen data, as indicated by a smaller generalization gap. The researchers attribute this to GEPA's rich natural-language feedback, which encourages a genuine understanding of what makes a solution succeed rather than mere pattern-matching on the training data. Another advantage is that GEPA produces shorter instruction-based prompts, which can mean lower latency and lower costs for API-served models. The researchers also explored GEPA as an inference-time strategy, turning the AI from a one-shot answer generator into an iterative problem solver. Agrawal envisions GEPA integrated into CI/CD pipelines, where it could generate and refine multiple optimized versions of code as part of a continuous optimization process. The authors see GEPA as a meaningful evolution in AI development, making high-performing systems accessible to users who have relevant domain expertise but lack the time or inclination to navigate the complexities of RL.

Source: VentureBeat

Published on: Aug 20, 2025, 03:41
