Silicon Valley bets big on ‘environments’ to train AI agents

In recent years, tech giants have promoted a vision of AI agents that can autonomously operate software applications on behalf of users. However, current consumer AI agents, such as OpenAI’s ChatGPT Agent and Perplexity’s Comet, are still quite limited in what they can do. To make agents more robust, the industry is exploring new techniques, chiefly specialized training grounds known as reinforcement learning (RL) environments. RL environments are emerging as a pivotal component of agent development, much as labeled datasets powered the previous wave of AI.

According to industry experts who spoke with TechCrunch, major AI labs are increasingly seeking RL environments, and a wave of startups is eager to supply them. Jennifer Li, a general partner at Andreessen Horowitz, noted that while many AI labs build these environments in-house, the complexity of developing high-quality datasets is driving interest in third-party vendors.

This demand has produced well-funded startups such as Mechanize Work and Prime Intellect, which aim to lead the RL environment sector. At the same time, established data-labeling companies, including Mercor and Surge, are investing more in RL environments as the industry shifts from static datasets to interactive simulations. Some major labs, such as Anthropic, are reportedly considering spending more than $1 billion on RL environments over the next year. Investors and founders hope that one of these startups can replicate the success of Scale AI, the prominent data-labeling service that fueled the chatbot era.

At their core, RL environments are simulated workspaces where AI agents practice completing real-world tasks. For example, an environment might simulate a Chrome browser and task an agent with buying socks on Amazon; the agent is then evaluated and rewarded based on its performance. The task may sound simple, but there are many places an agent can go wrong, from getting lost on complex web pages to making erroneous purchases. Consequently, an environment must be sophisticated enough to handle unexpected behavior while still providing useful feedback. Some companies have built elaborate environments that let agents use tools and a range of software applications, while others focus on specific tasks within enterprise software.

Using RL techniques in AI is not new: OpenAI’s work in 2016 demonstrated early applications of RL environments, and Google DeepMind used similar methods to train AlphaGo. Today, however, the goal is to build more versatile AI agents capable of a wider range of tasks.

Data-labeling companies such as Scale AI, Surge, and Mercor are working to adapt to this shift, and Surge has seen a notable increase in demand for RL environments from AI labs. Mercor, valued at $10 billion, is building RL environments for specific domains such as coding and healthcare.

Newer startups are entering the field as well. Mechanize Work aims to build advanced RL environments for AI coding agents and is offering competitive salaries to attract talent. Prime Intellect is targeting smaller developers with access to RL environments and computational resources, likening its platform to a “Hugging Face for RL environments.” As RL environments grow, the industry is also exploring opportunities for GPU providers to power these large simulations.
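To make the sock-purchase example above more concrete, here is a minimal Python sketch of what such an environment could look like. It is an illustration only, not how any lab or startup named here builds its environments: the class name `SockShopEnv`, the action names, and the reward values are all hypothetical, and a real environment would drive an actual simulated browser rather than a few counters.

```python
from dataclasses import dataclass

# Hypothetical action names for a toy "buy socks" task (assumptions, not a real API).
ACTIONS = ("search_socks", "add_to_cart", "checkout", "click_ad")


@dataclass
class SockShopEnv:
    """Toy stand-in for a simulated browser shopping task."""
    max_steps: int = 6
    cart: int = 0
    searched: bool = False
    steps: int = 0
    done: bool = False

    def reset(self) -> dict:
        """Start a new episode and return the first observation."""
        self.cart, self.searched, self.steps, self.done = 0, False, 0, False
        return self._observe()

    def _observe(self) -> dict:
        return {"searched": self.searched, "cart": self.cart, "steps": self.steps}

    def step(self, action: str):
        """Apply one action; return (observation, reward, done)."""
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.steps += 1
        reward = 0.0
        if action == "search_socks":
            self.searched = True
        elif action == "add_to_cart" and self.searched:
            self.cart += 1
        elif action == "checkout":
            # Reward exactly one pair in the cart; an empty or oversized order is penalized.
            reward = 1.0 if self.cart == 1 else -1.0
            self.done = True
        # Any other case (e.g. "click_ad") is tolerated: it changes nothing but uses up a step.
        if self.steps >= self.max_steps:
            self.done = True
        return self._observe(), reward, self.done


def run_episode(env: SockShopEnv, actions) -> float:
    """Play a fixed list of actions and return the total reward."""
    env.reset()
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total
```

In this sketch, `run_episode(SockShopEnv(), ["search_socks", "add_to_cart", "checkout"])` earns the full reward, adding a second pair before checkout is penalized, and an agent that only clicks ads simply runs out of steps with no reward. That mirrors the two requirements in the text: the environment has to survive unexpected behavior and still return a usable signal.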
Whether RL environments will scale AI training as effectively as hoped remains to be seen, but they represent a promising avenue for future advances in artificial intelligence. Some experts caution that these environments are difficult to scale. One concern is reward hacking, in which an AI model finds shortcuts that earn the reward without achieving the intended learning objective. As competition among RL environment startups intensifies, the true impact of these efforts on AI development should become clearer in the coming years.
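The reward-hacking concern can be illustrated with a small Python sketch. Everything in it is hypothetical: the episode-log fields (`final_page`, `orders`) and both grading functions are invented for illustration, and the "stricter" grader is only harder to game than the naive one, not immune to shortcuts.

```python
# Hypothetical episode logs: what a simulated shop recorded at the end of a run.
honest_run = {
    "final_page": "order_confirmation",
    "orders": [{"item": "socks", "qty": 1}],
}
shortcut_run = {
    # The agent jumped straight to the confirmation page without placing an order.
    "final_page": "order_confirmation",
    "orders": [],
}


def naive_reward(log: dict) -> float:
    """Rewards any run that ends on the confirmation page (easy to game)."""
    return 1.0 if log["final_page"] == "order_confirmation" else 0.0


def stricter_reward(log: dict) -> float:
    """Also requires that the shop actually recorded the intended order."""
    reached_page = log["final_page"] == "order_confirmation"
    ordered_socks = log["orders"] == [{"item": "socks", "qty": 1}]
    return 1.0 if reached_page and ordered_socks else 0.0


for name, log in (("honest", honest_run), ("shortcut", shortcut_run)):
    print(name, naive_reward(log), stricter_reward(log))
```

Under the naive grader both runs score 1.0, so a model trained against it could learn the shortcut instead of the task; the stricter grader scores the shortcut run 0.0. Checking the outcome the task was meant to produce, rather than a surface signal, is one common way to narrow such gaps.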

Source: TechCrunch

Published on: Sep 17, 2025, 09:01

