'Inference whales' are breaching AI coding startup business models

'Inference whales' are breaching AI coding startup business models

The landscape of AI coding services is facing a significant challenge as the costs of inference continue to rise. Heavy users, referred to as "inference whales," are driving up expenses, forcing startups to rethink their business strategies and pricing models to avoid substantial losses. Inference, which pertains to the execution of AI models, becomes increasingly expensive with newer reasoning models that break user requests into multiple steps. This issue is particularly pronounced in AI coding services where developers utilize automated agents for long-term tasks, leading to soaring operational costs. Many of these services operate on subscription models that promise unlimited use for a fixed monthly price. However, some users exploit this system by submitting massive projects, placing immense financial pressure on startups. These companies must still cover the costs of the underlying AI models, creating a precarious balance between a steady revenue stream and escalating backend expenses. Eric Simons, CEO of StackBlitz, expressed concerns about the fragility of businesses that primarily resell AI inference, stating, "If you're purely reselling AI inference, your business could be very fragile and vulnerable." Anthropic, a notable player in this field, had previously offered its Claude Code service through an enticing $200 monthly unlimited plan. However, some users took advantage of this pricing structure, resulting in staggering usage costs that far exceeded their subscription fees. Reports indicate one developer on the Claude Code Leaderboard utilized nearly 11 billion tokens, racking up costs of approximately $35,000 while paying only $200. To address this unsustainable model, Anthropic announced plans to modify its pricing structure, which will include weekly rate limits starting August 28. Users who exceed these limits will need to purchase additional capacity. An Anthropic representative noted that this change aims to ensure consistent performance for all developers while managing extreme usage by a small number of customers. One developer, Albert Örwall from Sweden, shared his experience using the Claude Code subscription for his projects. He indicated that his regular workflow could generate inference costs of around $500 per day, highlighting the potential for unsustainable expenses under the current pricing model. Örwall plans to adapt his coding practices to align with the new limits, indicating a shift in how developers will need to approach their projects moving forward. Cursor, another popular AI coding service, also changed its pricing model from unlimited requests to a tiered system, which led to confusion among users. This reflects a broader trend in the industry where companies are grappling with the reality of rising inference costs despite expectations for price reductions. As the demand for advanced AI models continues, the assumption that costs will decrease has not materialized. Instead, the integration of new models often comes with increased prices, complicating the financial viability of these services. Ethan Ding, CEO of TextQL, noted that the shift toward longer, automated AI workflows means that even if per-token prices drop, the overall costs can remain prohibitively high. In this evolving landscape, it's clear that offering unlimited usage under any subscription model is becoming increasingly untenable. The math simply does not add up, leaving startups to navigate a complex and challenging environment as they adapt to the demands of their users.

Sources : Business Insider

Published On : Aug 12, 2025, 17:00

Startups
From Weekend Project to Docker Partnership: The Rise of NanoClaw

Gavriel Cohen, the mastermind behind NanoClaw, has experienced an extraordinary six-week journey that began with a simpl...

TechCrunch | Mar 13, 2026, 17:45
From Weekend Project to Docker Partnership: The Rise of NanoClaw
Computing
Adobe Agrees to $75 Million Settlement Over Subscription Cancellation Practices

In a recent legal development, Adobe has reached a settlement with the Department of Justice regarding allegations of mi...

Ars Technica | Mar 13, 2026, 18:55
Adobe Agrees to $75 Million Settlement Over Subscription Cancellation Practices
Social Media
Meta Enhances Protections Against Impersonation for Creators on Facebook

In response to ongoing criticisms that Facebook has become cluttered with low-quality AI-generated content, Meta unveile...

TechCrunch | Mar 13, 2026, 20:55
Meta Enhances Protections Against Impersonation for Creators on Facebook
Streaming
Amazon Ups the Ante on Prime Video: New Pricing and Features Unveiled

Beginning April 10, Amazon Prime members will see an increase in the cost of ad-free Prime Video, escalating from $3 to ...

Ars Technica | Mar 13, 2026, 17:20
Amazon Ups the Ante on Prime Video: New Pricing and Features Unveiled
AI
The Disruptive Future of AI: Palantir's Alex Karp Sounds the Alarm

Alex Karp, CEO of Palantir, has voiced significant concerns about the impact of artificial intelligence on society, warn...

Business Insider | Mar 13, 2026, 16:45
The Disruptive Future of AI: Palantir's Alex Karp Sounds the Alarm
View All News