
Demand for the computing power behind AI models continues to surge, and it presents companies in the field with two main hurdles: procuring suitable chips and getting them installed in data centers so they can start generating revenue. General Compute, an emerging player in the neocloud sector that rents out AI processing power for inference, is trying to address both. The company recently secured $15 million in seed funding at a post-money valuation of $60 million. The round was led by FUSE VC, with participation from Carya Venture Partners and Village Global Ventures.
A pivotal question is what constitutes the ideal chip for AI inference. Although demand for GPUs has skyrocketed, an industry consensus is emerging that they may not be the most efficient option for running AI models once they are deployed and answering requests. The computational needs of inference differ significantly from those of training, which has prompted the development of a new category of chips tailored to inference. Notable deals, such as Nvidia's $20 billion Groq deal and Cerebras' recent $57 billion IPO, highlight the industry's shifting dynamics.
With capacity constrained at both Groq and Cerebras, General Compute's co-founders, CEO Finn Puklowski and CTO Jason Goodison, have turned to an alternative: specialized chips from SambaNova, an Intel-backed chipmaker that has not been in the spotlight recently. SambaNova's upcoming chip promises a more flexible architecture and more memory for retaining context during inference. The company asserts that its chips outperform not only GPUs but also competing chips from Groq and Cerebras.
Puklowski claims that the new chips will generate between 600 and 700 tokens per second, well ahead of the roughly 250 tokens per second delivered by GPUs. General Compute has already ordered $300 million worth of SambaNova's SN50 chips and aims to be the first neocloud provider to deploy them.
Beyond chip selection, General Compute is addressing another major challenge: data center installation. The chips' air-cooled design reduces power consumption, which allows them to be installed in existing data centers without infrastructure upgrades. Puklowski is exploring colocation agreements, under which General Compute's hardware is placed in third-party facilities. The strategy targets not only data center operators but also crypto miners looking to repurpose their infrastructure as bitcoin production costs fluctuate.
Last week, General Compute launched its cloud service, claiming to be the fastest provider for running MiniMax 2.7, a prominent open-source large language model (LLM).
Joe Hassleman, a venture investor who recognized the potential of the inference sector early with a 2021 investment in Groq, has since launched Evercrest Partners, which focuses on AI investments. General Compute is the new firm's first investment. Hassleman sees similarities between General Compute's partnership with SambaNova and other successful pairings, such as CoreWeave with Nvidia. He emphasizes the need for a diverse customer base that can put the chips to use in high-growth environments. "As much as General Compute is making a bet on SambaNova, SambaNova is making a bet on General Compute," he remarked.
The crux of the matter is which computer architecture will yield the most value as AI technology evolves. Inference clouds are a strategic bet on a future of many models and agents, in which the speed and cost efficiency of inference take center stage.
OpenRouter's recent $113 million Series B underscores the demand for services that optimize token spending by providing access to a variety of models. Puklowski aims to cut the time required for coding tasks from an hour to five or ten minutes and to make the audio agents used in customer service more efficient. "If you use ChatGPT and it gives you 50 tokens per second, that’s still a heck of a lot faster than we can read," Puklowski noted. The urgency for faster inference comes instead from AI's continuing shift toward agent-to-agent interactions, where swift responses are crucial.