
In a significant milestone, OpenAI has introduced its latest AI model, GPT-5.3-Codex-Spark, designed to run on non-Nvidia hardware — specifically, Cerebras chips. The model generates code at over 1,000 tokens per second, roughly a 15-fold speed increase over its predecessor. For comparison, Anthropic's Claude Opus 4.6 in its new premium fast mode reaches about 2.5 times its standard rate of 68.2 tokens per second, though it is a larger and more capable model than Codex-Spark.

"Cerebras has been an exceptional engineering partner, and we are thrilled to introduce rapid inference as a new capability of our platform," said Sachin Katti, who leads compute operations at OpenAI.

Codex-Spark is currently available as a research preview to ChatGPT Pro subscribers at $200 per month, accessible through the Codex app, command-line interface, and VS Code extension. OpenAI is also rolling out API access to select design partners. The model ships with a 128,000-token context window and handles text only at launch.

The release builds on the full GPT-5.3-Codex model that OpenAI unveiled earlier this month. While the full model handles more complex coding tasks, Spark is optimized for speed over breadth of capability. On software engineering benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, Codex-Spark reportedly outperforms the older GPT-5.1-Codex-mini while completing tasks in significantly less time, though OpenAI has not provided independent validation for these claims. Speed has historically been a sore point for Codex: one previous test found it took nearly twice as long as Anthropic's Claude Code to generate a working version of Minesweeper.
For context, GPT-5.3-Codex-Spark's 1,000 tokens per second is far faster than anything OpenAI has previously offered. Independent benchmarks from Artificial Analysis show OpenAI's fastest models on Nvidia hardware falling well short of that figure, with GPT-4 delivering around 147 tokens per second and GPT-4o mini clocking in at approximately 52 tokens per second.
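The throughput figures quoted above can be compared directly. The short sketch below uses only the numbers reported in this article (the derived multiples are arithmetic, not reported benchmarks):

```python
# Throughput figures quoted in the article (tokens per second).
codex_spark = 1000.0             # GPT-5.3-Codex-Spark on Cerebras chips
opus_standard = 68.2             # Claude Opus 4.6, standard rate
opus_fast = opus_standard * 2.5  # premium fast mode (~2.5x standard)
gpt4 = 147.0                     # GPT-4 on Nvidia (Artificial Analysis)
gpt4o_mini = 52.0                # GPT-4o mini on Nvidia

print(f"Opus 4.6 fast mode:   {opus_fast:.1f} tok/s")            # 170.5
print(f"Spark vs Opus fast:   {codex_spark / opus_fast:.1f}x")   # ~5.9x
print(f"Spark vs GPT-4:       {codex_spark / gpt4:.1f}x")        # ~6.8x
print(f"Spark vs GPT-4o mini: {codex_spark / gpt4o_mini:.1f}x")  # ~19.2x
```

Even against Anthropic's accelerated fast mode, Spark's claimed rate works out to nearly a 6x advantage, though the article notes Opus 4.6 is a larger model, so the comparison is not like-for-like.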