In the rapidly evolving landscape of artificial intelligence, a significant challenge has emerged: the availability of training data. Neema Raphael, chief data officer at Goldman Sachs, raised concerns about a looming data shortage on a recent episode of the bank's 'Exchanges' podcast, suggesting that this scarcity is already shaping how new AI systems are built. He pointed to China's DeepSeek as a case study, suggesting that its low development costs may stem from training on the outputs of existing models rather than sourcing entirely new data. "The most intriguing aspect is how prior models will influence the next generation of AI technologies," he stated.

With traditional internet sources becoming exhausted, developers are increasingly turning to synthetic data: machine-generated text, images, and code. While this approach offers a seemingly unlimited supply, it risks flooding AI models with low-quality information. Raphael nevertheless remains optimistic, arguing that the absence of fresh data won't severely hinder progress, largely because many companies hold untapped data reserves. "From a consumer standpoint, the surge in synthetic data is fascinating. However, in the enterprise realm, there's still significant potential to be unlocked," he remarked.

This suggests that the future of AI may hinge less on freely available internet information and more on proprietary datasets held by corporations. Firms like Goldman Sachs, which accumulate vast amounts of data from trading activity and client interactions, could significantly enhance AI tools if that data is leveraged correctly. Raphael's insights come amid industry discussion of having reached 'peak data' in the three years since ChatGPT emerged.
At a recent conference, OpenAI co-founder Ilya Sutskever cautioned that all valuable online data has already been utilized in training models, hinting that the rapid advancement of AI might soon plateau. Moreover, Raphael emphasized that the challenge lies not only in sourcing more data but also in ensuring that it is applicable. "The key issues are understanding the data, its business context, and normalizing it for effective use within the business," he explained. He also raised thought-provoking questions regarding the reliance on synthetic data, pondering whether this could lead to a 'creative plateau' in AI. "If the data is predominantly machine-generated, how much human-derived information can still be integrated?" he questioned, indicating that this is a crucial aspect to monitor from both a technological and philosophical viewpoint.