Google releases VaultGemma, its first privacy-preserving LLM
In the quest to develop larger AI models, tech companies face a growing scarcity of quality training data. As they comb the internet for additional sources, the risk of inadvertently ingesting sensitive user information rises. In response, a team at Google Research is developing methods to make large language models (LLMs) less prone to "memorizing" sensitive content.

LLMs produce non-deterministic outputs: results can vary even with identical inputs. In some cases, however, a model will verbatim repeat information from its training data. If sensitive or personal data was included in training, that repetition can amount to a serious privacy violation. Likewise, if copyrighted material was incorporated into the training data, whether intentionally or not, regurgitating it can create legal exposure for developers.

To address these risks, Google applied differential privacy, a technique that injects calibrated noise during training to limit memorization. The approach strengthens user privacy, but at a cost to model accuracy and compute. Until now, there has been no clear account of how these trade-offs interact with the scaling laws of AI models.

The researchers worked from the premise that model performance is governed primarily by the noise-batch ratio, the amount of injected noise relative to the size of the training batches. By running experiments across a range of model sizes and noise-batch ratios, they derived scaling laws for differentially private training, mapping the balance among compute, privacy budget, and available training data. In essence, more noise degrades output quality unless it is offset by a corresponding increase in compute or data volume. The resulting framework gives developers a way to tune the noise-batch ratio when building privacy-preserving models.
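To make the noise-batch trade-off concrete, here is a minimal sketch of a DP-SGD-style update step, the standard way differential privacy is applied during training. This is an illustrative assumption about the mechanism, not VaultGemma's actual training code: per-example gradients are clipped to a fixed norm, summed, and perturbed with Gaussian noise before averaging, and the noise-batch ratio is the noise scale divided by the batch size. The function names (`dp_sgd_step`, `noise_batch_ratio`) are hypothetical.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, batch_size, rng):
    """One differentially private gradient step (DP-SGD sketch).

    Each per-example gradient is clipped to clip_norm, the clipped
    gradients are summed, Gaussian noise with standard deviation
    noise_multiplier * clip_norm is added, and the result is averaged
    over the batch.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # shrink only if too large
        clipped.append(g * scale)
    summed = np.sum(clipped, axis=0)
    sigma = noise_multiplier * clip_norm
    noisy = summed + rng.normal(0.0, sigma, size=summed.shape)
    return noisy / batch_size

def noise_batch_ratio(noise_multiplier, clip_norm, batch_size):
    # The quantity the article says governs model quality under DP
    # training: noise standard deviation relative to batch size.
    return (noise_multiplier * clip_norm) / batch_size
```

Note that doubling the batch size halves the ratio, which is why the scaling laws let developers buy back accuracy with more compute (larger batches) or more data rather than by weakening the privacy guarantee.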

Source: Ars Technica

Published: Sep 15, 2025, 21:10
