
A recent outage affecting Amazon Web Services (AWS) has been attributed to a single point of failure that triggered a domino effect across the company's vast network. According to a detailed analysis by Amazon's engineers, the incident lasted 15 hours and 32 minutes and affected millions of users worldwide. Network intelligence firm Ookla reported that its DownDetector service received more than 17 million user reports of disrupted services from roughly 3,500 organizations. Users in the United States, the United Kingdom, and Germany were hit hardest, and Snapchat, AWS itself, and Roblox were among the services most frequently reported as down. Ookla ranks the incident among the most significant internet outages in history.

The engineers traced the root cause to a software bug in the DynamoDB DNS management system, which monitors the stability of load balancers. Among other tasks, the system periodically updates DNS configurations for endpoints within the AWS network. The bug was a race condition: an error in which a process's outcome depends on the timing or order of events that vary unpredictably and lie outside developers' control. In this case, that timing dependence produced unexpected and damaging failures.

The race condition resided in the DNS Enactor, a DynamoDB component that continuously updates domain lookup tables at AWS endpoints. The Enactor ran into unusually long delays while retrying DNS updates on several endpoints. While it lagged behind, another component, the DNS Planner, kept generating new configurations, and a second DNS Enactor began applying them. The timing conflict between the delayed Enactor and the newer plans ultimately brought down the entire DynamoDB system. Amazon's engineers have published a detailed account of this complex failure, which illustrates how hard it is to manage networks at this scale.
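To make this kind of race concrete, here is a minimal Python sketch under heavily simplified assumptions: a single DNS record, plans tagged with increasing generation numbers, and one delayed writer and one prompt writer applying plans to the same record. The names `Plan`, `DnsRecord`, `apply_if_newer`, and `replay`, and the generation-number guard itself, are hypothetical illustrations, not Amazon's code or its actual fix. The interleaving is replayed deterministically rather than with real threads so the outcome is repeatable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical DNS plan: a generation number plus the addresses to publish."""
    generation: int
    addresses: Tuple[str, ...]


class DnsRecord:
    """Toy stand-in for one endpoint's DNS record (not AWS's real data model)."""

    def __init__(self) -> None:
        self.active: Optional[Plan] = None

    def apply_unchecked(self, plan: Plan) -> bool:
        # Last writer wins: a delayed writer can overwrite a newer plan.
        self.active = plan
        return True

    def apply_if_newer(self, plan: Plan) -> bool:
        # Guard: reject any plan that is not newer than the one already live.
        if self.active is not None and plan.generation <= self.active.generation:
            return False
        self.active = plan
        return True


def replay(guarded: bool) -> DnsRecord:
    """Replay one fixed interleaving of a planner and two enactors."""
    record = DnsRecord()
    apply = record.apply_if_newer if guarded else record.apply_unchecked

    # The planner emits successive generations (documentation-range addresses).
    plans = [Plan(g, (f"192.0.2.{g}",)) for g in (1, 2, 3)]

    # Enactor A picks up generation 1, then stalls while retrying slow endpoints.
    stale_plan = plans[0]

    # Meanwhile the planner moves on and enactor B applies the newest plan.
    apply(plans[2])

    # Enactor A finally resumes and applies the plan it read long ago.
    apply(stale_plan)
    return record


if __name__ == "__main__":
    for guarded in (False, True):
        live = replay(guarded).active
        print(f"guarded={guarded}: live generation {live.generation}")
```

Without the guard, the delayed writer's stale plan wins and generation 1 ends up live even though generation 3 was already published; with the compare-before-write check, the stale write is rejected. In a real concurrent system that check and the write would also have to happen atomically (for example, as a single conditional update), otherwise the race simply moves into the gap between them.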