Physical AI, which pairs robotics with foundation models, is evolving rapidly. Nvidia and Google have led much of this work, and the Allen Institute for AI (Ai2) is now entering the field with Molmo Act 7B, a new open-source model that lets robots 'think' in three-dimensional space. Molmo Act builds on Ai2’s earlier Molmo model and adds the ability to reason about actions within a physical environment. The model is released under an Apache 2.0 license, and its training datasets are openly available under CC BY 4.0.

According to Ai2, Molmo Act excels at understanding spatial dynamics, which helps robots navigate their surroundings more effectively. Unlike traditional vision-language-action (VLA) models, which often lack spatial reasoning, Molmo Act is designed to interpret and interact with the physical world. Ai2's representatives said this capability sets Molmo Act apart and makes it more efficient and adaptable across contexts. Ai2 sees a wide range of applications, with a particular focus on home environments, where irregular layouts and constant change pose significant challenges for robots.

To achieve this spatial comprehension, Molmo Act uses 'spatially grounded perception tokens.' These tokens are produced by a vector-quantized variational autoencoder (VQ-VAE), which converts inputs such as video into discrete tokens. The tokens let the model estimate geometric structure and the distances between objects, both of which are crucial for navigation and interaction. With this spatial awareness, Molmo Act can generate a sequence of waypoints that guides a robot through precise movements, such as adjusting an arm or performing an intricate task.
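To make the tokenization idea concrete, here is a minimal sketch of the quantization step at the heart of any vector-quantized autoencoder: each continuous feature vector is replaced by the index of its nearest entry in a codebook, and those indices are the discrete tokens. This is an illustration only, not Ai2's implementation. The codebook values, the two-dimensional features, and the function names `squared_distance` and `quantize` are all assumptions made for the example; a real model learns its codebook during training and works with far larger vectors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry.

    The returned indices play the role of discrete tokens.
    """
    tokens = []
    for feature in features:
        nearest = min(
            range(len(codebook)),
            key=lambda i: squared_distance(feature, codebook[i]),
        )
        tokens.append(nearest)
    return tokens


# Hypothetical toy codebook: four 2-D entries (a trained model would learn these).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical encoder outputs for three image patches.
features = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

tokens = quantize(features, CODEBOOK)
print(tokens)  # [0, 3, 2]
```

Because the output is a short list of integers rather than raw pixels, a language-model-style backbone can consume it like any other token sequence, which is the general appeal of this kind of tokenizer.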
In Ai2's evaluations, Molmo Act 7B achieved a task success rate of 72.1%, outperforming models from Google, Microsoft, and Nvidia.

Robotics experts are taking note. Alan Fern, a professor at Oregon State University, said Ai2’s work is a vital step toward refining VLMs for robotics and physical reasoning. He acknowledged that more progress is needed but commended the shift toward 3D scene understanding as a significant step forward. The openness of the model's data also drew attention. Daniel Maturana, co-founder of Gather AI, expressed enthusiasm for what that openness means for research and development, particularly given the high cost of building and training such models.

Developers have long wanted robots that are spatially aware as well as intelligent. Historically, every robotic action had to be programmed by hand, which was tedious and limited flexibility. With the rise of LLM-based approaches, robots can now determine potential actions on their own based on their interactions with the environment.

As physical AI is increasingly seen as the next frontier, Ai2's Molmo Act offers a promising foundation for future work on robotic intelligence. As the technology matures, general physical intelligence, in which robots perform complex tasks without detailed programming, becomes more attainable.