Meta has unveiled V-JEPA 2, the latest version of its AI “world model” designed to help robots and AI agents understand and interact with the physical world more effectively.
An evolution of the original V-JEPA model, V-JEPA 2 was trained on more than 1 million hours of video to build a predictive understanding of the real world, such as anticipating how objects will move or which actions are likely to come next in a given scenario.
For example, Meta demonstrates how the model can infer that holding a plate and spatula near a stove likely means the next step is plating cooked eggs. Intuitive predictions of this kind resemble the physical reasoning seen in animals and young children.
Meta claims V-JEPA 2 is 30 times faster than Nvidia’s Cosmos model, though the two models may have been evaluated on different benchmarks, so the figures may not be directly comparable.
“We believe world models will usher a new era for robotics,” said Meta’s chief AI scientist Yann LeCun, noting their potential to reduce the need for extensive real-world robotic training data.