This AI model understands how the physical world works

December 8, 2025

By Daved Worner

The original version of this story appeared in Quanta Magazine.

Here’s a test for infants: Show them a glass of water on a desk. Hide it behind a wooden board. Now move the board toward the glass. If the board keeps going past the glass, as if the glass were not there, are they surprised? Many are by the age of 6 months, and by a year, almost all babies have an intuitive understanding of an object’s permanence, which is learned through observation. Now some artificial intelligence models do too.

Researchers have developed an AI system that learns about the world through video and shows a notion of “surprise” when it is presented with information that goes against its accumulated knowledge.

The model, developed by Meta and called the Video Joint Embedding Predictive Architecture (V-JEPA), makes no assumptions about the physics of the world contained in the videos. Still, it can begin to make sense of how the world works.

“Their claims, a priori, are very convincing, and the results are very interesting,” said Micha Heilbron, a cognitive scientist at the University of Amsterdam who studies how the brain and artificial systems make sense of the world.

Higher abstraction

As engineers who build self-driving cars know, getting AI systems to reliably understand what they see can be difficult. Most systems designed to “understand” videos, whether to classify their content (“a person playing tennis,” for example) or to identify the outline of an object (say, a car in front), work in what is known as “pixel space.” The model essentially treats every pixel in a video with equal importance.

But these pixel-space models come with limitations. Imagine trying to understand a suburban street. If the scene contains cars, traffic lights, and trees, the model may focus too much on irrelevant details such as the motion of leaves. It can miss the color of the traffic light or the location of nearby vehicles. “When you go to film or video, you don’t want to work in [pixel] space, because there are too many details you don’t want to model,” said Randall Balestriero, a computer scientist at Brown University.
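
To make the limitation concrete, here is a minimal sketch in Python with NumPy. It is a toy illustration, not Meta’s code or anything from V-JEPA: the frame size, the “leaves” and “light” regions, and the function name `pixel_error_shares` are all assumptions made up for this example. It builds two tiny grayscale frames in which a large patch of foliage flickers at random while a small traffic light changes state, then measures how much of a pixel-by-pixel prediction error comes from each region.

```python
import numpy as np


def pixel_error_shares(seed: int = 0) -> dict:
    """Toy scene: share of pixel-space squared error per region.

    Everything here (sizes, regions, values) is a made-up illustration.
    """
    rng = np.random.default_rng(seed)
    height, width = 64, 64

    # Hypothetical regions: a big patch of foliage and a tiny traffic light.
    leaves = np.zeros((height, width), dtype=bool)
    leaves[:32, :] = True            # 2,048 pixels of fluttering leaves
    light = np.zeros((height, width), dtype=bool)
    light[40:42, 30:32] = True       # 4 pixels of traffic light

    # Frame at time t: static gray background, random leaf texture, light off.
    frame_t = np.full((height, width), 0.5)
    frame_t[leaves] = rng.random(leaves.sum())
    frame_t[light] = 0.0

    # Next frame: the leaves move (new random texture), the light switches on.
    frame_next = frame_t.copy()
    frame_next[leaves] = rng.random(leaves.sum())
    frame_next[light] = 1.0

    # A naive predictor that assumes nothing changes between frames.
    prediction = frame_t

    # Pixel-space loss: every pixel's squared error counts equally.
    squared_error = (prediction - frame_next) ** 2
    total = squared_error.sum()
    return {
        "leaves": float(squared_error[leaves].sum() / total),
        "light": float(squared_error[light].sum() / total),
    }


shares = pixel_error_shares()
print(f"share of error from leaves: {shares['leaves']:.3f}")
print(f"share of error from traffic light: {shares['light']:.3f}")
```

In this toy scene, nearly all of the error comes from the leaves, simply because there are far more leaf pixels than traffic-light pixels. A model trained to shrink that error would spend most of its effort on foliage, even though the traffic light is the change that matters to a driver.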