An AI Model That Learns From Video And Exhibits ‘Surprise’ At Contradictory Information

Researchers at Meta have developed an artificial intelligence system, Video Joint Embedding Predictive Architecture (V-JEPA), that learns about the world through videos and exhibits a notion of “surprise” when presented with information contradicting its acquired knowledge. This AI model, unveiled in 2024, is engineered to make sense of the physical world without prior assumptions about the physics contained in the videos it analyzes.

In this respect, V-JEPA marks a significant departure from traditional AI systems that rely heavily on predefined rules and assumptions. The innovation behind V-JEPA lies in how it handles the complexities of video data. Conventional AI models, which operate in “pixel space,” treat every pixel in a video as equally important.

This approach can lead to an overload of irrelevant information, causing the model to fixate on inconsequential details, such as the motion of leaves, while overlooking crucial elements like the color of traffic lights or the positions of nearby cars. In contrast, V-JEPA is designed to filter out unnecessary information and concentrate on the essential aspects of a scene: rather than reconstructing missing portions of a video pixel by pixel, it predicts them in an abstract representation space, where much of that irrelevant detail has already been discarded.
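
To make the contrast with pixel-space models concrete, here is a minimal, hypothetical PyTorch sketch of the joint-embedding-predictive idea: an encoder maps visible video patches to abstract features, a predictor guesses the features of masked patches, and the mismatch between prediction and observation serves as a rough proxy for the model’s “surprise.” All class names, dimensions, and the surprise metric below are illustrative assumptions for exposition, not Meta’s actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (not Meta's released code): a JEPA-style setup
# predicts the *representation* of masked video regions rather than
# reconstructing raw pixels, and "surprise" can be read off as the
# prediction error in that representation space.

class TinyEncoder(nn.Module):
    """Maps a flattened video patch to an abstract feature vector."""
    def __init__(self, patch_dim: int, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(patch_dim, feat_dim), nn.GELU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class TinyPredictor(nn.Module):
    """Predicts the features of hidden patches from the visible context."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.GELU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, context_feats):
        return self.net(context_feats)

def surprise_score(context_patches, target_patches, encoder, target_encoder, predictor):
    """Higher values mean the observed continuation deviates more from what
    the model expected -- a stand-in for the notion of 'surprise'."""
    with torch.no_grad():
        target_feats = target_encoder(target_patches)       # what actually happened
    predicted_feats = predictor(encoder(context_patches))   # what the model expected
    return nn.functional.mse_loss(predicted_feats, target_feats)

# Example usage with random stand-in data.
patch_dim, feat_dim = 768, 256
enc = TinyEncoder(patch_dim, feat_dim)
tgt_enc = TinyEncoder(patch_dim, feat_dim)
pred = TinyPredictor(feat_dim)
context = torch.randn(16, patch_dim)   # visible video patches
target = torch.randn(16, patch_dim)    # masked patches to be predicted
print(float(surprise_score(context, target, enc, tgt_enc, pred)))
```

In this kind of setup, an implausible or physics-violating continuation would yield a larger prediction error than an expected one, which is one way such a model can be said to register “surprise.”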

According to Micha Heilbron, a cognitive scientist at the University of Amsterdam, the claims made by the researchers are “very plausible, and the results are super interesting.”

