Yann LeCun’s paper reveals conditions for LeJEPA to learn world models
New theoretical work from Meta's chief AI scientist establishes when self-supervised models can actually recover the hidden causes behind what they observe.
Yann LeCun has spent years arguing that the future of AI isn’t about bigger chatbots or better image generators. It’s about building systems that understand the world the way humans do, by learning predictive models of how things work under the hood. His latest paper puts mathematical rigor behind that vision, and it shows that the guarantees hold only under a surprisingly specific set of conditions.
The paper, titled “When Does LeJEPA Learn a World Model?” and co-authored with Randall Balestriero and David Klindt, was submitted to arXiv on May 25, 2026. Its core finding: the LeJEPA architecture can reliably recover the true hidden causes behind observations, but only when those latent variables follow a Gaussian distribution and evolve through stationary, additive-noise dynamics.
What LeJEPA actually does, and why it matters
LeJEPA belongs to a family of architectures called Joint-Embedding Predictive Architectures, or JEPAs. Instead of trying to reconstruct raw pixel data, JEPAs learn to predict abstract representations of future states.
The original LeJEPA framework was introduced in 2025 and brought with it a technique called Sketched Isotropic Gaussian Regularization, or SIGReg. That’s a mouthful, but in plain terms: it pushes the model’s internal representations toward an isotropic Gaussian distribution, which eliminates a lot of the hand-tuned tricks that earlier self-supervised learning methods relied on.
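To make the idea of a Gaussian regularizer concrete, here is a minimal sketch. It is not the SIGReg implementation: it swaps in simple moment matching on random one-dimensional projections as a stand-in, and the function name `sliced_gaussian_penalty` and its parameters are hypothetical choices made for illustration.

```python
import numpy as np

def sliced_gaussian_penalty(z, num_directions=64, seed=0):
    """Illustrative stand-in for a Gaussianity regularizer (not SIGReg itself).

    z: array of shape (n, d) holding n embedding vectors.
    Projects the embeddings onto random unit directions and penalizes the gap
    between the first four raw moments of each projection and those of a
    standard normal (0, 1, 0, 3).
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape
    # Random unit directions, one per column.
    directions = rng.standard_normal((d, num_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    proj = z @ directions  # shape (n, num_directions)
    m1 = proj.mean(axis=0)
    m2 = (proj ** 2).mean(axis=0)
    m3 = (proj ** 3).mean(axis=0)
    m4 = (proj ** 4).mean(axis=0)
    per_direction = m1 ** 2 + (m2 - 1.0) ** 2 + m3 ** 2 + (m4 - 3.0) ** 2
    return float(per_direction.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    gaussian = rng.standard_normal((20000, 8))
    collapsed = np.zeros((100, 8))  # every embedding identical
    print("isotropic Gaussian:", sliced_gaussian_penalty(gaussian))
    print("collapsed:", sliced_gaussian_penalty(collapsed))
```

In this toy version, embeddings that are already close to isotropic Gaussian score near zero, while collapsed or badly scaled embeddings are penalized. That is the behavior a regularizer of this kind is meant to encourage.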
The new paper takes this a step further by asking a fundamental question. Under what exact mathematical conditions can LeJEPA achieve what the authors call “linear identifiability” of latent variables? In other words, when can it actually find the real hidden causes behind nonlinear observations, rather than just learning some arbitrary representation that happens to work?
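One common way to test linear identifiability in practice is to fit a linear map from the learned representation back to the true latents and see how much variance it explains. The sketch below does that on synthetic data. The helper name `linear_fit_r2` and the toy data are assumptions for illustration, not code or an evaluation protocol from the paper.

```python
import numpy as np

def linear_fit_r2(learned, true):
    """R^2 of the best affine map from `learned` to `true` (least squares)."""
    n = learned.shape[0]
    design = np.hstack([learned, np.ones((n, 1))])  # add intercept column
    coef, *_ = np.linalg.lstsq(design, true, rcond=None)
    residual = true - design @ coef
    ss_res = (residual ** 2).sum()
    ss_tot = ((true - true.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
z = rng.standard_normal((2000, 3))  # hypothetical "true" latents

# Case 1: representation equals the latents up to an invertible linear map.
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
mixing = q * np.array([0.5, 1.0, 2.0])  # orthogonal matrix with scaled columns
linear_repr = z @ mixing + 0.3

# Case 2: representation is a nonlinear warp of the latents.
warped_repr = z ** 3

r2_linear = linear_fit_r2(linear_repr, z)
r2_warped = linear_fit_r2(warped_repr, z)
print("linear representation R^2:", r2_linear)
print("warped representation R^2:", r2_warped)
```

A score near 1 means the hidden variables can be read off the representation with nothing more than a linear map. A clearly lower score means the representation encodes them in some distorted form.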
The Gaussian sweet spot
The paper proves that when the hidden variables driving observations are Gaussian (specifically, isotropic Gaussian) and their dynamics are stationary with additive noise, LeJEPA can recover those variables up to a linear transformation.
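Written out, the setting described above looks roughly like the following. The notation is an assumption made here for illustration (latents $z_t$, transition function $f$, noise $\varepsilon_t$, nonlinear observation map $g$, learned encoder $h$), and the paper's own formulation and technical conditions may differ.

```latex
\begin{align}
z_t &\sim \mathcal{N}(0, I_d)
  && \text{(isotropic Gaussian latents)} \\
z_{t+1} &= f(z_t) + \varepsilon_t
  && \text{(stationary dynamics: $f$ and the noise law do not depend on $t$)} \\
x_t &= g(z_t)
  && \text{(nonlinear observations, e.g. pixels)} \\
h(x_t) &= A z_t + b, \quad A \in \mathbb{R}^{d \times d} \text{ invertible}
  && \text{(recovery up to a linear transformation)}
\end{align}
```

The last line is what "up to a linear transformation" means: the encoder's output may be a rotated, scaled, or shifted copy of the true latents, but not a nonlinearly distorted one.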
Traditional approaches to this problem, rooted in independent component analysis, typically assume that latent variables are non-Gaussian. LeCun’s paper essentially flips the script, showing that Gaussian latent variables aren’t just sufficient for recovery; they’re uniquely suited for the kind of linear identifiability that LeJEPA achieves.
The method combines two ingredients: alignment (making sure predictions match reality) and Gaussian regularization (keeping representations structured). The theoretical proof that this combination is both necessary and sufficient under the stated conditions is the paper’s real contribution.
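As a rough sketch, the two ingredients can be read as two terms of a single training objective. The form below is an assumption for illustration, not the paper's exact loss: $h$ is the encoder, $p$ a predictor, $R$ some measure of how far the embeddings are from an isotropic Gaussian, and $\lambda$ a hypothetical weighting between the terms.

```latex
\[
\mathcal{L}(h, p)
  = \underbrace{\mathbb{E}\big[\lVert p(h(x_t)) - h(x_{t+1}) \rVert^2\big]}_{\text{alignment}}
  + \lambda \, \underbrace{R\big(h(x_t)\big)}_{\text{Gaussian regularization}}
\]
```

The alignment term alone could be minimized by mapping every input to the same point. The regularization term rules out that kind of collapse by insisting that the embeddings stay spread out like a Gaussian.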
The authors cite robot control tasks, specifically the Reacher task, where a robotic arm must reach a target position, as a direct application. The key detail: LeJEPA can learn to do this directly from raw pixel data, without needing pre-processed state information handed to it by engineers.
The constraint is real, though. The paper’s guarantees only hold when latent variables are Gaussian and dynamics are stationary with additive noise. Real-world environments are messy, non-stationary, and full of non-Gaussian surprises. The gap between “provably works under these conditions” and “works reliably in a factory” is where the next several years of research will be spent.