Fei-Fei Li explains world models’ roles in robotics and gaming
The AI pioneer's new taxonomy breaks world models into three core functions, offering a framework that could reshape how robots learn and games get built.
Fei-Fei Li wants to settle a debate that’s been simmering in the AI community: what actually counts as a “world model” and what’s just a fancy video generator wearing a lab coat.
The Stanford professor and World Labs CEO published “A Functional Taxonomy of World Models” on June 3, 2026, laying out a framework that categorizes world models into three distinct functions: renderer, simulator, and planner. The paper argues these three roles form an interconnected loop that underpins what Li calls “spatial intelligence,” the kind of AI that can actually understand and interact with physical environments.
Three jobs, one model
The renderer function handles visual generation. It creates high-fidelity visual representations from data inputs. This is what most current “world models” actually do, and Li makes the pointed argument that systems stuck at this level are not true world models at all.
The simulator function goes deeper. It doesn’t just show you what something looks like. It models physics, cause and effect, and the way objects interact over time. A renderer can show you a ball rolling toward a cliff edge. A simulator knows the ball will fall off.
The planner function uses the simulator’s understanding of how the world works to chart courses of action. It’s the difference between an AI that watches a kitchen and one that can figure out how to make you a sandwich without breaking every plate in the cabinet.
These three functions don’t operate in isolation. Li’s paper describes them as forming a continuous loop, where each capability feeds into and strengthens the others. A renderer informs the simulator about visual context, the simulator provides the planner with physics-grounded predictions, and the planner’s goals shape what the renderer and simulator need to prioritize.
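To make the loop concrete, here is a toy sketch in Python built on the ball-and-cliff example above. It is not taken from Li’s paper or from any World Labs code. Every name in it (`BallState`, `render`, `simulate`, `plan`, `run`, `CLIFF_X`, `WIDTH`) is a hypothetical stand-in, and the “world” is a one-dimensional track. The point is only to show the division of labor: the renderer draws a state, the simulator predicts the consequence of an action, and the planner picks actions by consulting the simulator.

```python
from dataclasses import dataclass

CLIFF_X = 5   # assumption: ground exists only at positions x < CLIFF_X
WIDTH = 8     # assumption: width of the rendered ASCII strip


@dataclass(frozen=True)
class BallState:
    x: int                 # ball position along a 1D track
    fallen: bool = False   # True once the ball has gone over the edge


def render(state: BallState) -> str:
    """Renderer: draw the state. It knows what things look like, not what happens next."""
    cells = ["_" if i < CLIFF_X else "." for i in range(WIDTH)]
    idx = max(0, min(state.x, WIDTH - 1))  # clamp so the ball stays inside the strip
    cells[idx] = "v" if state.fallen else "o"
    return "".join(cells)


def simulate(state: BallState, step: int) -> BallState:
    """Simulator: predict the next state, including the consequence of leaving the ground."""
    if state.fallen:
        return state  # a fallen ball stays fallen
    x = state.x + step
    return BallState(x=x, fallen=x >= CLIFF_X)


def plan(state: BallState, goal_x: int, steps=(-1, 0, 1)) -> int:
    """Planner: choose the step whose simulated outcome lands closest to the goal without falling."""
    def cost(step: int) -> float:
        predicted = simulate(state, step)
        return float("inf") if predicted.fallen else abs(predicted.x - goal_x)
    return min(steps, key=cost)


def run(start: BallState, goal_x: int, max_steps: int = 10):
    """Plan, act, render, repeat until the planner decides to stay put."""
    state, frames = start, [render(start)]
    for _ in range(max_steps):
        step = plan(state, goal_x)
        if step == 0:
            break
        state = simulate(state, step)  # here the simulator also stands in for the real world
        frames.append(render(state))
    return state, frames


if __name__ == "__main__":
    final_state, frames = run(BallState(x=0), goal_x=6)  # the goal lies beyond the cliff edge
    print("\n".join(frames))
    print(final_state)
```

In this sketch, a goal placed past the cliff edge leaves the ball parked at the last safe position, because the planner rejects any step the simulator predicts will end in a fall. A renderer on its own would have no basis for that decision.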
Why robotics needs this badly
Li has argued, including in an earlier manifesto from November 2025, that world models can bridge the gap between simulation and reality. If you can build a sufficiently accurate digital replica of the physical world, robots can train there first.
World Labs has already started putting this theory into practice. The company launched Marble, its first commercial product, in November 2025. Marble generates persistent, high-fidelity 3D worlds from multimodal prompts, meaning you can describe an environment using text, images, or other inputs, and Marble builds a navigable 3D space from that description. The system is already being used in robotic simulation environments.
Unlike a video, which is a fixed sequence of frames, Marble’s worlds maintain consistent geometry and physics as you move through them. A robot training in a Marble environment can approach the same shelf from different angles and find the same objects in the same positions.
The money behind the mission
World Labs raised $1 billion in February 2026, building on a previous $230 million round. The investor roster includes AMD, Autodesk, NVIDIA, and Fidelity.
The $1.23 billion in total funding puts World Labs in rare company for an AI startup focused on spatial intelligence rather than the large language model arms race that has dominated headlines.