Nvidia’s Cosmos 3 Super lands at #8 and #11 in Text-to-Image Arena despite dominating other benchmarks
The 64-billion-parameter model claims the top open-weights spot on Artificial Analysis but lands at #8 and #11 on arena.ai, highlighting just how messy AI benchmarking has become
Nvidia’s newest AI model is simultaneously the best and merely pretty good, depending on who’s grading the test.
Cosmos 3 Super, the flagship variant of Nvidia’s open omnimodal world foundation model launched on May 31 at GTC Taipei/Computex, grabbed the #1 open-weights position for text-to-image tasks on Artificial Analysis with an Elo score around 1229. On arena.ai’s Text-to-Image Arena, though, the same model landed at #8 and #11, with Elo scores hovering between 1052 and 1062.
What Cosmos 3 actually is
Nvidia built Cosmos 3 as an “omnimodal world foundation model” designed to understand and generate across multiple types of data at once, including images, video, audio, and physical actions, all processed through one unified system.
The architecture powering this is called Mixture-of-Transformers, and the Super variant packs 64 billion parameters. Its smaller sibling, Cosmos 3 Nano, has 16 billion parameters. Both became available on Hugging Face shortly after the announcement.
Nvidia trained the model on roughly 20 trillion tokens spanning images, audio, and action data, a mix intended to make it useful not just for generating images but also for robotics, autonomous driving, and simulation environments.
Cosmos 3 Super leads multiple categories in physical AI benchmarks, including PAI-Bench, RoboArena, and VANTAGE-Bench. It also performs well in image-to-video tasks.
The benchmarking problem nobody has solved
Artificial Analysis and arena.ai use different evaluation methodologies, different comparison pools, and different scoring systems. Arena-style evaluations often rely on human preference voting, which introduces subjectivity. Automated benchmarks measure specific technical capabilities but can miss qualitative factors that users care about.
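One reason the raw numbers do not line up: an Elo-style score only has meaning relative to the other models in the same pool. The sketch below illustrates this with the textbook Elo formula. It assumes the standard logistic curve with a scale of 400, which may not match the exact rating method either leaderboard uses, and the model names and ratings in it are hypothetical placeholders rather than real leaderboard entries.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Chance that A is preferred over B under the textbook Elo logistic model.

    Assumption: scale=400 is the classic chess convention; a given
    leaderboard may use a different scale or a related rating model.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def shift_pool(ratings: dict, offset: float) -> dict:
    """Add the same constant to every rating in a pool."""
    return {name: rating + offset for name, rating in ratings.items()}


# Hypothetical two-model pool; names and numbers are illustrative only.
pool = {"model_x": 1229.0, "model_y": 1129.0}
p_before = expected_score(pool["model_x"], pool["model_y"])

# Slide the whole pool down by 170 points: the absolute scores change,
# but the head-to-head prediction does not.
shifted = shift_pool(pool, -170.0)
p_after = expected_score(shifted["model_x"], shifted["model_y"])

print(f"win probability before shift: {p_before:.3f}")
print(f"win probability after shift:  {p_after:.3f}")
```

Only the gap between two ratings in the same pool carries information, so a score near 1229 on one leaderboard and one near 1060 on another cannot be compared directly without knowing how each pool is anchored and which rivals are in it.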
Why crypto and AI investors should pay attention
Nvidia’s decision to release Cosmos 3 as open weights is strategically significant. The rapid community engagement on Hugging Face following the launch suggests developers are already building on top of Cosmos 3.
Cosmos 3’s capabilities in robotics and autonomous systems are particularly relevant to decentralized compute networks, AI agent frameworks, and on-chain inference protocols. Every time a frontier model goes open-weight, it reduces the moat around proprietary AI services and strengthens the case for decentralized alternatives.