Nvidia Launches Cosmos 3 at Computex

nvidia
Monday, 01 June 2026 at 08:45
Jensen Huang unveiled Cosmos 3 today at GTC Taipei during Computex 2026. It's Nvidia's most ambitious open-source AI release yet: a physical AI foundation model that unifies vision reasoning, world generation, and action prediction in a single system. Alongside it, Nvidia launched the Cosmos Coalition, whose founding members include Agile Robots, Black Forest Labs, Runway, and Skild AI, to advance open world models together.
This is today's most significant AI announcement.

Key Points

  • Nvidia launched Cosmos 3 at GTC Taipei and describes it as the world's first fully open omnimodel for physical AI, built on a mixture-of-transformers architecture
  • Cosmos 3 natively understands and generates text, images, video, ambient sound, and actions in a single model, so the same model can serve both synthetic data generation and physical AI policy development
  • Two model sizes are available now on Hugging Face, Cosmos 3 Nano and Cosmos 3 Super, with training scripts, deployment tools, and datasets all open-sourced on GitHub
  • Nvidia also launched the Cosmos Coalition; its founding members are Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI
  • According to Nvidia, Cosmos 3 tops the leaderboards for text-to-image and image-to-video benchmarks; Samsung, LG Electronics, Li Auto, and Doosan Robotics are among its confirmed adopters

What "Physical AI" Actually Means

Most AI models understand language. Some understand images and video. Physical AI has to understand something harder — what happens next in the real world. A robot arm reaching for an object needs to predict how the object will move when touched. An autonomous vehicle needs to model what a pedestrian stepping off a kerb is about to do. These aren't language tasks. They're physics tasks.
Cosmos 3 was built to handle this. The mixture-of-transformers architecture processes multiple input streams simultaneously — camera feeds, sensor data, text instructions — and generates predictions about future world states alongside direct action recommendations. A robotic system trained on Cosmos 3 doesn't just see the world; it simulates what the world will look like a second from now, then decides what to do based on that simulation.
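The "simulate first, then act" loop described above is a long-standing pattern in robotics, usually called model-predictive control. The following is a minimal sketch of that general idea, not of how Nvidia couples prediction and action in Cosmos 3. It assumes a hand-written toy physics function standing in for a learned world model. The names world_model, rollout_cost and plan are hypothetical and are not part of any Cosmos API.

```python
from itertools import product

# HYPOTHETICAL toy world model: predicts the next (position, velocity) of a
# 1-D object after applying a push. A learned model such as Cosmos 3 would
# take the place of this hand-written physics; this is only a stand-in.
def world_model(state, action, dt=0.1, friction=0.9):
    pos, vel = state
    vel = friction * vel + action * dt
    pos = pos + vel * dt
    return (pos, vel)

def rollout_cost(state, actions, goal):
    """Imagine applying a sequence of actions and score the end state:
    squared distance to the goal plus a small penalty on effort."""
    for a in actions:
        state = world_model(state, a)
    pos, _ = state
    effort = sum(a * a for a in actions)
    return (pos - goal) ** 2 + 1e-4 * effort

def plan(state, goal, candidates=(-1.0, 0.0, 1.0), horizon=3):
    """Simulate every action sequence up to `horizon` steps ahead, keep the
    cheapest one, and return only its first action (receding horizon)."""
    best = min(product(candidates, repeat=horizon),
               key=lambda seq: rollout_cost(state, seq, goal))
    return best[0]
```

For example, starting at rest at position 0 with a goal of 1.0, plan((0.0, 0.0), 1.0) picks a push of +1.0. Only that first action is executed; the system then re-plans from the newly observed state, which stops errors in the imagined rollout from compounding.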

The Omnimodel Architecture

Previous physical AI systems handled different modalities in separate models — one for vision, one for language, one for action prediction. Integration was a patchwork. Cosmos 3 handles text, images, video, ambient sound, and actions in a single unified model. That matters because real-world physical AI constantly needs to cross those boundaries — understanding a spoken instruction, seeing the environment, predicting audio cues, and generating motor commands all within the same inference pass.
Cosmos 3 Nano is the smaller, more efficient version for edge deployment. Cosmos 3 Super is the full-scale model for data centre training and high-fidelity simulation.
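The announcement doesn't detail Cosmos 3's internal layers. As a rough illustration of what "mixture-of-transformers" generally means in published research, here is a toy single attention layer in pure Python. Each modality gets its own projection weights, while self-attention runs globally over one interleaved sequence, which is how a single pass can mix text, video, audio and action tokens. ToyMoTLayer and its helpers are hypothetical, the weights are random, and feed-forward blocks, normalisation and multiple heads are omitted.

```python
import math
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

class ToyMoTLayer:
    """HYPOTHETICAL simplified mixture-of-transformers attention layer.
    Each modality has its own Q/K/V/output weights; attention is global,
    so every token attends to tokens of every modality."""

    def __init__(self, dim, modalities, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.weights = {
            m: {name: random_matrix(dim, dim, rng) for name in ("q", "k", "v", "o")}
            for m in modalities
        }

    def __call__(self, tokens):
        # tokens: list of (modality, vector) pairs forming one mixed sequence
        q, k, v = [], [], []
        for m, x in tokens:
            w = self.weights[m]  # route each token to its modality's weights
            q.append(matvec(w["q"], x))
            k.append(matvec(w["k"], x))
            v.append(matvec(w["v"], x))
        scale = 1.0 / math.sqrt(self.dim)
        out = []
        for i, (m, x) in enumerate(tokens):
            # Global self-attention across the whole mixed-modality sequence.
            scores = [scale * sum(a * b for a, b in zip(q[i], kj)) for kj in k]
            attn = softmax(scores)
            mixed = [sum(p * vj[d] for p, vj in zip(attn, v)) for d in range(self.dim)]
            # Modality-specific output projection plus a residual connection.
            proj = matvec(self.weights[m]["o"], mixed)
            out.append((m, [xi + pi for xi, pi in zip(x, proj)]))
        return out
```

Usage: build ToyMoTLayer(dim=4, modalities=("text", "video", "audio", "action")) and call it on a list such as [("text", [1, 0, 0, 0]), ("video", [0, 1, 0, 0]), ...]. Because each token only uses its own modality's weights, per-token compute matches a single dense layer of the same width, while the shared attention lets an action token condition on video and audio tokens in the same pass.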

Open Source — All of It

Nvidia is releasing the model weights, training scripts, deployment tools, and the datasets that trained Cosmos 3. That's an unusually complete open-source release for a frontier AI model. The explicit goal is reproducibility — other labs should be able to verify Nvidia's results and build on top of them rather than trusting benchmark numbers in a press release.
The Cosmos Coalition formalises that open collaboration. Agile Robots, Black Forest Labs, Runway, and others commit to contributing to and advancing the open world model ecosystem rather than building competing closed alternatives.