Kelford Press

Signal from the noise

NVIDIA ships Nemotron Diffusion language models — 3B–14B params, 4x throughput on B200

NVIDIA released the Nemotron-Labs Diffusion family on 23 May 2026: diffusion-based language models at 3B, 8B and 14B parameters, plus an 8B vision-language variant. The text models ship with weights on Hugging Face under the commercially friendly NVIDIA Nemotron Open Model Licence. The 8B text model scores 1.2% higher average accuracy than Qwen3 8B on the team's evaluation suite, while generating roughly 4x faster than the autoregressive baseline on NVIDIA B200 hardware, reaching about 865 tokens/sec on the speedbench dataset in self-speculation mode. The model fills 32-token blocks via iterative denoising and supports three decoding modes: pure autoregressive, FastDiffuser, and a self-speculative path that drafts bidirectionally and verifies causally.
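The release describes decoding only at this level of detail: 32-token blocks filled by iterative denoising, plus a self-speculative mode that drafts bidirectionally and verifies causally. The minimal sketch below illustrates that general draft-then-verify pattern with toy stand-ins; it is not NVIDIA's implementation. The confidence-ordered unmasking schedule, the `steps` parameter, the greedy prefix-acceptance rule, the mask id, and every function name (`denoise_block`, `verify_causal`, `generate`, `toy_predict`, `toy_next`) are assumptions for illustration.

```python
from typing import Callable, List, Tuple

MASK = -1    # hypothetical mask-token id (illustrative, not from the release)
BLOCK = 32   # block size reported for Nemotron-Labs Diffusion
VOCAB = 101  # toy vocabulary size

Predictor = Callable[[List[int], List[int]], List[Tuple[int, float]]]
NextToken = Callable[[List[int]], int]


def denoise_block(context: List[int], predict: Predictor,
                  block: int = BLOCK, steps: int = 4) -> List[int]:
    """Draft one block by iterative unmasking: start fully masked and, at each
    step, commit the masked positions the drafter is most confident about."""
    draft = [MASK] * block
    per_step = -(-block // steps)  # ceiling division
    while MASK in draft:
        preds = predict(context, draft)  # (token, confidence) for every position
        masked = [i for i, t in enumerate(draft) if t == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            draft[i] = preds[i][0]
    return draft


def verify_causal(context: List[int], draft: List[int],
                  next_token: NextToken) -> List[int]:
    """Greedy verification: keep the longest draft prefix the causal pass agrees
    with; at the first disagreement, substitute the causal token and stop.
    (A real verifier would score the whole block in one forward pass; this
    loop calls the causal model token by token for clarity.)"""
    accepted: List[int] = []
    for tok in draft:
        expected = next_token(context + accepted)
        accepted.append(expected)  # equals tok whenever the draft was right
        if tok != expected:
            break
    return accepted


def generate(prompt: List[int], n_new: int, predict: Predictor,
             next_token: NextToken) -> List[int]:
    """Self-speculative loop: draft a block, verify it, append what survives."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        draft = denoise_block(out, predict)
        out += verify_causal(out, draft, next_token)
    return out[len(prompt):len(prompt) + n_new]


# --- Hypothetical toy stand-ins for the model's two passes ---
def toy_next(prefix: List[int]) -> int:
    """Causal pass: a deterministic next-token rule."""
    return (prefix[-1] * 3 + 1) % VOCAB


def toy_predict(context: List[int], draft: List[int]) -> List[Tuple[int, float]]:
    """Drafting pass: follows the causal rule but is deliberately wrong at block
    position 20, so verification has something to reject. Confidence falls off
    with distance from the context. Unlike a real drafter, it ignores which
    positions are already unmasked and uses the draft only for its length."""
    preds = []
    tok = context[-1]
    for i in range(len(draft)):
        tok = (tok * 3 + 1) % VOCAB
        guess = (tok + 1) % VOCAB if i == 20 else tok
        preds.append((guess, 1.0 / (i + 1)))
    return preds


if __name__ == "__main__":
    print(generate([7], 40, toy_predict, toy_next))
```

Because every accepted token is one the causal pass would have produced anyway, this greedy scheme yields the same output as plain autoregressive decoding; any speedup comes from accepting many drafted tokens per verification step (21 of 32 per block in this toy) rather than from changing what gets generated.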