Kelford Press

Signal from the noise

AI / ML · news shot

JetBrains open-weights Mellum2, a 12B MoE code model with 2.5B active parameters

JetBrains has released Mellum2, a 12B-parameter Mixture-of-Experts (MoE) model trained on natural language and code, under the Apache 2.0 license. The model routes each token through 8 of 64 experts, activating 2.5B parameters per forward pass, and supports a 131,072-token context via layer-selective YaRN extension. As the successor to JetBrains' 4B code-completion model, Mellum2 broadens its scope to routing, RAG, sub-agents, and private deployment, and ships in six checkpoints (Base, Instruct, Thinking, and SFT/pretrain variants). JetBrains claims inference more than 2x faster than similarly sized open models. On the Base checkpoint, Mellum2 trails Qwen2.5-7B on HumanEval (41.5 vs 55.5) while tracking it on GSM8K and MMLU. Whether a sparse "focal" model wins adoption over denser coding LLMs remains open.
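For readers unfamiliar with sparse routing, here is a minimal sketch of generic top-k expert selection using the 8-of-64 figure quoted above. It is not Mellum2's actual router: the function name `route_token`, the random router logits, and the choice to apply a softmax over only the selected experts are all illustrative assumptions, since the announcement does not describe the gating details.

```python
import math
import random

NUM_EXPERTS = 64          # expert count quoted in the article
TOP_K = 8                 # experts used per token, quoted in the article
TOTAL_PARAMS_B = 12.0     # total parameters (billions), quoted in the article
ACTIVE_PARAMS_B = 2.5     # active parameters (billions), quoted in the article


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and return (index, weight) pairs.

    Hypothetical helper: weights are a softmax over the selected logits only,
    which is one common convention, not a confirmed Mellum2 detail.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)
    chosen = ranked[:top_k]
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Stand-in router scores for a single token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

expert_fraction = TOP_K / NUM_EXPERTS
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print("experts used for this token:", sorted(i for i, _ in selected))
print(f"fraction of experts used: {expert_fraction:.1%}")
print(f"fraction of parameters active: {active_fraction:.1%}")
```

The active-parameter share (2.5B of 12B) comes out higher than the share of experts used (8 of 64). That would be consistent with some parameters, such as attention and embedding weights, running for every token regardless of routing, though the announcement does not break the figure down.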