§ The Long Form · Policy · Regulatory

What CAISI's pre-deployment evaluations actually cover.

All five major US frontier labs have signed pre-deployment evaluation agreements with CAISI. Here's what the evaluations test, what they don't, and where they leave model risk after launch.

15 May 2026 · 8 min read · Kelford Press editorial

The headline is clean: every major frontier model launched from a US lab must now clear a CAISI evaluation before public release. The interesting question is what the evaluation actually measures, and what risks it leaves on the table. We read the published framework and the lab-side commitments. Here's the practical shape.

§ 01

What CAISI evaluates

The framework focuses on three categories: catastrophic misuse (biological, chemical, radiological, nuclear, and cyber), loss of model control, and societal-scale risks. Evaluations are conducted before public release, with the lab providing temporary deep access (model weights, tool surfaces, and internal safety harnesses) under an NDA. Each evaluation ends in one of three outcomes: pass, pass-with-conditions, or hold. Labs commit to remediating any conditions before launch.

Three risk classes: catastrophic misuse, loss of control, societal-scale impact.
Pre-release access to weights, tool surfaces, and internal safety harnesses.
Outputs are pass / pass-with-conditions / hold.
Labs commit to remediating any conditions before public launch.

