§ The Long Form · Policy · Regulatory
What CAISI's pre-deployment evaluations actually cover.
All five major US frontier labs have signed pre-deployment evaluation agreements with CAISI. Here's what the evaluations test, what they don't, and where they leave model risk after launch.
15 May 2026 · 8 min read · Kelford Press editorial
The headline is clean: every major frontier model launched from a US lab must now clear a CAISI evaluation before public release. The interesting question is what the evaluation actually measures, and what risks it leaves on the table. We read the published framework and the lab-side commitments. Here's the practical shape.
§ 01
What CAISI evaluates
The framework focuses on three categories: catastrophic misuse (biological, chemical, radiological, nuclear, and cyber), loss of model control, and societal-scale risks. Evaluations are conducted before public release, with the lab providing temporary deep access (internal weights, tools, and safety harnesses) under an NDA. Each evaluation ends in one of three outcomes: pass, pass-with-conditions, or hold. Labs commit to remediating any conditions before launch.
- Three risk classes: catastrophic misuse, loss of control, societal-scale impact.
- Pre-release access to weights, tool surfaces, and internal safety harnesses.
- Outputs are pass / pass-with-conditions / hold.
- Labs commit to remediating any conditions before public launch.