
Kelford Press

Signal from the noise

§ The Long Form · Policy · Regulatory

What CAISI's pre-deployment evaluations actually cover.

All five major US frontier labs have signed pre-deployment evaluation agreements with CAISI. Here's what the evaluations test, what they don't, and where they leave model risk after launch.

15 May 2026 · 8 min read · Kelford Press editorial

The headline is clean: every major frontier model launched from a US lab must now clear a CAISI evaluation before public release. The interesting question is what the evaluation actually measures, and what risks it leaves on the table. We read the published framework and the lab-side commitments. Here's the practical shape.

§ 01

What CAISI evaluates

The framework focuses on three risk categories: catastrophic misuse (biological, chemical, radiological, nuclear, and cyber), loss of model control, and societal-scale risks. Evaluations are conducted before public release, with the lab providing temporary deep access (internal weights, tools, and safety harnesses) under an NDA. Each evaluation ends in one of three outcomes: pass, pass-with-conditions, or hold. Labs commit to remediating any conditions before launch.

  • Three risk classes: catastrophic misuse, loss of control, societal-scale impact.
  • Pre-release access to weights, tool surfaces, and internal safety harnesses.
  • Outputs are pass / pass-with-conditions / hold.
  • Labs commit to remediating any conditions before public launch.

