New arXiv paper: test-time training can undermine LLM safety guardrails
A paper posted to arXiv on 25 May 2026 (cs.LG, 2605.22984) argues that test-time training (TTT), the technique of adapting model parameters during inference that is increasingly used for few-shot adaptation, can undermine the safety guardrails installed during post-training. The authors describe TTT as "an emerging paradigm" that improves task performance, but say the parameter updates it applies can weaken the refusal behaviours that standard alignment evaluations measured. The full quantitative breakdown is in the paper. The implication for deployed systems is direct: a safety evaluation describes the weights it was run on, so any post-deployment adaptation may invalidate prior safety-evaluation certifications, and the same adaptation the engineering community is pursuing for personalisation is the mechanism by which guardrails can be silently bypassed.
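To make the deployment implication concrete, here is a toy sketch of one possible response: re-measure refusal behaviour after each test-time update and discard updates that lower it. This is not the paper's method or experimental setup. The linear "model", the update rule, and every name below (`refuses`, `refusal_rate`, `ttt_step`, `adapt_with_safety_gate`, `max_drop`) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Params = List[float]
Features = Sequence[float]


def refuses(params: Params, x: Features) -> bool:
    """Toy stand-in for a model: refuse when the linear score w.x is positive."""
    return sum(w * xi for w, xi in zip(params, x)) > 0.0


def refusal_rate(params: Params, probes: Sequence[Features]) -> float:
    """Fraction of harmful probe inputs that the toy model refuses."""
    return sum(refuses(params, x) for x in probes) / len(probes)


def ttt_step(params: Params, examples: Sequence[Features], lr: float) -> Params:
    """Toy test-time update that lowers the score on each few-shot example.

    The gradient of w.x with respect to w is x, so each example subtracts lr * x.
    """
    new = list(params)
    for x in examples:
        new = [w - lr * xi for w, xi in zip(new, x)]
    return new


def adapt_with_safety_gate(
    params: Params,
    examples: Sequence[Features],
    probes: Sequence[Features],
    lr: float = 0.5,
    max_drop: float = 0.0,
) -> Tuple[Params, float, float]:
    """Keep the adapted weights only if refusal rate falls by at most max_drop."""
    before = refusal_rate(params, probes)
    candidate = ttt_step(params, examples, lr)
    after = refusal_rate(candidate, probes)
    if before - after > max_drop:
        return params, before, after  # reject: fall back to the evaluated weights
    return candidate, before, after


base = [1.0, 1.0]
probes = [(1.0, 0.2), (0.8, 0.1), (0.2, 1.0)]

# Few-shot examples that share a feature direction with the harmful probes.
risky = [(1.0, 0.0)] * 3
kept, before, after = adapt_with_safety_gate(base, risky, probes)
print(kept == base, before, round(after, 2))

# Few-shot examples that do not reduce the refusal score on the probes.
benign = [(0.0, -0.1)]
kept2, before2, after2 = adapt_with_safety_gate(base, benign, probes)
print(kept2 != base, before2, after2)
```

In the first call the update is rejected because the toy refusal rate on the probe set drops after adaptation; in the second it is accepted. A real system would need a representative probe set and a real evaluation harness, and a gate of this kind only catches regressions the probes happen to cover.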