This PhD studentship researches large language model safety through mechanistic interpretability and behavioral analysis, including monitoring model behavior and developing risk-reduction strategies.
Added 9 days ago | Cambridge, UK | AI Safety & Alignment