AI Safety & Alignment
PhD Studentship, Monitoring and Increasing LLM Safety
Cambridge, UK
-
In this studentship, you'll pursue a PhD exploring large language model safety through mechanistic interpretability and behavioural research.
-
Investigate chain-of-thought faithfulness and detect deceptive behaviour in LLMs using perturbation methods and mechanistic analysis.