Job description
Cambridge University, Department of Engineering • Cambridge, UK
- In this studentship, you'll pursue a PhD exploring large language model safety through mechanistic interpretability and behavioural research.
- Investigate Chain-of-Thought (CoT) faithfulness and detect deceptive behaviour via perturbation methods and mechanistic analysis.
- Monitor LLM behaviour at inference time and develop risk reduction strategies.
- Either apply perturbation techniques to test what a CoT means, or train models for transparency using human predictor evaluation.
- Collaborate with your supervisor to define your research direction after completing the initial 1.5-year projects.