AI Safety & Alignment
PhD Fellowship, Mechanistic Interpretability for Large Language Model Security
Copenhagen, Denmark
-
In this fellowship, you'll research mechanistic interpretability methods to improve LLM security and mitigate false information attacks.
-
Develop novel mechanistic interpretability methods and evaluation protocols across different stages of the model lifecycle.