Job description

University of Copenhagen, Department of Computer Science • Copenhagen, Denmark

In this fellowship, you'll research mechanistic interpretability methods to improve LLM security and mitigate false information attacks.
Develop novel mechanistic interpretability methods and evaluation protocols across different model lifecycle stages.
Collaborate with supervisors, postdoctoral researchers, and external partners to advance mechanistic interpretability.
Author and publish research papers in high-impact venues and disseminate findings nationally and internationally.
Undertake PhD courses in academic writing and specialised ML/NLP topics, and conduct research abroad.

Applications are handled by the employer.

AI Safety Careers does not process applications directly.

PhD Fellowship, Mechanistic Interpretability for Large Language Model Security