Job description

Model Evaluation and Threat Research • San Francisco Bay Area

$250,000 - $450,000

In this role, you'll conduct research on AI capabilities, risks and mitigations through benchmarking and alignment assessment.
Develop and maintain benchmarks and metrics to measure frontier model capabilities on threat-relevant tasks.
Build research infrastructure and evaluation methods to assess model behaviour under monitoring protocols.
Create maintainable, scalable systems and lead projects from ideation to delivery.
Contribute rigorous research, drawing on knowledge of the literature and problem-solving skills to tackle open-ended challenges.

Model Evaluation and Threat Research (formerly Alignment Research Center, Evaluations) is a project focused on evaluating the capabilities and alignment of advanced ML models.

Applications are handled by the employer.

AI Safety Careers does not process applications directly.

Member of Technical Staff, Research