Job description

Scale AI • San Francisco Bay Area, New York, NY

$216,000 - $270,000

In this role, you'll develop and apply post-training methods and interpretability techniques to make frontier AI systems safer.
Design and run post-training pipelines to study how training choices affect model safety, robustness, and alignment.
Develop interpretability-informed evaluations that reveal unsafe model behaviours and guide targeted mitigations.
Collaborate with policymakers, engineers, and researchers to translate findings into actionable safety standards.
Evaluate post-trained models for failure modes including reward hacking, sycophancy, and alignment faking.

Scale AI provides training data for machine learning teams, along with other services such as testing and evaluation. Its clients include generative AI companies, the US government, and a range of enterprises.

Applications are handled by the employer.

AI Safety Careers does not process applications directly.

Research Scientist, Safety Post-Training