AI Safety & Alignment
Research Scientist, Safety Post-Training
San Francisco Bay Area, New York, NY
$216,000 - $270,000
In this role, you'll develop and apply post-training methods and interpretability techniques to make frontier AI systems safer.
Scale AI · Added today
Design and run post-training pipelines to study how training choices affect model safety, robustness, and alignment.
Develop interpretability-informed evaluations that reveal unsafe model behaviours and guide targeted mitigations.
Collaborate with policymakers, engineers, and researchers to translate findings into actionable safety standards.
Evaluate post-trained models for failure modes including reward hacking, sycophancy, and alignment faking.
Scale AI provides training data for machine learning teams, along with other services such as model testing and evaluation. It works with generative AI companies, the US government, and enterprises.