Transluce • San Francisco Bay Area

$250,000 - $500,000

In this role, you'll develop and train scalable interpretability assistants that detect unexpected behaviours in AI models from their internal activations. Responsibilities include:
- Create diverse evaluations, ranging in difficulty, to identify undesirable behaviours in open-source models.
- Develop novel architectures and training objectives for advanced interpretability assistants.
- Scale training and inference pipelines to support models of up to 1 trillion parameters.
- Collaborate with researchers to advance end-to-end AI oversight capabilities.

Transluce is a research lab that builds technology for understanding AI systems and steering them in the public interest.

Research Engineer, Scalable Interpretability