Job description

Apollo Research • London or San Francisco

Develop threat models for coding agents under the assumption that they could be misaligned or compromised. This includes near-term threats like AI agents leaking private internal data and long-term threats like agents sabotaging safety research.
View coding agents through the lens of insider risk: they have credentials, access to code, network access, and the ability to execute arbitrary actions, just as a malicious insider would (see Control agenda).
Map out kill chains and attack progressions in the style of frameworks like MITRE ATT&CK, adapted for agentic AI. The Agentic Loss-of-Control Threat Matrix is an example of a high-quality contribution.
Build and maintain our “coding agent security levels”, which define the level of robustness Watcher provides against different categories of failure modes. These levels should be concrete, testable, and usable both internally (to guide product priorities) and externally (to communicate our security posture to customers).
Maintain our library of coding agent failure modes and keep it comprehensive, accurate, and current.

AI Security & Control Engineer