Job Description:
• Red-team AI models and agents by testing jailbreak attempts, prompt injections, misuse scenarios, and exploit strategies
• Generate high-quality human evaluation data by annotating model failures, classifying vulnerabilities, and identifying systemic risks
• Apply structured testing methodologies using taxonomies, benchmarks, and playbooks to ensure consistent evaluation
• Document findings clearly and reproducibly, producing reports, datasets, and adversarial test cases that teams can act upon
• Work across multiple projects, supporting different AI systems and evaluation objectives
Requirements:
• You have **prior red-teaming experience** in areas such as adversarial AI testing, cybersecurity, or socio-technical risk analysis
• You naturally think **adversarially**, exploring ways to push systems to their limits and uncover weaknesses
• You prefer **structured methodologies**, using frameworks and benchmarks rather than ad-hoc testing
• You communicate risks and vulnerabilities **clearly to both technical and non-technical audiences**
• You are comfortable **working across multiple projects and adapting to new evaluation challenges**
Nice-to-Have Specialties:
• **Adversarial Machine Learning:** jailbreak datasets, prompt injection attacks, RLHF/DPO vulnerabilities, or model extraction techniques
• **Cybersecurity:** penetration testing, exploit development, reverse engineering
• **Socio-technical risk analysis:** harassment or misinformation testing, abuse pattern analysis
• **Creative adversarial thinking:** backgrounds in psychology, acting, writing, or other disciplines that support unconventional attack strategies
Benefits: