Microsoft is dedicated to advancing post-training methods for AI models and is seeking a highly skilled AI Data & Training Technical Staff member to join its team. In this role, you will create world-class datasets, train models, and develop scalable data pipelines for cutting-edge language and multimodal models.
Responsibilities
- Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
- Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
- Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
- Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
- Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
- Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
Skills
- Bachelor's Degree (complete or in progress) in a relevant field AND 3+ months of related research internship experience OR Master's Degree in a relevant field OR equivalent experience
- Software engineering skills with fluency in Python and modern data libraries
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
- Master's Degree in a relevant field AND 1+ year(s) of related research experience OR equivalent experience
- Coding expertise in Python and data libraries (Pandas, NumPy, etc.)
- Proficiency with distributed data frameworks (Spark, Ray, Apache Beam) and cloud ecosystems (Azure, data lakes)
- Hands-on experience with large-scale unstructured or semi-structured datasets, such as images, video, audio, and code
- Proven experience training AI models at significant scale
- Demonstrated ability to collaborate within interdisciplinary teams and communicate complex, multimodal research concepts effectively
Company Overview
- Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services. It was founded in 1975 and is headquartered in Redmond, Washington, USA, with a workforce of 10,001+ employees. Its website is https://www.microsoft.com.
Company H-1B Sponsorship
- Microsoft has a track record of sponsoring H-1B visas, with 1,317 in 2026, 9,192 in 2025, 9,343 in 2024, 7,677 in 2023, 11,403 in 2022, 7,210 in 2021, and 7,852 in 2020. This record does not guarantee sponsorship for this specific role.