Software Engineer, Research Infrastructure - Remote Job
Location: Remote (U.S. based)
Team: Fleet Infrastructure
Department: Applied AI Infrastructure
Reports to: Manager, Fleet Infrastructure
Job Type: Full-time
About the Role
OpenAI is seeking a Software Engineer to join our Fleet Infrastructure team. In this role, you will design, deploy, and maintain the infrastructure systems that power model training and deployment on one of the world’s largest GPU fleets. This position provides the opportunity to work at the cutting edge of AI research and deployment, ensuring systems are scalable, efficient, and highly reliable.
Responsibilities
- Infrastructure Design & Implementation: Develop and operate components of our compute fleet, including job scheduling, cluster management, snapshot delivery, and CI/CD systems.
- Collaboration: Work closely with researchers and product teams to understand workload requirements and translate them into scalable infrastructure solutions.
- System Optimization: Enhance system performance and reliability by building automation tools for Kubernetes cluster provisioning and upgrades.
- Snapshot Delivery: Ensure fast model startup times through high-performance snapshot delivery, from blob storage down to hardware caching.
- Cross-functional Partnership: Collaborate with hardware, infrastructure, and business teams to deliver services with high utilization and high reliability.
Qualifications
Required:
- Experience with hyperscale compute systems.
- Strong programming skills in languages such as Python, Go, or C++.
- Experience working in public cloud environments, especially Azure.
- Proficiency with Kubernetes and containerized workloads.
- Execution-focused mindset with rigorous attention to user requirements.
Preferred:
- Understanding of AI/ML workloads and model deployment pipelines.
- Familiarity with distributed storage and networking at scale.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities while deploying systems safely and responsibly. Our work is rooted in collaboration, safety, and the diverse perspectives that make AI beneficial for everyone.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other legally protected characteristics.
Why Join Us?
- Impact: Contribute to infrastructure that powers groundbreaking AI research and applications.
- Innovation: Work with experts pushing the boundaries of AI capabilities.
- Growth: Opportunities for professional development and career advancement.
- Culture: Join a collaborative, inclusive, and supportive work environment.
If you are passionate about building scalable, reliable infrastructure and want to contribute to the future of AI, this is the role for you.