OpenAI
San Francisco, CA, USA
About the Team

The Supercomputing Scheduling Pillar at OpenAI is dedicated to ensuring the reliability, scalability, and user-friendliness of job lifecycle management, with an emphasis on efficient and flexible job scheduling, quota management, and job execution workflows. We maximize researcher productivity by ensuring high goodput, efficient packing, and a consistent, ergonomic training workflow, while scaling to ever-larger supercomputers and reducing the team's operational burden.

About the Role

As an engineer in the Scheduling Pillar, you will design, write, deploy, and operate job lifecycle management systems for model training on some of the largest supercomputers in the world. The scale is immense, the timelines are tight, and the organization is moving fast; this is an opportunity to shape a critical system in support of OpenAI's mission.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation...