Senior MLOps Engineer

We are seeking a Senior MLOps Engineer to steer the technical vision of our Training and Inference Optimization team. In this high-impact role, you will architect the infrastructure that powers our next-generation AI models. You will bridge the gap between systems programming and machine learning, optimizing large-scale LLM training via NVIDIA NeMo and building ultra-high-throughput serving systems using vLLM, TensorRT-LLM, and SGLang.

Your mission is to ensure our models are not only state-of-the-art but also production-hardened, cost-efficient, and performant at scale.

Key Responsibilities

• Training Infrastructure: Architect and maintain scalable distributed training pipelines using NVIDIA NeMo/Nemotron/Megatron-Bridge. You will optimize GPU utilization, manage complex checkpointing strategies, and implement automated fault tolerance for long-running jobs.
• Inference Orchestration: Lead the deployment of LLMs using vLLM, TensorRT-LLM, or SGLang. You will implement and tune cutting-edge techniques, including PagedAttention, continuous batching, and advanced quantization (AWQ/FP8), to maximize throughput and minimize TPOT (Time Per Output Token).
• Workload Orchestration: Use SLURM/Flyte/Ray/SkyPilot to manage and scale ML workloads across diverse cloud providers and on-prem clusters, ensuring seamless resource shifting and cost-effective execution.
• Lifecycle Management: Standardize model tracking, versioning, and transition workflows using MLflow (or a similar tool), ensuring reproducible training runs and a clear path from research to production.
• Performance Engineering: Conduct deep-dive profiling and bottleneck analysis across the full stack, from CUDA kernels and NCCL collective communications to Python-level orchestration.
• Efficiency & Cost Governance: Monitor and optimize cloud and on-prem GPU expenditures through intelligent scaling policies and high-density resource packing.
• Technical Leadership: Set the bar for engineering excellence. You will drive the roadmap, perform rigorous code reviews, and mentor junior and mid-level engineers.

Required Qualifications

• Experience: 5+ years in MLOps, DevOps, or Software Engineering, with a minimum of 2 years dedicated to LLM infrastructure.
• Deep Learning Ecosystem: Expert-level proficiency with PyTorch and the NVIDIA stack (CUDA, NCCL, Triton).
• Specialized Tooling: Hands-on experience with NVIDIA NeMo (or Megatron-Bridge) for distributed training and at least two of the following for serving: vLLM, TensorRT-LLM, or SGLang.
• Orchestration & Lifecycle: Proven experience with SLURM/Flyte/Ray/SkyPilot for cluster management and MLflow (or a similar tool) for experiment and model management.
• Infrastructure: Deep expertise in Kubernetes and K8s operators (e.g., KubeRay, MPI Operator, or Run:ai).
• Systems Programming: Mastery of Python and a functional understanding of C++ or Rust for performance-critical components.
• Next-Gen Hardware: Familiarity with high-performance networking (InfiniBand/RoCE) and NVIDIA H200 (Hopper) and B200 (Blackwell) architectures.

By applying to this role, you understand that we may collect your personal data and store and process it on our systems. For more information, please see our Privacy Notice.