Our client is a fast-growing deep-tech company founded in 2019 and recognized by CB Insights as one of the 100 most promising AI companies globally. They are the largest quantum software company in the EU, with 250+ employees worldwide and growing, delivering advanced solutions trusted by leading global enterprises across critical sectors including finance, energy, manufacturing, telecom, and other industrial sectors.

Job Overview

They are seeking an MLOps Engineer to steer the technical vision of their Training and Inference Optimization team. In this high-impact role, you will architect the infrastructure that powers their next-generation AI models. You will bridge the gap between systems programming and machine learning, optimizing large-scale LLM training with NVIDIA NeMo and building ultra-high-throughput serving systems using vLLM, TensorRT-LLM, and SGLang. Your mission is to ensure their models are not only state-of-the-art but also production-hardened, cost-efficient, and performant at scale.

Perks and Benefits:
- Indefinite contract.
- Equal pay guaranteed.
- Variable performance bonus.
- Signing bonus.
- Work visa sponsorship and relocation package (where applicable).
- Private health insurance.
- Eligibility for an educational budget, per internal policy.
- Hybrid work opportunity.
- Flexible working hours.
- Language classes and discounted lunch options.
- A high-performance, collaborative environment operating at pace on cutting-edge technologies.
- Career plan.
- Opportunity to learn and teach.

Required Qualifications:
- Experience: 5+ years in MLOps, DevOps, or software engineering, including at least 2 years dedicated to LLM infrastructure.
- Deep Learning Ecosystem: Expert-level proficiency with PyTorch and the NVIDIA stack (CUDA, NCCL, Triton).
- Specialized Tooling: Hands-on experience with NVIDIA NeMo (or Megatron-Bridge) for distributed training, and with at least two of the following serving frameworks: vLLM, TensorRT-LLM, or SGLang.
- Orchestration & Lifecycle: Proven experience with SLURM, Flyte, Ray, or SkyPilot for cluster management, and with MLflow (or a similar tool) for experiment and model management.
- Infrastructure: Deep expertise in Kubernetes and K8s operators (e.g., KubeRay, MPI Operator, or Run:ai).
- Systems Programming: Mastery of Python and a working understanding of C++ or Rust for performance-critical components.
- Next-Gen Hardware: Familiarity with high-performance networking (InfiniBand/RoCE) and NVIDIA H200/B200 (Blackwell) architectures.

Preferred Qualifications:
- Active contributions to relevant open-source projects (vLLM, SGLang, SkyPilot, or NeMo).
- Proven track record with model compression (sparsity, distillation, or quantization).
- Experience writing or optimizing custom Triton kernels.
- Expertise in ML observability stacks (Prometheus, Grafana, Jaeger).