HPC Infrastructure Solutions Architect – GPU Platforms

Location

Remote from anywhere in the EU

Total compensation

Up to 250k TC
Join an AI infrastructure team building the GPU, networking, and storage platforms underneath large-scale AI training workloads. This role focuses on the quality, scalability, reliability, and efficiency of the infrastructure before workloads arrive.

While ML Specialists focus on models and workloads, this role owns the underlying platform that enables them to run at scale.

Team & Responsibilities

Work alongside senior infrastructure and AI engineers in a hands-on, client-facing role.

You will:
Design and operate production-grade GPU and HPC platforms for AI training and simulation

Build and scale GPU clusters, with a strong focus on Slurm-based scheduling

Design and optimize high-performance networking using RDMA, InfiniBand, NVLink, and NVSwitch

Design and tune storage and I/O paths for large-scale datasets

Build cloud infrastructure using open-source tooling such as Kubernetes, Terraform, and Helm
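The Slurm-based GPU scheduling mentioned above can be illustrated with a minimal batch script for a multi-node training job. This is a sketch only: the partition name, GPU count per node, and training script are hypothetical placeholders, not details from this posting.

```shell
#!/bin/bash
# Minimal sketch of a Slurm batch script for a distributed GPU training job.
# Partition name, resource counts, and paths are hypothetical placeholders.
#SBATCH --job-name=train-job
#SBATCH --partition=gpu            # hypothetical GPU partition name
#SBATCH --nodes=4                  # 4 nodes x 8 GPUs = 32 GPUs total
#SBATCH --gres=gpu:8               # request 8 GPUs per node via GRES
#SBATCH --ntasks-per-node=8       # one task (rank) per GPU
#SBATCH --cpus-per-task=12
#SBATCH --time=24:00:00

# Launch one process per GPU across all allocated nodes.
srun python train.py --config config.yaml
```

On a real cluster, `--gres=gpu:N` requests only work if each node's GPUs are declared in the cluster's `gres.conf`/`slurm.conf`; inter-node traffic for such jobs is then typically carried over the InfiniBand/RDMA fabric described elsewhere in this posting.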
Required Skills

Required Skills

Hands-on experience building and operating GPU or HPC clusters

Strong Linux, Kubernetes, networking, and storage background

Deep understanding of HPC networking and RDMA stacks

Experience with GPU schedulers, preferably Slurm

Strong cloud experience, ideally multi-cloud

Experience with specific storage technologies is a plus, but strong storage and I/O expertise is required.
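As a sketch of the RDMA and InfiniBand familiarity the list above asks for, these are standard first-pass fabric checks using tools from the `infiniband-diags` and `perftest` packages; the node names are hypothetical.

```shell
#!/bin/bash
# Hedged sketch: common sanity checks on an InfiniBand/RDMA fabric.
# Requires infiniband-diags and perftest; node names are hypothetical.

ibstat                  # HCA port state, link rate, and link layer
ibv_devinfo | grep -E 'hca_id|state|active_mtu'   # verbs-level device view

# Point-to-point RDMA write bandwidth between two nodes:
# on node-a:  ib_write_bw
# on node-b:  ib_write_bw node-a
```

Checks like these verify that ports are active and that measured RDMA bandwidth approaches the link rate before workloads arrive, which is the pre-workload reliability focus this role describes.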
This role is not a fit if your experience is limited to model development or high-level cloud architecture without deep GPU and networking exposure. We're looking for serious senior engineers!