Senior Applied Research Engineer | Barcelona, Spain
We are partnered with a cutting-edge AI company shaping the future of enterprise decision-making. Founded by experienced technologists from leading research environments, the firm has developed a market-leading platform purpose-built for the structured data that underpins critical business decisions.
Backed by top-tier investors and trusted by some of the world's largest organisations, the company helps enterprises unlock significant value by enabling more accurate, forward-looking decision-making.
You will work on novel technical challenges in large-scale model development and contribute to technology that is changing how major organisations operate. This is an opportunity to join a category-defining company at an early stage and help shape its trajectory.
Barcelona, Spain, Hybrid working
Competitive Salary + Equity + Benefits + Relocation Support if needed
Permanent Role
Key responsibilities
* Profile end-to-end distributed training runs to identify bottlenecks across compute, GPU memory, and inter-GPU communication.
* Influence architectural decisions to improve efficiency and reliability of large-scale training jobs, including developing Triton/CUDA kernels when needed.
* Design and implement model scaling, parallelisation, and memory optimisation techniques for training workloads with very large context sizes.
* Collaborate closely with ML Researchers to diagnose architectural inefficiencies, ensure new research ideas scale efficiently in practice, and share internal knowledge on optimisation.
* Drive productionisation and serving of models from the research side, including improving inference efficiency via techniques such as quantisation.
Must Have
* Strong understanding of modern ML architectures and large-scale training pipelines.
* Hands-on experience running distributed training jobs on multi-GPU systems.
* Advanced profiling and debugging across CPU, GPU, memory usage, latency, and inter-GPU communication.
* Strong programming skills in Python.
* Experience with model scaling and parallelisation strategies, including tensor and pipeline parallelism.
Highly Desirable
* Familiarity with NCCL, MPI, and distributed communication primitives.
* Knowledge of PyTorch and Triton internals.
* Programming experience with C++ and CUDA.
Benefits
* Competitive compensation: salary, equity, and comprehensive benefits.
* Relocation support for employees moving to join the team in an office location.
* A mission-driven, low-ego culture valuing diversity of thought, ownership, and a bias towards action.
If you are interested in this role, please respond directly to this advert with your updated CV or email it to