AI Platform / MLOps Engineer
About the role
We are looking for an AI Platform / MLOps Engineer to join a fast-growing AI team within an international technology environment.
In this role, you will be responsible for operating, scaling, and improving AI/ML systems in production, ensuring that training, inference, pipelines, and platform services are reliable, observable, secure, and cost-efficient.
You will work at the intersection of MLOps, DevOps, Cloud Engineering, and AI platform architecture, supporting the full lifecycle of AI systems — from model training environments to production inference, CI/CD automation, monitoring, and cost optimisation.
This is a hands-on role for someone with a strong platform engineering mindset, solid experience in AWS, infrastructure, automation, and ML tooling, and a passion for building production-grade AI systems.
If you enjoy making AI systems scalable, reliable, observable, and ready for real-world usage — this could be a great fit.
What you'll do
Operate and scale AI/ML platforms end-to-end, including training, inference, pipelines, and production environments
Build and maintain robust ML infrastructure using tools such as Amazon SageMaker, MLflow, feature stores, and related ML platform components
Design and implement CI/CD pipelines for ML models, AI workloads, and platform services
Set up and optimise training and inference environments for reliability, scalability, and performance
Implement observability, monitoring, alerting, and cost-control mechanisms for AI workloads
Support production deployments of ML/AI systems with a strong focus on automation and operational excellence
Work with DevOps and platform tooling such as AWS, Terraform, Kubernetes, Docker, and CI/CD tools such as GitHub Actions
Collaborate with AI Engineers, Data Scientists, Data Engineers, and Tech Leads to ensure AI solutions are production-ready
Contribute to best practices around MLOps, model versioning, experiment tracking, deployment, monitoring, and governance
Work with LLM and agentic tooling ecosystems such as LangChain, Langfuse, LangSmith, or similar platforms
Troubleshoot production issues related to infrastructure, pipelines, inference performance, latency, reliability, and cost
Must Have
Solid background in Platform Engineering, DevOps, Cloud Engineering, MLOps, or ML Platform Engineering
Hands-on experience with AWS and cloud-native services
Experience with Infrastructure as Code, especially Terraform
Strong experience building and maintaining CI/CD pipelines
Experience with ML platform tooling such as SageMaker, MLflow, feature stores, or similar tools
Understanding of ML/AI workflows: training, inference, model deployment, pipelines, monitoring, and lifecycle management
Experience setting up and managing production environments for AI/ML workloads
Strong understanding of observability, monitoring, alerting, scalability, and cost optimisation
Familiarity with containerisation and orchestration tools such as Docker and Kubernetes
Experience with LLM / agentic tooling such as LangChain, Langfuse, LangSmith, or similar frameworks/platforms
Strong automation mindset and ability to build reliable, repeatable, production-grade systems
Strong problem-solving skills and ownership mindset
Fluency in English and Spanish
Nice to Have
Experience with data pipelines or data engineering workflows
Experience with Amazon Bedrock, vector databases, or LLM infrastructure
Experience with model monitoring, drift detection, evaluation pipelines, or AI observability platforms
Experience with workflow orchestration tools such as Airflow, Prefect, or similar
Knowledge of security, governance, and compliance practices for AI/ML platforms
Experience working in Agile / Scrum environments
Previous experience in travel, aviation, digital platforms, or large-scale enterprise environments
Hybrid working model - 2 days onsite per week
Why join this project?