AI Platform / MLOps Engineer
About the role
We are looking for an AI Platform / MLOps Engineer to join a fast-growing AI team within an international technology environment.
In this role, you will be responsible for operating, scaling, and improving AI/ML systems in production, ensuring that training, inference, pipelines, and platform services are reliable, observable, secure, and cost-efficient.
You will work at the intersection of MLOps, DevOps, Cloud Engineering, and AI platform architecture, supporting the full lifecycle of AI systems — from model training environments to production inference, CI/CD automation, monitoring, and cost optimisation.
This is a hands-on role for someone with a strong platform engineering mindset, solid experience in AWS, infrastructure, automation, and ML tooling, and a passion for building production-grade AI systems.
If you enjoy making AI systems scalable, reliable, observable, and ready for real-world usage — this could be a great fit.
What you'll do
Operate and scale AI/ML platforms end-to-end, including training, inference, pipelines, and production environments
Build and maintain robust ML infrastructure using tools such as Amazon SageMaker, MLflow, feature stores, and related ML platform components
Design and implement CI/CD pipelines for ML models, AI workloads, and platform services
Set up and optimise training and inference environments for reliability, scalability, and performance
Implement observability, monitoring, alerting, and cost-control mechanisms for AI workloads
Support production deployments of ML/AI systems with a strong focus on automation and operational excellence
Work with DevOps and platform tooling such as AWS, Terraform, Kubernetes, Docker, and CI/CD tools such as GitHub Actions
Collaborate with AI Engineers, Data Scientists, Data Engineers, and Tech Leads to ensure AI solutions are production-ready
Contribute to best practices around MLOps, model versioning, experiment tracking, deployment, monitoring, and governance
Work with LLM and agentic tooling ecosystems such as LangChain, Langfuse, LangSmith, or similar platforms
Troubleshoot production issues related to infrastructure, pipelines, inference performance, latency, reliability, and cost
Must Have
Solid background in Platform Engineering, DevOps, Cloud Engineering, MLOps, or ML Platform Engineering
Hands-on experience with AWS and cloud-native services
Experience with Infrastructure as Code, especially Terraform
Strong experience building and maintaining CI/CD pipelines
Experience with ML platform tooling such as SageMaker, MLflow, feature stores, or similar tools
Understanding of ML/AI workflows: training, inference, model deployment, pipelines, monitoring, and lifecycle management
Experience setting up and managing production environments for AI/ML workloads
Strong understanding of observability, monitoring, alerting, scalability, and cost optimisation
Familiarity with containerisation and orchestration tools such as Docker and Kubernetes
Experience with LLM / agentic tooling such as LangChain, Langfuse, LangSmith, or similar frameworks/platforms
Strong automation mindset and ability to build reliable, repeatable, production-grade systems
Strong problem-solving skills and ownership mindset
Fluency in English and Spanish
Nice to Have
Experience with data pipelines or data engineering workflows
Experience with Amazon Bedrock, vector databases, or LLM infrastructure
Experience with model monitoring, drift detection, evaluation pipelines, or AI observability platforms
Experience with workflow orchestration tools such as Airflow, Prefect, or similar
Knowledge of security, governance, and compliance practices for AI/ML platforms
Experience working in Agile / Scrum environments
Previous experience in travel, aviation, digital platforms, or large-scale enterprise environments
Hybrid working model - 2 days onsite per week
Why join this project?