AI QA Engineer
An exciting opportunity has opened with an innovative AI-focused technology team working on next-generation AI and machine learning solutions at enterprise scale.
We're looking for an experienced AI QA Engineer to help drive quality across cutting-edge AI products, including LLM-powered applications, multi-agent systems, RAG pipelines, and cloud-native AI platforms.
This is a hands-on role suited to someone passionate about AI quality assurance, automation, and testing modern AI architectures in a fast-moving Agile environment.
What You'll Be Doing
- Designing and maintaining automated test frameworks for AI/ML applications and LLM-powered systems
- Testing RAG pipelines, semantic search functionality, and vector database implementations
- Building automated testing for data and ETL pipelines
- Validating AI APIs, backend services, and cloud-native AI workloads
- Developing evaluation frameworks for LLM outputs and non-deterministic AI behaviours
- Working closely with AI Engineers, Data Engineers, and Product teams to define quality standards
- Supporting performance, integration, regression, and functional testing across AI systems
- Contributing to QA best practices within a collaborative Scrum environment
Tech Environment
- Python
- Playwright
- Cucumber
- FastAPI
- LangChain / LangGraph
- CrewAI
- AWS (Bedrock, SageMaker, Lambda, S3, Aurora)
- Kubernetes & Docker
- Vector Databases (OpenSearch, Pinecone, pgvector)
- RAG Evaluation Tools (RAGAS, Hugging Face, LangSmith)
- GitHub Actions / CI/CD
What We're Looking For
- 3+ years in QA / Test Engineering
- Commercial experience testing AI/ML or LLM-based applications
- Strong automation testing experience with Python
- Experience with Playwright and modern testing frameworks
- Understanding of prompt engineering and LLM evaluation methodologies
- Experience testing APIs and cloud-native platforms
- Exposure to Kubernetes, Docker, and AWS services
- Strong communication skills and a collaborative mindset
Nice to Have
- Experience with multi-agent AI systems
- MLOps exposure (MLflow, W&B, CI/CD for ML)
- Observability tooling such as Langfuse or Datadog
- Performance testing experience (k6, Locust, JMeter)
- Terraform or Infrastructure-as-Code exposure
Location
- Barcelona, Spain
- Hybrid working model (2–3 days onsite)
Interested?
If you're excited by the challenge of testing and validating next-generation AI systems in a rapidly evolving environment, apply now or reach out directly for a confidential conversation.