Senior Data Scientist

Location: Remote from Spain (Spanish contract)

Join a transformative data and AI platform initiative aimed at modernizing enterprise-scale capabilities and enabling real-time decision-making. The project delivers a comprehensive roadmap covering AI, MLOps, data governance, and platform scalability, supporting a shift towards data-first operations and intelligent automation.
Requirements:
- 4+ years of experience as a Data Scientist, with deep expertise in unsupervised learning, clustering, and advanced exploratory data analysis.
- Strong hands-on experience with SHAP or similar model interpretability techniques.
- Proficiency in Python, Pandas, SQL, Jupyter, and common data manipulation and visualization tools.
- Familiarity with AWS ecosystem tools such as S3, RDS, and IAM, and with BI solutions such as QuickSight.
- Experience designing and building GenAI or LLM-based workflows, including prompt engineering and API integration.
- Ability to benchmark different LLM solutions and assess their performance for specific summarization and recommendation use cases.
- Skilled at transforming raw outputs into compelling, business-relevant insights for both technical and non-technical audiences.

Nice to have:
- Experience implementing RAG pipelines with vector databases and domain document ingestion.
- Exposure to MLOps workflows and tooling (e.g. MLflow, SageMaker, Airflow, Terraform).
- Prior work integrating BI platforms with AI/ML pipelines.
- Background in identity verification.
Responsibilities:
- Drive the development and evolution of customer clustering models using unsupervised learning to identify patterns in pass rate performance and flag inconsistencies.
- Lead SHAP-based explainability initiatives to uncover the root causes behind verification failures and create dynamic, on-demand explanations.
- Benchmark LLM APIs, assessing summarization quality, latency, relevance, and cost to inform GenAI solution design.
- Collaborate on pipeline development to extract, preprocess, and format QuickSight reports for GenAI consumption.
- Build and test proof-of-concept RAG pipelines that enhance LLMs with domain-specific context from historical documents and verification reports.
- Work closely with Delivery Managers to translate complex analytics and model outputs into business-friendly visualizations and narratives.
- Continuously refine the clustering methodology by evaluating alternative models, tuning hyperparameters, and expanding criteria.
- Partner with MLOps engineers to ensure seamless integration of data science pipelines into the broader infrastructure, with a focus on automation and scalability.