Experteer Overview
In this role, you lead the delivery of production-ready AI across a large digital-services ecosystem. You set the technical direction for multi-agent AI systems, oversee model development and deployment, and ensure observability and reliability at scale. You collaborate with cross-functional teams to translate business problems into ML solutions, shaping the AI platform used by millions of SMBs. This is a hands-on leadership role that blends research, engineering, and production responsibilities to deliver real impact.

Responsibilities
- Architect and evolve a multi-agent orchestration platform (Hermes/Multica) with plugin systems and observability hooks
- Design voice AI pipelines with low end-to-end latency targets and telephony integration
- Build and maintain RAG pipelines over vector and keyword indexes, including quality measurement
- Define the MCP server architecture and tool-use contracts for internal and external integrations
- Fine-tune and evaluate LLMs (LoRA, QLoRA, DPO) for domain-specific tasks, and manage the model lifecycle
- Own the AI observability stack (Langfuse tracing, LLM instrumentation, cost tracking, quality alerts) and enforce guardrails (PII redaction, safety scanning)
- Develop data ingestion, preprocessing, and feature pipelines, and drive ML CI/CD with automated eval gating and canary releases
- Set architectural standards, conduct design reviews, mentor engineers, and collaborate with Product to translate business problems into ML problems
- Engage with external research partners to identify production-ready signals and open-source opportunities

Requirements
- 8+ years in ML engineering, applied AI, or research engineering, including leadership experience
- Deep production experience with LLMs: fine-tuning, RLHF/DPO, prompt engineering, RAG, and tool use
- Proficiency in Python and the core ML stack: PyTorch, Hugging Face Transformers, PEFT/LoRA
- Hands-on experience serving LLM inference in latency-sensitive environments (vLLM, TensorRT-LLM, TGI)
- Practical knowledge of agentic frameworks: multi-agent coordination, tool-use orchestration, and observability
- Experience with speech AI or real-time audio systems is a strong plus
- Solid MLOps knowledge: experiment tracking (MLflow/W&B), model registries, Docker/Kubernetes, and ML CI/CD
- Awareness of LLM risks (hallucination, data leakage, privacy) and mitigation strategies
- Strong communication skills for design docs, architecture reviews, and explaining technical decisions to stakeholders