Konecta is a leading, innovative global service provider in customer management, business process and digital outsourcing, with 120,000 passionate employees working in 30 languages across 4 continents and 26 countries.
Focusing on the unique needs and opportunities of each industry, Konecta offers a full range of end-to-end customer management solutions – including acquisition, retention, customer service, technical support, and collection – all based on a sustainable business model. These services are built on a portfolio of world-class expertise covering customer experience and process management, digital solutions, and cutting-edge technologies.
Headquartered in Madrid, Konecta generates global revenues of €2 billion and serves more than 500 clients, including some of the biggest names in telecoms, energy, banking, mobility, retail, and e-commerce.
Mission of the role
Our GenAI platform requires comprehensive observability to ensure production reliability, performance optimisation, and cost management. As an observability engineer, you will design and implement the monitoring, alerting, and dashboarding infrastructure that gives teams visibility into platform health, use case performance, and operational costs.
Responsibilities
* Design and implement observability architecture using Prometheus and Grafana
* Deploy and manage the Prometheus stack on GKE with appropriate retention and high-availability configuration
* Create comprehensive Grafana dashboards for platform health, API performance, and use case metrics
* Implement custom metrics collection for CrewAI agents, Kong API gateway, and LLM usage
* Configure OpenTelemetry instrumentation across all platform services
* Design alerting rules and notification channels for P0–P3 incident severity levels
* Build cost and usage dashboards for LLM token consumption and infrastructure spend
* Integrate the Prometheus and Grafana stack with Google Cloud Monitoring and Cloud Logging for unified observability
* Establish SLI/SLO frameworks for platform and use case services
* Create runbooks for common alerting scenarios and incident response
Requirements
* 4+ years of experience in observability and monitoring engineering
* Strong expertise in Prometheus (PromQL, recording rules, alerting rules)
* Proficiency in Grafana (dashboard design, variables, annotations, alerting)
* Experience with OpenTelemetry for distributed tracing and metrics
* Knowledge of Kubernetes monitoring patterns and kube-state-metrics
* Understanding of SRE principles (SLIs, SLOs, error budgets)
* Experience with log aggregation and analysis (Loki, ELK, or similar)
* Familiarity with alerting best practices and on-call workflows
* Experience with GCP Cloud Monitoring and Cloud Trace integration
* Knowledge of AI/ML observability patterns (model latency, token usage, drift detection)
* Background in API gateway monitoring (Kong, Envoy, or similar)
* Experience with long-term Prometheus storage
* Familiarity with FinOps and cost observability dashboards