Site Reliability Engineer, Technical Referent

Madrid

Permanent contract

dLocal

Posted on October 30

Description

Join dLocal, a global payments platform operating in 40 emerging markets, to help our merchants expand worldwide. We're building mission‐critical observability for customers like Netflix, Amazon, Nike, and Facebook, and we need an experienced Site Reliability Engineer to drive the design, implementation, and maintenance of our centralized observability platform based on OpenTelemetry.
What will you do?
Own OpenTelemetry pipelines: design, implement, and maintain scalable ingestion of logs, metrics, and traces while optimizing cost and performance.
Enable engineering teams with self‐service observability tooling and best‐practice adoption.
Support incident management by creating playbooks, checklists, and automations.
Collaborate across business units to translate monitoring, alerting, and SLO/SLA requirements into resilient systems.
Automate observability infrastructure using IaC, provisioning monitoring tools and alerting rules.
Define baseline observability standards for all services and devices.
Own technical and security health, ensuring availability and security KPIs are met.
Continuously refine alerting signals to reduce noise and improve incident response.
Which skills do you need?
4+ years as an SRE or in a similar observability‐focused role.
Deep experience with Kubernetes and its monitoring practices.
Hands‐on OpenTelemetry: collectors, instrumentation, pipeline optimization.
Proficiency with Grafana, Prometheus, Loki, New Relic, or Datadog.
IaC tools (Terraform) and GitOps CI/CD (ArgoCD, GitHub Actions) experience.
Incident management platform integration (PagerDuty, Jira).
Scripting in Python, Go, or equivalent for observability automation.
Strong problem‐solving and cross‐functional collaboration.
You will stand out if you have:
Cloud experience, especially AWS and ECS‐based workloads.
Observability pipeline experience at scale in high‐throughput environments.
Configuration‐as‐Code tools (Ansible, Chef, SaltStack) for legacy instances.
Database performance monitoring in large‐scale distributed environments.
What do we offer?
Remote‐first work that can be done from anywhere or from one of our global offices.
Flexible schedules driven by performance.
Fintech industry: dynamic, ever‐evolving environment with growth opportunities.
Referral bonus program for internal talent recommendation.
Learning & development: Premium Coursera subscription.
Language classes: free English, Spanish, or Portuguese.
Social budget for team bonding and downtime.
Housing stipend for coworking with the team in any city.
Location
Greater Madrid Metropolitan Area – remote‐first opportunity.
Application process
Submit your CV and we will review your qualifications. Our Talent Acquisition team will notify you by email at each step of the process.
