
Service reliability lead

Puerto Rico
SPD Technology
Published on 11 April
Description

At SPD Technology, we bring together a team of like-minded people who are driven by the desire to bring value through their work, united in their commitment to high performance and delivering custom, cutting-edge tech solutions that drive clients' growth. We empower our people with a culture of excellence and enable them with the opportunity to uphold their accountability to contribute on each level. We value humanity and collaboration, encourage professional and personal growth, and foster a supportive and flexible work environment where everyone's contribution is welcomed.

We are looking for a Service Reliability Lead to join us as part of our team.

About the role

The Service Reliability Lead is the single technical owner accountable for the operational health, SLA compliance, and continuous improvement of the Utlx payment orchestration platform.

This role combines Site Reliability Engineering discipline with managed-service governance. The lead owns incident response end-to-end (from P1 triage to RCA delivery), builds and operates the monitoring and alerting stack, manages SLA measurement and penalty mechanics, and serves as the senior technical escalation point for the client.

The Service Reliability Lead reports to the Project Manager, who manages the support team as a whole (including engineers). The PM owns scheduling, administrative client communications, resource coordination, and project governance. The Service Reliability Lead operates as the technical authority within the team: a player-coach who directs engineering work, makes real-time incident decisions, sets technical standards, and participates in hands-on work alongside the engineers. Both the lead and the engineers report to the PM; the lead's authority over the team is technical, not managerial.

About the project

You will work on a Payment Orchestration Platform, a greenfield project designed to optimize transaction processing, enhance operational efficiency, and deliver a seamless user experience. As part of this project, you will have the opportunity to influence its architecture and technical decisions.

The support team consists of the following roles: Project Manager, Service Reliability Lead, Support / DevOps Engineers (3-4).

Work within the EU time zone (UTC+1/UTC+2), which is 2 hours behind Ukraine.

Incident Management and On-Call

  • Own the L2/L3/L4 escalation path: serve as the senior technical point of contact for all incidents, and coordinate with third-party vendors (AWS, payment gateways, infrastructure providers) when an external root cause is identified
  • Ensure incident acknowledgement and resolution in line with SLA targets across all priority levels
  • Make real-time decisions on hotfixes, rollbacks, and configuration changes under pressure
  • Build and maintain the on-call rotation; ensure zero coverage gaps
  • Manage workarounds through to permanent resolution and maintain the escalation matrix for the client

Observability and Monitoring

  • Deliver an operational monitoring dashboard (CloudWatch / Grafana)
  • Configure PagerDuty for automated alerting and on-call escalation aligned to SLA targets
  • Maintain instrumentation across availability, latency, and error rate metrics per service tier

SLA and Penalty Governance

  • Instrument and validate SLA clocks across response, workaround, and resolution targets
  • Prepare monthly service credit calculations and service performance reports
  • Provide metrics evidence during any client dispute review
  • Deliver monthly reports covering incident volumes, SLA performance, RCA status, and risk log

RCA and Service Improvement

  • Author Root Cause Analysis documents within 5 days of incident resolution
  • Identify recurring patterns and monitor for Service Improvement Plan triggers
  • Design and implement SIPs with corrective actions, owners, and delivery timelines
  • Proactively reduce incident frequency and improve mean time to resolution

Infrastructure and AWS Operations

  • Operate in line with the AWS Shared Responsibility Model
  • Distinguish SPD-caused from third-party failures; maintain evidence for availability exclusion claims
  • Coordinate planned and urgent maintenance windows with the client

We're looking for you if you have

  • 5-8 years in production operations / SRE
  • Hands-on incident command experience
  • Monitoring stack: Grafana, CloudWatch, PagerDuty
  • RCA authorship and structured problem-solving
  • SLA management and service credit mechanics
  • Experience with hyper-care / go-live stabilisation periods
  • Experience in fintech or payment systems

What's in it for You

Reveal great tech solutions
Join the team of experts who create custom, cutting-edge tech solutions for world-renowned businesses, fueling client growth. Unleash your potential, tackle new challenges, and be part of a team that values your skills and contributions. Focus on long-term impact and building tailored, long-lasting partnerships with our clients.

Experience an agile and flexible working environment
Enjoy the freedom of fully remote work with a flexible working schedule. Empower yourself with a stable workload and a stable income, supported by provided laptops and licensed software. We focus on lasting cooperation and unite result-oriented individuals who take a high-performance approach to work.

Embrace the opportunity for personal and professional growth
Benefit from performance and merit reviews, elevate your skills with personal development plans, and keep learning through the corporate library, public speaking support, and more.

Be among like-minded people
Work with a like-minded team that cares about what they do and how they do it. Collaborate with top-notch experts who are always ready to help and support you through any challenges. Join company-wide tech and cultural events, and contribute to meaningful CSR initiatives that resonate with your values. Feel supported by your HR, and take advantage of our referral bonus program.

Interview steps

  • Pre-Screening with the recruiter (30 min)
