Senior Site Reliability Engineer
We are looking for a Senior Site Reliability Engineer to join our Infrastructure Team in the Platform Domain. Our mission is to keep the Factorial application running 24/7, ensure its performance, scalability, and security, and research improvements that enable Product Engineers to build new use cases that add value for our customers.
Factorial's wider Engineering team consists of 200+ Product Engineers. We look for talented people who are curious, proactive, and have great communication skills.
Responsibilities
* Monitor and troubleshoot large-scale application deployments on the cloud.
* Use Terraform or Ansible for Infrastructure as Code.
* Manage monitoring and observability with Datadog, Grafana, or OpenTelemetry.
* Operate in public cloud services such as AWS or Azure.
* Debug issues in a Kubernetes cluster.
* Manage a large MySQL cluster with HA configuration.
Note: This is not a backend or frontend Software Engineering role.
Qualifications
* Solid experience monitoring and troubleshooting large-scale application deployments on the cloud.
* Experience with Terraform or Ansible for IaC.
* Familiarity with Datadog, Grafana, or OpenTelemetry.
* Knowledge of public cloud services like AWS or Azure.
* Proficiency in debugging Kubernetes cluster issues.
* Experience managing a MySQL cluster with HA configuration.
How We Work
We work on-site several days a week in Barcelona or Madrid, while also supporting remote work when it makes sense. This balance helps us stay agile, creative, and closely connected as a team.