The Red Hat OpenShift Dedicated Site Reliability Engineering (SRE) team is looking for a Senior Software Engineer to join our global team. In this role, you will work on Red Hat OpenShift, Red Hat's enterprise Kubernetes platform, as part of a team that develops and operates Red Hat OpenShift Dedicated, a public cloud service based on Red Hat OpenShift for large enterprise customers. You’ll play a key role in building solutions that make Red Hat OpenShift Dedicated scalable, feature-rich, resilient, and secure while balancing development and operations work.
You’ll contribute to the design and development of automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds. You'll participate in a global on-call rotation and help lead incident management, root cause analysis, and continuous improvement activities, managing engineering efforts against a service-level agreement (SLA) and error budget.
OpenShift SRE is a sophisticated, global, fast-paced team inside the world's open source leader, with constant opportunities to learn new skills and develop new solutions that meet our customers' demands. As a Senior Software Engineer on this team, you will directly contribute to Red Hat's success in the rapidly growing Kubernetes as a Service (KaaS) market.
What You Will Do
* Design and write automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds.
* Identify single points of failure and other high-risk architecture issues; propose and implement more resilient designs.
* Participate in the release cycles of our offerings, deploying code to integration, staging, and production environments, integrating with CI/CD tooling, monitoring, and change management.
* Perform software updates, peer code reviews, testing, and CVE analysis.
* Respond to security threats.
* Interact with automated monitoring and healing infrastructure to ensure healthy environments.
* Provide engineering support to Red Hat's global technical support team to resolve customer issues.
* Create and maintain SOPs for maintenance tasks, configuration changes, and problem remediation.
* Participate in a global on-call rotation, including periodic weekend and holiday duties.
What You Will Bring
* 3+ years of software engineering experience using object-oriented languages; Golang and Python preferred.
* Experience managing Linux-based systems in public clouds like AWS, GCP, or Azure.
* Commercial experience with enterprise system monitoring; Prometheus knowledge is a plus.
* Experience with container technology, Kubernetes, OpenShift, and configuration management tools (Ansible, Puppet, Chef) is a big plus.
* Strong troubleshooting skills and solid communication skills in English.
About Red Hat
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote. We promote an open and inclusive environment where everyone’s ideas are valued.
Inclusion at Red Hat
Our culture is built on transparency, collaboration, and inclusion, empowering people from diverse backgrounds to share ideas and drive innovation. We are committed to providing equal opportunity and access to all applicants.
Employment Details
* Position: Senior Site Reliability Engineer
* Employment type: Full-time
* Industry: Software Development and IT Services
* Location: Spain (remote options available)