Description
Join our Team
About this opportunity
We are seeking a highly skilled and experienced Site Reliability Engineer to join our team. You will play a crucial role in designing, building, and maintaining the infrastructure that powers our products and services. This role is ideal for someone who is passionate about creating scalable, highly available systems and enjoys working in a collaborative, fast-paced environment.
Your responsibilities will include developing and implementing best practices in site reliability engineering to ensure high system availability, scalability, and performance.
What you will do
* Design, develop, and maintain our platform infrastructure with a focus on high availability, scalability, and reliability.
* Collaborate with cross-functional teams to understand product requirements and provide technical guidance on infrastructure design and implementation.
* Build and maintain automated deployment, monitoring, alerting, and incident response systems.
* Participate in incident management, investigating and resolving production issues to minimize impact and ensure stability.
* Perform capacity planning and optimization to meet performance and scalability goals.
* Conduct regular system and performance analysis, identify areas for improvement, and implement solutions to enhance efficiency and stability.
* Troubleshoot and resolve complex system issues, including performance bottlenecks and infrastructure failures.
* Implement and uphold security best practices, ensuring compliance with industry standards and regulations.
* Work with software engineers to define and implement DevOps practices, CI/CD pipelines, and infrastructure-as-code approaches.
* Participate in on-call rotations to support the production environment, responding to and resolving incidents promptly.
What you will bring
* 8+ years of relevant experience as a Platform Engineer or SRE, or in a similar role, managing large-scale cloud environments and operational processes.
* Strong understanding of system architecture and networking, especially designing fault-tolerant, scalable systems.
* Extensive experience with automated build systems, particularly Jenkins and Spinnaker for CI/CD pipelines; experience with GitOps tools like ArgoCD and Flux is highly valued.
* Proven expertise in Infrastructure as Code, especially Terraform; experience with Azure Resource Manager, Google Cloud Deployment Manager, or AWS CloudFormation is a plus.
* Experience with IT automation tools, especially Ansible; Puppet experience is a plus.
* Proficiency in Python and scripting (Bash, PowerShell) is mandatory; knowledge of Go or JavaScript is appreciated.
* Hands-on experience with Kubernetes distributions such as SUSE RKE2, Red Hat OpenShift, and VMware Tanzu, and with containerized solutions; Docker is mandatory, containerd is a plus.
* Knowledge of databases, preferably NewSQL; familiarity with NoSQL databases is an advantage.
* Deep understanding of Linux systems administration, troubleshooting, and performance tuning.
* Experience with monitoring and observability tools such as Prometheus, ELK, OpenTelemetry, and Grafana; Jaeger and Dynatrace are a plus.
* Excellent problem-solving, analytical, and troubleshooting skills.
* Strong communication and teamwork skills, with the ability to collaborate effectively across teams.
* Proficiency in English, both written and spoken.
Why join Ericsson?
At Ericsson, you'll have an outstanding opportunity to leverage your skills and imagination to push the boundaries of innovation. You'll work on building solutions to some of the world's toughest problems, surrounded by a diverse team of innovators committed to shaping the future. Join us and be part of crafting what comes next.