AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT THE ROLE
We are looking for a Middle SRE Operations Engineer to maintain reliability across a cloud-based SaaS platform. You’ll handle live incidents, improve observability, and reduce toil through automation using Kubernetes, Terraform, Grafana, and AWS. The role is hands-on and execution-focused, with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.
WHAT YOU WILL DO
* Monitor and support production and staging environments to ensure availability, performance, and stability.
* Respond to incidents, perform triage and root cause analysis, and contribute to remediation efforts.
* Participate in on-call rotations with defined SLAs.
* Handle operational requests from internal teams.
* Maintain and improve monitoring, alerting, dashboards, logs, and metrics.
* Support CI/CD pipelines, production releases, and GitOps workflows.
* Contribute to automation initiatives to reduce operational overhead.
* Maintain and improve Kubernetes‑based infrastructure and containerized workloads.
* Support Infrastructure as Code practices and environment improvements.
MUST HAVES
* 2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations.
* Experience with AWS supporting production environments.
* Experience supporting production SaaS applications.
* Strong understanding of CI/CD systems (GitHub Actions, Jenkins, CircleCI).
* Experience with GitOps and Git fundamentals.
* Experience using GitHub, Jira, and Confluence.
* Experience with Kubernetes (EKS, kOps, or similar).
* Experience with Docker and containerization.
* Experience with observability tools (Grafana, Prometheus, Loki, PagerDuty).
* Proficiency in scripting (Bash, Python, or Go).
* Experience with Infrastructure as Code (Terraform, Helm).
* Ability to work within structured operational processes and SLAs.
* Strong written and verbal English communication skills.
* Self‑driven with a growth mindset.
NICE TO HAVES
* AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator.
* Experience with multi‑tenant SaaS environments.
* Experience working in globally distributed teams.
* Familiarity with ChatOps practices.
* Experience improving monitoring quality and reducing alert fatigue.
PERKS AND BENEFITS
* Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
* Competitive compensation: USD‑based pay with education, fitness, and team activity budgets.
* Exciting projects: Modern solutions with Fortune 500 and top product companies.
* Flextime: Flexible schedule with remote and office options.