Devops / platform engineer

Tarragona

INSUS - AI Solutions for Sustainable Transformation

Publicada el Publicado hace 5 hr horas

Descripción

PDevOps / Platform Engineer /ppPlatform Engineer Level II — SWENG /ppEurope · Full-Time · GCP Primary / AWS Secondary /ppThe Mission /ppWe are not looking for someone to “run scripts.” We are looking for a Platform Architect who understands that in our environment, a single configuration change propagates across 98 services and 55 products. Every decision has blast radius. Every action must be preceded by assessment. /ppYou will join a team of three Platform Engineers responsible for a massive, shared global estate across GCP and AWS. This role is about building the “paved road” for our software engineers — designing scalable, secure, and automated environments where safety is built-in, not bolted on. /ppThe Operating Reality /ppSWENG operates a shared platform delivered by 5 engineers. Every engineer operates with full autonomy and full accountability from day one. There is no onboarding ramp that absorbs mistakes at this scale. /ppScale /pp98 services · 22 environments · 55 products · 80+ edge locations across GCP and AWS Team /pp5 engineers total (3 Platform Engineers). No supervisory capacity. No error correction buffer. Autonomy /ppYou assess context, analyze failure modes, and communicate structured decisions before touching the keyboard. Philosophy /ppProcess discipline is what allows us to move fast. You are a process-oriented engineer who treats infrastructure as a product. /ppCore Responsibilities /ppArchitectural Ownership /ppDesign and implement highly available, secure infrastructure on GCP (primary) and AWS. You are not just building it — you are ensuring it is cost-effective, scalable, and relevant to 55 products simultaneously. /ppInfrastructure as Code (IaC) /ppTreat the entire estate as software using Terraform. Manage complex state files and ensure modularity across all 22 environments. Every infrastructure change is code-reviewed, not clicked. /ppGuardrail Engineering /ppBuild and maintain CI/CD pipelines (GitHub Actions / Jenkins) that do not just deploy code — they enforce security and governance automatically. The pipeline is the last line of defence before 98 services are affected. /ppSystems Thinking Advisory /ppAct as a consultant to the Software Engineering team. Challenge decisions that are not scalable. Communicate tradeoffs using a structured Impact → Options → Recommendation framework. A well-reasoned advisory is as valuable as the implementation. /ppObservability /ppBuild the Prometheus / Grafana / Stackdriver telemetry that predicts outages — not just reacts to them. Instrument proactively; alert meaningfully. /ppMLOps Scaling /ppSupport the scaling of machine learning products (Kubeflow Pipelines) to meet global demand across all environments. /ppWho You Are — Requirements /ppExperience /pp 5+ years in DevOps / SRE with a proven track record in Platform Engineering — managing shared infrastructure for multiple teams simultaneously. /ppGCP Mastery /pp Deep, production-level experience with Google Cloud Platform and Kubernetes (GKE). /pp You have operated GCP at scale — not just provisioned resources. /ppThe “Architect” Mindset /ppThis is the most critical requirement. You must demonstrate: /pp Structured communication: Problem → Impact → Options → Recommendation, without supervision. /pp Blast radius awareness: You do not say “it might break.” You explain how it breaks, what is affected, and what the recovery path is. /pp Context-first approach: Before any action, you assess what exists, what is affected, who needs to know, and what the downstream consequences are across the shared estate. /pp Failure mode thinking: You anticipate failure scenarios and design for graceful degradation, not just happy-path operation. /ppGovernance-First /pp You understand that in a global environment with 55 products, following procedural processes is not overhead — it is a survival requirement. /pp You operate within change management frameworks and onboard others into them effectively. /pp You distinguish between urgency and risk — a CVSS 9.8 vulnerability requires contextual assessment (exposure, exploitability, blast radius), not a reflexive “drop everything.” /ppAutomation Obsessed /pp Expert-level Python scripting and a delete-manual-tasks mentality. /pp You automate detection, not just remediation. Manual checking is a process gap, not a strategy. /ppAccountability Orientation /pp You take ownership of outcomes, not just task completion. /pp You surface risks proactively to your team lead, with structured status: what is done, what the risk is, what you need. /pp You do not patch silently. You communicate clearly before, during, and after changes that affect shared infrastructure. /ppLocation /pp Based in Europe for time zone alignment with the team. /ppNice to Have /pp Hands-on experience with MLOps and Data Science tooling (Kubeflow, Vertex AI). /pp Deep knowledge of AWS (EC2, S3, RDS, Lambda) to manage our secondary environment. /pp Advanced Log Management (ELK / Splunk). /ppbr/ppCompensation : 60.000 - 70.000 EUR (B2B Contract) /ppLanguages : Fluent English /ppbr/ppPLEASE DON'T APPLY IF YOU ARE USING AI DURING JOB INREVIEW OR YOU ARE NOT A REAL PERSON. /p

Enviar

Crear una alerta

Guardar