● Design, deploy, and maintain Kubeflow (or equivalent) for pipeline orchestration, model training, evaluation, and serving on large image datasets; ensure reliability, security, and cost efficiency.
● Manage and tune Kubernetes clusters (EKS/GKE/AKS), set up namespaces, RBAC, autoscaling, network policies, and service meshes where appropriate; keep upgrades and operations predictable.
● Define infrastructure-as-code with Terraform; implement repeatable environment provisioning, configuration management, and golden paths for teams.
● Establish CI/CD workflows (GitHub Actions/Jenkins/GitLab CI), build/test standards, and progressive delivery patterns that keep releases fast and low-risk.
● Implement logging, metrics, and tracing (e.g., Prometheus, Grafana, CloudWatch, Splunk/New Relic) with actionable SLOs, alerts, and runbooks; embed security and compliance by design.
● Collaborate closely with product and science teams to remove bottlenecks, eliminate manual steps, and evolve service and data interfaces that make operating image pipelines simple and reliable.
● Contribute to future-state architectures that improve scalability, resiliency, and operational efficiency; lead targeted refactors and platform improvements.
● Manage core automation and tooling, and educate teams on platform capabilities, CI/CD, configuration management, and infrastructure automation best practices.
Required (Must-have):
● M.Sc. in Computer Science/Engineering (or equivalent) or comparable industry experience.
● Practical, production experience operating Kubeflow Pipelines for reproducible ML workflows at scale.
● Proven experience deploying and operating workloads on Kubernetes (EKS/GKE/AKS), including upgrades, autoscaling, RBAC, networking, and reliability; strong Unix/Linux fundamentals.
● Hands-on experience with AWS services (EKS, EC2, S3, IAM, CloudWatch; RDS a plus) and the ability to design secure, cost-aware architectures.
● Strong Terraform skills and Git-based workflows for repeatable infrastructure provisioning and configuration management.
● Practical experience with CI/CD platforms (GitHub Actions/Jenkins/GitLab CI), including artifact management, environment promotion, and progressive delivery.
● Solid Python and/or shell scripting for platform automation and toil reduction.
● Experience implementing logging, metrics, and tracing with SLOs, alerts, and runbooks (e.g., Prometheus, Grafana, CloudWatch, Splunk/New Relic) and a security-first mindset.
● Ability to lead technical initiatives, communicate trade-offs clearly, and collaborate effectively with engineering and science teams.
Desirable (Nice to have):
● Experience with MLflow, Feast, Argo, Airflow, Ray, and model versioning/monitoring.
● Familiarity with S3/object storage, artifact registries, and handling large image datasets; basic SQL/NoSQL exposure.
● Experience with digital pathology or large-scale image processing (e.g., whole-slide images) and tools like OpenSlide, scikit-image, or OpenCV.
● Experience tuning high-throughput pipelines, concurrency, memory usage, and integrating GPUs/accelerators.
● Experience with VPC design, ingress/egress, service meshes, secrets management, IAM, and policy as code.
● Experience in regulated environments (e.g., GxP), including data governance, privacy, and building software under regulated processes.
● Experience with Jira/Zendesk and with JavaScript/TypeScript for internal tools or dashboards.