Site reliability engineer

Murcia (30001)

Tinybird

€70,000 to €90,000 per year

Posted 18 hours ago

Description

Overview


At Tinybird, we help developers and data teams take flight by unlocking the power of real-time data to quickly build data pipelines and innovative data products. With Tinybird, you can ingest data from multiple sources at scale, query and shape it using SQL, and publish results as low-latency, high-concurrency APIs for applications. Developers can create fast APIs quickly, enabling innovation and efficiency.
What you will be doing

We are looking for someone to help us keep our software and infrastructure reliable and elastic as we scale. You will be part of the on-call team, which will help you understand not only our product but also the issues our clients face.
We run our stack on Linux. Technologies we use:
OpenResty: SSL termination and load balancing
Varnish: load balancing and caching
Redis: metadata store
Python: most of the backend, with some C++ for hot paths
ClickHouse: main data store
Zookeeper: replication coordination for ClickHouse
Grafana, Loki and Mimir: monitoring and alerting
Terraform: cloud provisioning (VMs, networks, Kubernetes clusters)
Ansible: deploys software and configuration
Kubernetes: base of infrastructure with autoscaling
We operate a large-scale distributed system focusing on efficiency, building a self-service platform that adapts to workload changes and autoscales itself
You’ll work with product and backend teams to design system architecture, optimize resource usage, and improve elasticity and autonomy
You’ll need to understand how ClickHouse works to extract the best performance
Some challenges and things we want to improve:
High availability and elasticity: the platform should scale automatically and efficiently without manual intervention, making capacity decisions transparently and safely
Observability: good understanding of storage, networking, and compute; monitoring of resources and service metrics
Disaster recovery: better tooling, incident discovery, and on-call experience
What you bring

Experience designing, building and running distributed cloud architectures and large-scale web applications
Programming skills and willingness to dive into our codebase, including ClickHouse source code; we work mainly with Python and C++
Accountable and enthusiastic about owning and managing the platform, proactive about fixing issues
Bias for action, iteration, and delivery; comfortable with quick reversals when needed
Systems thinking with attention to edge cases, failure modes, behaviors, and implementations
Comfortable collaborating asynchronously, while expecting direct daily communication with the team
Build software with empathy, making it intuitive and maintainable; document key insights and solutions so they are easy to understand
Experience with OpenResty, Varnish, Redis, Terraform or Ansible is helpful, but we expect you to recommend the right technology for each challenge
Experience with ClickHouse or rolling out databases at scale is a plus
Deep expertise in Kubernetes: designing and operating production-grade clusters, writing custom controllers, and tuning autoscaling (KEDA, Karpenter, etc.). Understand networking, storage, scheduling, and resource management, and reason about performance and failures at scale
Proficiency with AWS and GCP cloud providers
How We Work

We’re a fully remote company, committed to a remote-first culture.
We have offices in Madrid and New York City; you can visit as it suits you.
As we’re in the early stages, your contributions will have a significant impact on everything we do.
We believe in transparency, so you’ll always be in the loop about what’s happening.
Check out our blog or follow us on LinkedIn to learn more about what’s important to us.
