Site reliability engineer

Madrid

Tinybird

Posted 3 hours ago

Description

Overview

At Tinybird, we help developers and data teams take flight by unlocking the power of real-time data to quickly build data pipelines and innovative data products. With Tinybird, you can ingest multiple data sources at scale, query and shape them using SQL, and publish the results as low-latency, high-concurrency APIs for applications. Developers can create fast APIs quickly, enabling innovation and efficiency.

What you will be doing

We are looking for someone to help us scale and to keep our software and infrastructure reliable and elastic as we grow. You will be part of the on-call team, so that you understand not only our product but also the issues our clients face.

We run our stack on Linux. Technologies we use:

- OpenResty: SSL termination and load balancing
- Varnish: load balancing and caching
- Redis: metadata store
- Python: most of the backend is written in Python, with some C++ for hot paths
- ClickHouse: main data store
- ZooKeeper: replication coordination for ClickHouse
- Grafana, Loki and Mimir: monitoring and alerting
- Terraform: cloud provisioning (VMs, networks, Kubernetes clusters)
- Ansible: deploys software and configuration
- Kubernetes: base of the infrastructure, with autoscaling

How you will contribute:

- We operate a large-scale distributed system with a focus on efficiency, building a self-service platform that adapts to workload changes and autoscales itself
- You'll work with the product and backend teams to design system architecture, optimize resource usage, and improve elasticity and autonomy
- You'll need to understand how ClickHouse works in order to extract the best performance from it

Some challenges and things we want to improve:

- High availability and elasticity: the platform should scale automatically and efficiently without manual intervention, making capacity decisions transparently and safely
- Observability: good understanding of storage, networking, and compute; monitoring of resources and service metrics
- Disaster recovery: better tooling, incident discovery, and on-call experience

What you bring

- Experience designing, building and running distributed cloud architectures and large-scale web applications
- Programming skills and willingness to dive into our codebase, including the ClickHouse source code; we work mainly with Python and C++
- Accountable and enthusiastic about owning and managing the platform, and proactive about fixing issues
- Bias for action, iteration, and delivery; comfortable with quick reversals when needed
- Systems thinking with attention to edge cases, failure modes, behaviors, and implementations
- Comfortable collaborating asynchronously, with direct daily team communication expected
- You build software with empathy, making it intuitive and maintainable, and you document key insights and solutions so they are easy to understand
- Experience with OpenResty, Varnish, Redis, Terraform or Ansible is helpful, but we expect you to recommend the right technology for each challenge
- Experience with ClickHouse or rolling out databases at scale is a plus
- Deep expertise in Kubernetes: designing and operating production-grade clusters, writing custom controllers, and tuning autoscaling (KEDA, Karpenter, etc.). You understand networking, storage, scheduling, and resource management, and can reason about performance and failures at scale
- Proficiency with the AWS and GCP cloud providers

How We Work

We're a fully remote company, committed to a remote-first culture.

We have offices in Madrid and New York City; you can visit as it suits you.

As we're in the early stages, your contributions will have a significant impact on everything we do.

We believe in transparency, so you'll always be in the loop about what's happening.

Check out our blog or follow us on LinkedIn to learn more about what's important to us.
