Overview
AI Research Engineer (Model Serving & Inference) at Tether.io
About Tether
Join Tether and Shape the Future of Digital Finance. At Tether, we’re pioneering a global financial revolution. Our solutions empower businesses—from exchanges and wallets to payment processors and ATMs—to seamlessly integrate reserve-backed tokens across blockchains. By harnessing blockchain technology, Tether enables you to store, send, and receive digital tokens instantly, securely, and globally, at a fraction of the cost. Transparency is the bedrock of everything we do, ensuring trust in every transaction.
Innovate with Tether
Tether Finance: Our product suite features the world’s most trusted stablecoin, USDT, relied upon by hundreds of millions worldwide, alongside pioneering digital asset tokenization services.
But that’s just the beginning:
Tether Power: Driving sustainable growth, our energy solutions optimize excess power for Bitcoin mining using eco-friendly practices in state-of-the-art, geo-diverse facilities.
Tether Data: Fueling breakthroughs in AI and peer-to-peer technology, we reduce infrastructure costs and enhance global communications with cutting-edge solutions like KEET, our flagship app that redefines secure and private data sharing.
Tether Education: Democratizing access to top-tier digital learning, we enable individuals to thrive in the digital and gig economies, driving global growth and opportunity.
Tether Evolution: At the intersection of technology and human potential, we push the boundaries of what is possible, crafting a future where innovation and human capabilities merge in powerful, unprecedented ways.
Why Join Us?
We are a global talent powerhouse, working remotely from around the world. If you’re passionate about making a mark in fintech, this is your opportunity to collaborate with bright minds, push boundaries, and set new standards. We’ve grown fast, stayed lean, and secured our place as a leader in the industry.
If you have excellent English communication skills and are ready to contribute to the most innovative platform on the planet, Tether is the place for you.
Are you ready to be part of the future?
About the job
As a member of our AI model team, you will drive innovation in model serving and inference architectures for advanced AI systems. Your work will focus on optimizing model deployment and inference strategies to deliver highly responsive, efficient, and scalable performance across real-world applications. You will work on a wide spectrum of systems, ranging from resource-efficient models designed for limited hardware environments to complex, multi-modal architectures that integrate data such as text, images, and audio.
We expect you to have deep expertise in designing and optimizing model serving pipelines and inference frameworks, as well as a strong background in advanced model architectures. You will adopt a hands-on, research-driven approach to develop, test, and implement novel serving strategies and inference algorithms. Your responsibilities include engineering robust inference pipelines, establishing comprehensive performance metrics, and identifying and resolving bottlenecks in production environments. The ultimate goal is to enable high-throughput, low-latency, memory-efficient, and scalable AI performance that delivers tangible value in dynamic, real-world scenarios.
Responsibilities
* Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency while optimizing memory usage. Ensure pipelines run efficiently across diverse environments, including resource-constrained devices and edge platforms.
* Establish clear performance targets such as reduced latency, faster token response times, and minimized memory footprint.
* Build, run, and monitor controlled inference tests in both simulated and live production environments. Track KPIs such as response latency, throughput, memory consumption, and error rates, with attention to metrics specific to resource-constrained devices. Document iterative results and compare outcomes against established benchmarks.
* Identify and prepare high-quality test datasets and simulation scenarios tailored to real-world deployment challenges, especially on low-resource devices. Set measurable criteria to ensure these datasets and scenarios effectively evaluate model performance, latency, and memory utilization under varied conditions.
* Analyze computational efficiency and diagnose bottlenecks in the serving pipeline by monitoring processing and memory metrics. Address issues such as suboptimal batch processing, network delays, and high memory usage to optimize the serving infrastructure for scalability and reliability on resource-constrained systems.
* Collaborate with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines designed for edge and on-device applications. Define clear success metrics such as improved real-world performance, low error rates, robust scalability, and optimal memory usage, with ongoing monitoring and refinements.
Requirements
* A degree in Computer Science or a related field, ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with conference publications).
* Proven experience in low-level kernel optimizations and inference optimization on mobile devices. Contributions should have led to measurable improvements in inference latency, throughput, and memory footprint for domain-specific applications on resource-constrained devices and edge platforms.
* A deep understanding of modern model serving architectures and inference optimization techniques, including state-of-the-art methods for low-latency, high-throughput performance and efficient memory management in diverse deployment scenarios.
* Strong expertise in writing CPU and GPU kernels for mobile devices (smartphones) and a deep understanding of model serving frameworks. Practical experience in developing and deploying end-to-end inference pipelines, optimizing models for efficient serving, and integrating these solutions on resource-constrained devices.
* Demonstrated ability to apply empirical research to overcome challenges in model serving, such as latency optimization, computational bottlenecks, and memory constraints. Proficiency in designing robust evaluation frameworks and iterating on optimization strategies to improve inference performance and system efficiency.
Seniority level: Not Applicable
Employment type: Full-time
Job function: Information Technology
Industry: Technology, Information and Internet
Madrid, Community of Madrid, Spain