About Us

Staq is a leading Banking-as-a-Service (BaaS) and embedded finance platform, transforming the way businesses integrate banking and financial services. At Staq, we empower our clients to innovate, expand, and streamline their financial services offerings using our cutting-edge platform. Our mission is to bridge the gap between traditional banking and the digital era by providing seamless, scalable, and secure financial solutions.
The Role

Our agents, recommendation systems, and automations are only as good as the data they consume. An agent giving financial advice needs rich, accurate, timely context about a user's accounts, transactions, spending patterns, and financial goals. A recommendation engine needs well-structured feature data. An automation trigger needs reliable signals.

Right now, that data plumbing doesn't have a dedicated owner. As we scale from one product to an SDK used by multiple banking applications, the data layer becomes a shared dependency that every AI feature builds on. This role owns the pipelines that feed the intelligence platform, the evaluation data that tells us whether our AI is working, and the infrastructure that lets us iterate on data quality without slowing down AI development.
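To make the "enriched context" idea concrete, here is a minimal Python sketch of the kind of aggregation this role would own; every type, field name, and category here is hypothetical, not Staq's actual schema:

```python
from dataclasses import dataclass, field
from collections import defaultdict

# Hypothetical record shapes -- illustrative only, not Staq's real data model.
@dataclass
class Transaction:
    account_id: str
    amount: float      # convention assumed here: positive = outflow
    category: str

@dataclass
class UserContext:
    user_id: str
    balances: dict[str, float]
    spending_by_category: dict[str, float] = field(default_factory=dict)

def build_context(user_id: str, balances: dict[str, float],
                  transactions: list[Transaction]) -> UserContext:
    """Aggregate raw transactions into the summary an agent would consume."""
    spend: dict[str, float] = defaultdict(float)
    for tx in transactions:
        if tx.amount > 0:  # count only outflows as spending
            spend[tx.category] += tx.amount
    return UserContext(user_id, balances, dict(spend))
```

The point of the sketch is the shape of the work: raw per-transaction records go in, and a compact, agent-ready summary of a user's financial situation comes out.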
Key Responsibilities

Context & Feature Pipelines for AI
- Build and maintain the data pipelines that transform raw financial data (Plaid transactions, bank accounts, credit data, subscription records) into the enriched context that agents consume at runtime
- Design the feature store or context layer that serves real-time and batch features to agents, recommendation engines, and automation triggers
- Ensure data freshness, quality, and consistency across all pipelines feeding the intelligence platform
- Build the context enrichment that makes the difference between a generic chatbot and a financial assistant that actually understands a user's financial situation

Evaluation & Observability Data
- Build the data infrastructure for AI evaluation: collecting agent decisions, recommendation results, automation outcomes, and user feedback into queryable, analyzable datasets
- Own the LLM observability data layer: structured collection of call latencies, token usage, cost per flow, error rates, and model performance metrics across all agent and automation flows
- Create dashboards and data products that let the AI team measure agent quality, recommendation relevance, automation success rates, and LLM operational health
- Support A/B testing and experiment-tracking data infrastructure so we can iterate on AI behavior with evidence, not intuition

SDK Data Contracts
- Design data contracts and schemas that serve both Zeen and future banking applications that plug into the intelligence platform SDK
- Own the ingestion layer for partner and third-party data sources; as the SDK expands to other banks, each will bring its own data formats and integration patterns
- Build the feedback loops that connect production outcomes back to agent and recommendation improvements

Data Quality & Operations
- Own data quality monitoring, validation, and alerting across all pipelines
- Build data lineage tracking so we can trace any agent decision back to the data that informed it
- Ensure PII handling in data pipelines aligns with platform policy; financial data requires careful treatment, and the AI layer has strict boundaries around what data reaches LLMs

Technical Environment
- Python for pipeline development; SQL for analytics and data modeling
- Financial data sources: Plaid, partner APIs, internal domain services (banking, credit, subscriptions, journal/ledger)
- OpenTelemetry traces and structured artifacts as data sources for AI evaluation
- Cloud-native infrastructure; containerized services
- Financial data with strict handling requirements
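As an illustration of the data quality work above (validation, monitoring, alerting), here is a minimal record-level contract check in Python; the required fields and rules are assumptions for the sketch, not the platform's real contract:

```python
from datetime import datetime, timezone

# Assumed contract for a raw transaction record -- illustrative only.
REQUIRED_FIELDS = ("id", "account_id", "amount", "currency", "posted_at")

def validate_record(rec: dict) -> list[str]:
    """Return a list of contract violations for one raw transaction record.

    An empty list means the record passes; in a real pipeline, non-empty
    results would feed monitoring counters and alerting thresholds.
    """
    errors: list[str] = []
    for f in REQUIRED_FIELDS:
        if rec.get(f) is None:
            errors.append(f"missing:{f}")
    currency = rec.get("currency")
    if isinstance(currency, str) and len(currency) != 3:
        errors.append("currency:not_iso4217")
    posted = rec.get("posted_at")
    if isinstance(posted, datetime) and posted > datetime.now(timezone.utc):
        errors.append("posted_at:in_future")  # basic freshness sanity check
    return errors
```

In production this kind of check would typically run inside a transformation framework rather than as a bare function, but the shape is the same: explicit rules, machine-readable violations, and a hook for alerting.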
What We Are Looking For

Must Have
- 3+ years building and operating production data pipelines
- Strong Python and SQL; experience with data transformation frameworks
- Experience designing schemas and data contracts for consumption by application services or ML/AI systems
- Understanding of data quality practices: validation, monitoring, alerting on pipeline failures
- Comfort working with sensitive financial data and understanding why data handling discipline matters

Strong Signals
- Experience building data infrastructure that feeds AI/ML systems (feature stores, context pipelines, evaluation datasets)
- Fintech or financial services background
- Familiarity with observability data (OpenTelemetry, structured logs) as a data source
- Experience building monitoring and analytics for LLM systems: latency tracking, cost attribution, and performance dashboards
- Experience with data lineage, audit trails, or data governance
- Exposure to real-time streaming alongside batch processing
- Experience designing data contracts for multi-tenant or multi-product platforms
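To ground the LLM observability signal above, here is a toy Python sketch of rolling per-call records up into per-flow latency, error-rate, and cost metrics; the record shape, flow names, and pricing are invented for illustration:

```python
from collections import defaultdict

# Hypothetical per-call records -- flow names and numbers are made up.
CALLS = [
    {"flow": "advice_agent", "latency_ms": 420, "tokens": 1800, "error": False},
    {"flow": "advice_agent", "latency_ms": 310, "tokens": 900,  "error": True},
    {"flow": "autopay",      "latency_ms": 95,  "tokens": 300,  "error": False},
]

def summarize(calls: list[dict], usd_per_1k_tokens: float = 0.002) -> dict:
    """Roll per-call records up into per-flow operational metrics."""
    by_flow: dict[str, list[dict]] = defaultdict(list)
    for c in calls:
        by_flow[c["flow"]].append(c)
    out = {}
    for flow, rows in by_flow.items():
        n = len(rows)
        out[flow] = {
            "calls": n,
            "avg_latency_ms": sum(r["latency_ms"] for r in rows) / n,
            "error_rate": sum(r["error"] for r in rows) / n,
            "cost_usd": sum(r["tokens"] for r in rows) / 1000 * usd_per_1k_tokens,
        }
    return out
```

In practice these records would come from OpenTelemetry traces rather than an in-memory list, and the rollups would land in dashboards, but this is the essential transformation.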