**Job Function**:
Data Analytics & Computational Sciences
**Job Sub Function**:
Data Science
**Job Category**:
Scientific/Technology
**All Job Posting Locations**:
Cornellà de Llobregat, Barcelona, Spain; Madrid, Spain
Johnson & Johnson Innovative Medicine (J&J IM), a pharmaceutical company of Johnson & Johnson, is recruiting for a Vector Data Engineer. The primary location for this position is Barcelona, Spain; the secondary location is Madrid, Spain. This is a hybrid role.
Our expertise in Innovative Medicine is informed and inspired by patients, whose insights fuel our science-based advancements. Visionaries like you work in teams that save lives by developing the medicines of tomorrow.
**Position Summary**:
The Vector Data Engineer designs and implements the embedding and semantic-search infrastructure that connects discovery, translational, and clinical data into AI-ready knowledge representations.
This role bridges multi-omics data engineering and machine-learning infrastructure, enabling scientists and agentic tools to discover biological insights through vector-based search and reasoning.
**Key Responsibilities**:
- Develop scalable pipelines that convert multi-omics and clinical data (e.g., proteomics, transcriptomics, spatial omics, biomarkers) into vectorized embeddings for AI and semantic retrieval.
- Build and maintain vector databases and hybrid data stores using technologies such as TileDB, Weaviate, or Snowflake Cortex.
- Collaborate with the Data Transformation Engineers to design standardized data formats suitable for embedding generation and cross-modality mapping.
- Integrate metadata, ontology terms, and provenance into vector representations to ensure traceability and governance compliance.
- Partner with the AI/ML team to deploy embeddings that support agentic reasoning, semantic similarity, and cross-dataset querying.
- Optimize indexing, retrieval, and inference performance across large-scale multi-omics data collections.
- Evaluate and incorporate emerging representation-learning and knowledge-graph techniques to improve data discoverability and model interoperability.
**Qualifications**:
- MS/PhD in Computer Science, Computational Biology, Data Science, or a related field.
- 3+ years of experience building or maintaining vector or semantic-retrieval infrastructure.
- Hands-on experience with multi-omics or biomedical data integration (e.g., RNA-seq, proteomics, clinical endpoints).
- Proficiency in Python and in frameworks and tools such as LangChain, Transformers, or sentence-embedding models.
- Familiarity with TileDB, Snowflake, Weaviate, FAISS, or other vector/array database systems.
- Understanding of metadata modeling, ontologies (e.g., OBO, UMLS), and FAIR data practices.
- Strong ability to collaborate across solution architecture, data science, and AI/ML teams.
**Strategic Impact**:
- AI can perform semantic queries and reasoning over governed datasets.
- Vector database infrastructure scales efficiently and complies with governance and lineage standards.
- Insight generation accelerates across discovery, translational, and clinical domains.