Principal Vector Data Engineer – Johnson & Johnson Innovative Medicine
Primary location: Barcelona, Spain. Secondary location: Madrid, Spain. Hybrid role.
Position Summary
The Vector Data Engineer sits at the intersection of machine learning, signal processing, and neuroscience, focusing on neurodegenerative and neuropsychiatric disorders. The successful candidate will develop foundation models for longitudinal data integration and disease progression prediction, helping uncover biomarkers for conditions such as Alzheimer’s disease, Parkinson’s disease, and major depressive disorder.
Key Responsibilities
* Design and implement vector embedding models to unify diverse biomedical data modalities.
* Handle modalities such as EEG, PPG, accelerometry, biosensor signals, speech and physiological recordings, and medical imaging (MRI, PET).
* Develop quality control protocols for mobile-captured images and transform pixel‐based images into vector representations.
* Adapt and extend custom embedding methods from academic research to build scalable foundation models.
* Collaborate with cross‑functional teams to integrate multimodal data for machine learning and clinical insight generation.
* Contribute to the identification of digital biomarkers and predictive patterns in neurological and psychiatric conditions.
Qualifications
* MS/PhD in Computer Science, Electrical Engineering, Biomedical Sciences, or a related field.
* Minimum 3 years of experience in multimodal data modeling, machine learning, or biomedical signal processing.
* Familiarity with large pre‑trained multimodal models (e.g., CLIP, MedCLIP, Flamingo) and domain‑specific transformer models (e.g., BioBERT for biomedical text, TimeSformer for video).
* Strong proficiency in Python and experience with PyTorch, TensorFlow, Hugging Face Transformers, and scikit‑learn.
* Experience with Librosa, SpeechBrain, or similar libraries for acoustic and temporal feature extraction.
* Proficiency in NiBabel for MRI and PET imaging, and familiarity with FSL, FreeSurfer, and Nilearn for neuroimaging analysis.
* Experience with CLIP‑like architectures and contrastive/self‑supervised learning for multimodal integration.
* Understanding of clinical trial data and longitudinal monitoring frameworks, and experience with large‑scale dataset curation and embedding evaluation.
Strategic Impact
* Transform digital health and clinical multimodal data assets into interoperable, vectorized embeddings for AI applications.
* Enable semantic queries and reasoning over governed datasets.
* Build a scalable vector database infrastructure that complies with governance and lineage standards.
* Accelerate insight generation across discovery, translational, and clinical domains.
Seniority level: Not applicable. Employment type: Full‑time. Job function: Information Technology. Industry: Pharmaceutical Manufacturing.