Job Reference
606_25_LS_LT_RE1
Position
Data Engineer for Language Technologies (RE1)
Closing Date
Saturday, 18 October, 2025
About BSC
The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful supercomputers in Europe, was a founding and hosting member of the former European HPC infrastructure PRACE (Partnership for Advanced Computing in Europe), and is now a hosting entity for EuroHPC JU, the Joint Undertaking that leads large-scale investments and HPC provision in Europe. The mission of BSC is to research, develop and manage information technologies in order to facilitate scientific progress. BSC combines HPC service provision and R&D in both computer and computational science (life, earth and engineering sciences) under one roof, and currently has over 1000 staff from 60 countries.
For this role, we are particularly interested in the strengths and lived experiences of women and underrepresented groups, which help us avoid perpetuating biases and oversights in science and IT research. In instances of equal merit, the incorporation of the underrepresented sex will be favoured.
We promote Equity, Diversity and Inclusion, fostering an environment where each and every one of us is appreciated for who we are, regardless of our differences.
If you feel that you do not meet all the requirements, we still encourage you to apply. We value diversity of experiences and skills, and you could bring unique perspectives to our team.
Context And Mission
The Language Technologies Laboratory at BSC has consolidated experience in several NLP areas, such as massive language model building, biomedical text mining, machine translation and unsupervised learning for under-resourced languages and domains. It has been entrusted by the Spanish and the Catalan governments with the mission to develop fundamental open-source resources and technologies for Spanish and Catalan. In connection with this, the LT Lab is currently in charge of two flagship projects at the national and regional level: the ALIA project, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, aimed at developing AI resources for Catalan, funded by the Catalan Digitalisation Department. In addition, the Lab participates in various EU-funded international projects.
The LT Lab is looking for candidates with a background in computational linguistics with experience in Language Technologies, specifically in Deep Learning and large language model building, and possibly in other areas of Natural Language and Speech Processing.
The successful candidate will work in a highly sophisticated HPC environment, have access to state-of-the-art systems and computational infrastructures, and establish collaborations with experts in different areas at the local and international levels.
The researcher will implement innovative techniques for language modeling and evaluation in the HPC environment.
This contract is funded by the project "Despliegue de la familia de Modelos ALIA en castellano y lenguas cooficiales" (external reference 2024EtL00019), promoted by the Secretaría de Estado de Digitalización e Inteligencia Artificial (SEDIA), with funds from the Ministerio para la Transformación Digital y de la Función Pública, financed by the European Union-NextGenerationEU.
Key Duties
* Work, in collaboration with the group members, on the design and development of the solutions needed to achieve the goals of the group's research projects.
* Interact with relevant stakeholders of the group's research projects to understand their problems and the available data to formulate valuable solutions.
* Ensure the long-term acquisition, management and accessibility of language data through the design and implementation of scalable storage solutions, structured data systems and processing tools.
* Collaborate with the members of the group in the generation and evaluation of language models using Deep Learning techniques (Transformers, Recurrent Neural Networks, and other neural network architectures).
Requirements
Education
* Degree in Applied Linguistics, Computer Science or related disciplines with a very strong linguistic background.
Essential Knowledge and Professional Experience
* Native speaker of Spanish.
* Good knowledge of Python.
* Good knowledge of Linux.
* Knowledge of Deep Learning.
* Experience in Machine Learning techniques applied to NLP.
* Experience/knowledge in corpus annotation and generation of linguistic resources.
* Understanding of data administration and management functions (