Role Description and Key Deliverables

We are seeking a passionate HPC engineer. The ideal candidate will have extensive hands-on experience making an impact with HPC technology, will deliver HPC services to a high standard, and will be able to relate to the scientific community and work closely with users to make the best use of research computing services.

The HPC landscape is continually evolving. You will need the skills to help build and operate industry-leading capabilities, including application build frameworks, containerised applications and cloud software-as-a-service. Automated deployment is a key feature, and you will need to be comfortable with DevOps processes and with delivering consistency through automation and infrastructure-as-code.

Key Responsibilities

- Design, implement, and maintain robust platform infrastructure using Infrastructure as Code (IaC) tools such as Terraform, ensuring secure and scalable environments in our private cloud ecosystem.
- Develop, deliver and operate research computing services and applications.
- Take a Site Reliability Engineering approach to HPC services, managing development, deployment, monitoring and incident response end-to-end.
- Solve complex technical problems, both with SCP services and with users' use of them.

Essential Knowledge, Skills, and Experience

- 10+ years of hands-on experience operating, crafting or engineering large-scale computing environments, such as HPC, HTC or BC
- Drive innovative computational solutions and exploit emerging technologies
- Experience administering large-scale cluster and server computing and related software (e.g. Slurm, LSF, Grid Engine)
- Hands-on experience working in a DevOps team and using agile methodologies
- Operating and consuming virtualised private cloud resources (e.g. OpenStack)
- Understanding of Linux system administration, the TCP/IP stack, and storage subsystems
- Experience implementing and administering large-scale parallel filesystems (e.g. Weka, GPFS, Lustre)
- Proven experience using configuration management tools (e.g. Ansible, Salt, Puppet) and technology frameworks in IT operations
- Experience developing and managing relationships with third-party suppliers
- Scripting and tool development for HPC and DevOps-style platform operations using Bash and Python

Desirable Skills and Knowledge

- Scientific degree, and/or experience in computationally intensive analysis of scientific data
- Previous experience in high-performance computing (HPC) environments, especially at large scales (>10,000 cores)
- Operation and configuration of public cloud computing infrastructure (e.g. AWS, Azure, GCP) is a plus
- Managing a virtualised private cloud environment (e.g. OpenStack) is a plus
- Container technology (e.g. LXD, Singularity, Docker, Kubernetes) is a plus
- Demonstrated development experience with a variety of programming languages, tools, and technologies (Java/C++, Python/Ruby/Perl, SQL, AWS) is a plus
- Experience with HashiCorp tools such as Terraform, Vault, Consul and Nomad is a plus
- Working experience with high-speed networks (e.g. InfiniBand)