At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Apple is a place where extraordinary people come together to do their life's best work. Together, we build technologies and experiences people once couldn't have imagined, and now can't imagine living without.
The AI/ML Software Engineering team in Madrid, Spain, is seeking an experienced developer to work on the evaluation of next-generation features for Apple Intelligence. This role is at the intersection of software engineering, machine learning, and human-centered design. You will be responsible for developing the processes, tools, and infrastructure that allow Apple to assess and optimize the intelligence, responsiveness, and quality of Siri and other Apple Intelligence features before they reach millions of users around the world.
Description
In this role, you will work with highly skilled engineers building scalable systems and frameworks for end-to-end (E2E) evaluation of Apple Intelligence products such as Siri. Your work will be critical in validating the performance and reliability of unreleased software and models, ensuring that Siri and Apple Intelligence continue to set the standard for intelligent voice assistants. The position requires a motivated, technically qualified Senior Engineer with the ability to coordinate work between multiple teams across different time zones and locations. Outstanding project management skills are required to drive technical tasks and report results clearly.
Minimum Qualifications
Proven experience in managing large projects effectively, with a focus on the evaluation of large-scale, AI-based solutions.
Experience crafting evaluation datasets and strategies for ML-based products.
Strong programming background in Python, Swift, or similar languages, with a focus on Machine Learning, test automation, or data tooling.
B.S. in Computer Science or a related field required.
Preferred Qualifications
Proven ability to analyze and synthesize large-scale evaluation data and metrics, identify failure patterns, and provide actionable feedback for model and product improvement.
Experience in large language model usage and benchmarking.
Strong organizational and problem-solving skills, demonstrated on a large, cross-functional team.
M.S. or Ph.D. preferred.