NIO
San Jose, US
LLM Algorithmic Optimization Engineer - Intern
About this role
Roles and Responsibilities:
- Conduct research and apply cutting-edge techniques to optimize Large Language Models (LLMs) and multimodal models; explore and implement core algorithmic optimizations on heterogeneous architectures for highly efficient LLM inference and deployment across distributed and heterogeneous hardware environments.
- Focus on model optimization from a systems perspective, ensuring efficient deployment in the vehicle’s digital cockpit and advanced driving (AD) domains.
- Collaborate with cross-functional teams to ensure the integration of optimized models into real-world automotive applications.
- Contribute to the entire pipeline, from research, development, and testing through to deployment on hardware, including GPUs and other distributed systems.
Qualifications:
- Currently pursuing or having completed a PhD or Master’s degree in Computer Science, Computer Engineering, Applied Mathematics, Communications, Electronics, or a related field, with relevant research projects and publications.
- Strong understanding of GPU/NPU architecture and optimization techniques to identify and address bottlenecks.
- Proficient in LLM and VLM architectures and algorithms; familiar with transformer-based NLP, audio, and CV algorithms and technologies.
- Proficiency in Python and experience with AI-related training and inference tools such as PyTorch.
- Proficiency in C/C++ programming; familiarity with at least one commonly used LLM inference engine.
- Hands-on experience with model-serving frameworks such as Open Neural Network Exchange (ONNX).
- Familiarity with debugging code in distributed computing environments.
- Experience in LLM inference optimization on resource-constrained edge devices is a plus.
Preferred Qualification:
- Ph.D. in Computer Science, Artificial Intelligence, or a related field; or a Master’s degree plus 3 years of relevant industry experience
- Experience with inference optimization techniques for deep learning models or libraries on specific hardware architectures
- Familiarity with microkernel architectures, the Linux kernel, hypervisors, middleware, and application frameworks
- A strong publication record, with high-impact, innovative papers, is preferred