NVIDIA
US, CA, Santa Clara

High-Performance LLM Training Engineer - New College Grad 2026

$124,000 – $195,500/yr · Posted 2 weeks ago


About this role

NVIDIA is seeking engineers specializing in performance analysis and optimization to improve the efficiency of LLM training workloads, which are shaping the world's most advanced computing systems. This position focuses on optimizing NVIDIA's LLM software stack in frameworks such as PyTorch and JAX for high-performance training on thousands of GPUs, while also helping shape hardware roadmaps for the next generation of GPUs powering the AI revolution.

What you will be doing:

  • Understand, analyze, profile, and optimize AI training workloads on innovative hardware and software platforms.
  • Understand the big picture of training performance on GPUs, prioritizing and then solving problems across all state-of-the-art neural networks.
  • Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
  • Build and support NVIDIA submissions to the MLPerf Training benchmark suite.
  • Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
  • Build tools to automate workload analysis, workload optimization, and other critical workflows.

What we want to see:

  • MS in Computer Science, Electrical Engineering, or Computer Engineering (or equivalent experience).
  • Strong background in deep learning and neural networks, particularly in training.
  • A deep background in computer architecture and familiarity with the fundamentals of GPU architecture.
  • Proven experience analyzing and tuning application performance, as well as processor- and system-level performance modeling.
  • Programming skills in C++, Python, and CUDA.

GPU computing is the most productive and pervasive platform for deep learning and AI. It begins with the most advanced GPUs and the systems and software we build on top of them. We integrate and optimize every deep learning framework. We work with the major systems companies and every major cloud service provider to make GPUs available in data centers and in the cloud. We craft computers and software to bring AI to edge devices, such as self-driving cars and autonomous robots. AI has the potential to spur a wave of social progress unmatched since the industrial revolution.