ByteDance
San Jose, CA, USA

Research Scientist Graduate (Applied Machine Learning - ML System) - 2026 Start (PhD)

$136,800 – $259,200/yr
Posted Oct 31, 2025

About this role

Responsibilities

AML-MLsys combines systems engineering and the art of machine learning to develop and maintain massively distributed ML training and inference systems and services around the world, providing high-performance, highly reliable, and scalable systems for LLM/AIGC/AGI.

On our team, you'll have the opportunity to build large-scale heterogeneous systems integrating GPU/NPU/RDMA/storage and keep them running stably and reliably, deepen your expertise in coding, performance analysis, and distributed systems, and be involved in the decision-making process. You'll also be part of a global team, with members in the United States, China, and Singapore working collaboratively toward a unified project direction.

We are looking for talented individuals to join our team in 2026. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career at ByteDance, where inspiration is infinite.

Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume.

  • Responsible for developing and optimizing LLM training, inference, and RL frameworks.
  • Work closely with model researchers to scale LLM training and RL to the next level.
  • Responsible for GPU and CUDA performance optimization to build an industry-leading, high-performance LLM training, inference, and RL engine.

Qualifications

Minimum Qualifications:

  • Bachelor's degree or above in computer science, electronics, automation, software engineering, or a related field.
  • Proficient in algorithms and data structures; familiar with Python.
  • Understanding of the basic principles of deep learning algorithms, familiarity with common neural network architectures, and working knowledge of deep learning training frameworks such as PyTorch.

Preferred Qualifications:

  • Proficient in high-performance GPU computing and CUDA optimization, with an in-depth understanding of computer architecture; familiar with parallel computing optimization, memory access optimization, low-bit computation, etc.
  • Familiar with FSDP, DeepSpeed, JAX SPMD, Megatron-LM, verl, TensorRT-LLM, Orca, vLLM, SGLang, etc.
  • Knowledge of LLMs; experience with LLM acceleration and optimization is preferred.