Research Scientist Graduate (Multimodal Large Language Model) - 2026 Start (PHD)

Onsite | $136,800 – $259,200/yr | Posted Oct 31, 2025

About this role

The PICO-MR team is dedicated to developing the core environment perception algorithms within the PICO system. Our research and development directions cover three major areas: visual localization, scene understanding, and 3D reconstruction. Within MR scenarios, our work spans multimodal scene understanding (MLLMs), AIGC-based scene generation, depth estimation (Mono/Stereo/MVS), 3D environment perception, large-scale 3D scene reconstruction (3DGS, NeRF, etc.), visual localization, and lighting estimation, and it ranges from fundamental research to real-world deployment.

We are looking for talented individuals to join our team in 2026. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at ByteDance.

Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume.

Responsibilities

Conduct R&D on multimodal large models for MR scenarios, built on modalities such as vision, point clouds, and text, including model optimization, data construction, evaluation improvement, and training/inference acceleration.
Track and explore cutting-edge technologies in the industry, and drive the application and deployment of multimodal large models in MR scenarios.

Qualifications

Minimum Qualifications

Final-year Ph.D. candidates or recent Ph.D. graduates in Computer Science, Engineering, or a quantitative field.
Strong expertise in multimodal large model pre-training and fine-tuning techniques, with hands-on experience in model optimization and related workflows.
Excellent problem-solving and learning ability, able to quickly grasp cutting-edge technologies and apply them to practical challenges.
Proficient in 2D/3D computer vision tasks, including detection, segmentation, depth estimation, and image matching.
Skilled in Python and C++, with solid programming and engineering implementation capabilities.

Preferred Qualifications

Publications in top-tier conferences in computer vision, robotics, or graphics, or notable achievements in competitions, are a plus.
Award winners of ACM-ICPC, NOI/IOI, TopCoder, or similar competitions are highly preferred.