Technical Program Manager III, AI/ML Data Analysis, Search

Job details

Location: Mountain View, California
Work type: Onsite
Compensation: $163,000 - $237,000/yr
Posted: 3 days ago
Apply on: careers.google.com

About this role

Minimum qualifications:

Bachelor's degree in a technical field, or equivalent practical experience.
5 years of experience in program management.
Experience evaluating Large Language Models (LLMs), working on Natural Language Processing (NLP) data pipelines, or designing data for Reinforcement Learning from Human Feedback (RLHF).
Experience with construct operationalization, design, or behavioral coding manual development.

Preferred qualifications:

PhD degree in Quantitative Psychology, Psychometrics, Educational Measurement, Behavioral Data Science, or a related discipline.
5 years of experience managing cross-functional or cross-team projects.
Experience calculating and interpreting Inter-Rater Reliability (IRR) metrics (e.g., Cohen’s/Fleiss’ Kappa, ICC) and conducting statistical variance analysis on human-generated data.
Experience in Item Response Theory (IRT), Bayesian modeling, or hierarchical/multilevel modeling, specifically applied to rater behavior or differential item functioning.
Experience in computational linguistics, semantics, or designing evaluations for highly nuanced language tasks.
Experience using statistical programming languages (R or Python) to analyze large, complex datasets.

About the job

A problem isn’t truly solved until it’s solved for all. That’s why Googlers build products that help create opportunities for everyone, whether down the street or across the globe. As a Technical Program Manager at Google, you’ll use your technical expertise to lead complex, multi-disciplinary projects from start to finish. You’ll work with stakeholders to plan requirements, identify risks, manage project schedules, and communicate clearly with cross-functional partners across the company. You're as comfortable explaining your team's analyses and recommendations to executives as you are discussing the technical tradeoffs in product development with engineers.

As the TPM of Human Measurement and Validation, you will be the chief architect of the human evaluation systems that power our Reinforcement Learning (RL) models in Search. Approaching AI evaluation as a massive-scale psychometric challenge, you will translate complex, latent constructs, such as model helpfulness, safety, and reasoning, into highly reliable, standardized behavioral assessments. Bridging the gap between ML research and the science of human measurement, you will design coding manuals, scale automated quality assurance pipelines, and lead statistical calibration. Your work will ensure our global RLHF data maximizes construct validity, minimizes measurement error, and delivers the clinical-grade accuracy required to train safe, capable AI.

In Google Search, we're reimagining what it means to search for information – any way and anywhere. To do that, we need to solve complex engineering challenges and expand our infrastructure, while maintaining a universally accessible and useful experience that people around the world rely on. In joining the Search team, you'll have an opportunity to make an impact on billions of people globally.

The US base salary range for this full-time position is $163,000-$237,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

Deconstruct complex semantic and behavioral model outputs into observable, quantifiable rating criteria, functioning much like diagnostic behavioral anchors.
Establish baseline Inter-Rater Reliability (IRR) metrics (e.g., ICC, Cohen’s/Fleiss’ Kappa) and architect programmatic pipelines to monitor the longitudinal psychometric health of data collections.
Partner with ML Engineering to integrate human assessment workflows directly into the model development lifecycle and deploy automated evaluation tooling.
Design automated data quality checks to detect careless responding, systematic rater bias, straight-lining, and other threats to data integrity at scale.
Utilize advanced statistical frameworks (drawing on Classical Test Theory or Item Response Theory) to detect rater drift, identify differential item functioning, and implement systemic interventions.

Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.

About Google

Google

Mountain View, California