Prolaio
Chicago, IL
Data Science Intern
About this role
The Overview
As the Data Science Intern, you will develop, operationalize, and validate Large Language Model (LLM) pipelines capable of extracting high-priority clinical endpoints from longitudinal Electronic Health Record (EHR) data. This role is critical for scaling the EHR study data by automating the extraction of complex clinical phenotypes and validating them against manual clinical review to ensure high-quality data for clinical analysis.
The Specifics
- Endpoint Extraction Pipeline Development: Develop Python/LLM workflows, including workflows built on purpose-built clinical extraction tools, to ingest unstructured data (clinical notes, discharge summaries) and extract key study endpoints, specifically Clinical Events or “Unified Problem Lists.”
- Validation Framework Execution: Design and conduct a human review validation study comparing LLM-generated abstractions against a “gold standard” dataset derived from manual chart review.
- Codebase Delivery: Build and maintain a documented code repository that takes raw EHR data as input and outputs structured clinical datasets for study use.
- Performance Analysis & Reporting: Analyze pipeline performance to establish concordance, sensitivity, and specificity metrics, delivering a final validation report with performance metrics for multiple approaches.
- Cross-Functional Collaboration: Collaborate with clinical and technical mentors to translate clinical requirements into technical solutions.