Intern, Deep Learning for Image and Video Processing
Job Summary
The InterDigital AI Lab is seeking motivated Ph.D. and Master's students for internships in Deep Learning for Image and Video Processing.
The Lab's research focuses on novel applications of AI and machine learning to video and wireless systems. Rather than tinkering with obscure parts of ads or social networks, you will develop proof-of-concepts, prototypes, and research ideas that integrate machine learning and AI with a wide range of video and wireless technologies, with the opportunity to publish your work at leading conferences.
Our recent work focuses on distributed, real-time computer vision for edge-cloud systems. We reduce bandwidth usage by compressing intermediate feature maps using a practical mix of learned methods and standard video codecs (e.g., H.264). We mitigate network latency with Dedelayed, a delay-aware split inference technique recently submitted to ICLR. We also created the leading open-source library for deep learning-based image and video compression (github.com/InterDigitalInc/CompressAI), and an open-source platform used by MPEG to benchmark video coding for distributed vision models (github.com/InterDigitalInc/CompressAI-Vision).
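To make the split-inference idea above concrete, here is a minimal sketch of the edge-cloud pattern: an edge device runs the first layers of a model, compresses the intermediate feature map before transmission, and a cloud server decompresses it to finish inference. The function names and the simple quantize-and-deflate codec are illustrative assumptions for this sketch; in practice a learned codec or a standard video codec such as H.264 would take the place of the compression stage, as described in the work above.

```python
# Illustrative sketch of feature-map compression for split inference.
# Names and the quantize+deflate codec are assumptions for this example,
# not InterDigital APIs; a learned or standard codec would replace them.
import zlib
import numpy as np

def edge_encode(features: np.ndarray) -> bytes:
    """Edge side: quantize float32 features to uint8 and entropy-code them."""
    lo, hi = float(features.min()), float(features.max())
    q = np.round((features - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)
    header = np.array([lo, hi], dtype=np.float32).tobytes()
    return header + zlib.compress(q.tobytes())

def cloud_decode(blob: bytes, shape) -> np.ndarray:
    """Cloud side: recover an approximate feature map and resume inference."""
    lo, hi = np.frombuffer(blob[:8], dtype=np.float32)
    q = np.frombuffer(zlib.decompress(blob[8:]), dtype=np.uint8)
    return ((q.astype(np.float32) / 255.0) * (hi - lo) + lo).reshape(shape)

# Mock intermediate activations from the edge half of a split network.
rng = np.random.default_rng(0)
features = rng.standard_normal((64, 32, 32)).astype(np.float32)

blob = edge_encode(features)              # bytes sent over the wireless link
restored = cloud_decode(blob, features.shape)

ratio = features.nbytes / len(blob)
err = float(np.abs(features - restored).max())
print(f"compression ratio ~{ratio:.1f}x, max abs error {err:.3f}")
```

Even this naive 8-bit codec cuts the transmitted payload by roughly 4x with small reconstruction error; the research directions listed below ask how much further learned, task-aware compression can go.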
Come intern with us if you are interested in one of the following research areas:
Data compression:
- Deep learning-based image and video compression.
- Deep learning-based image and video restoration.
- Compression for multiple computer vision tasks.
- Compression of intermediate feature maps for split inference.
- Compression of tokens and embeddings in ViTs and VLMs.
- Compression enhanced by language embeddings as side information.
- Differentiable proxy/surrogate of video codecs; variable-rate control; error resilience.
Real-time distributed inference:
- Distributed real-time video inference resilient to wireless latency, jitter, and bandwidth limits.
- Vision tasks: semantic segmentation, depth estimation, object detection, future-frame prediction, super-resolution.
- Motion modelling and prediction for latency mitigation in dynamic scenes.
- Inference that adapts to fluctuating network, compute, and power constraints.
- Low-power on-device models assisted by powerful cloud models in real time.
- Applications: autonomous driving, delivery drones, robotics.
- Efficient model inference.
And:
- Generative Adversarial Networks (GANs) for visually realistic synthesis.
- Your own unique or novel ideas.