Etched
San Jose, CA

Architecture Intern - Inference

Onsite, posted Dec 8, 2025

Job details

Location
San Jose, CA
Work type
Onsite
Posted
Dec 8, 2025
Apply on
jobs.ashbyhq.com

About this role

The role

We are seeking a talented Architecture Intern to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. Over the course of your internship, you will work on cutting-edge architectural problems and performance modeling.

Key responsibilities

  • Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
  • Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
  • Contribute to optimizing routing and communication layers using Sohu’s collectives.
  • Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.
  • Develop and leverage a deep understanding of Sohu to co-design both hardware instructions and model architecture operations to maximize model performance.
  • Implement high-performance software components for the Model Toolkit.

You may be a good fit if you have

  • Progress towards a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, or a related field.
  • Proficiency in C++ or Rust.
  • Understanding of performance-sensitive or complex distributed software systems, such as Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
  • Familiarity with PyTorch or JAX.
  • Experience porting applications to non-standard accelerator hardware or hardware platforms.
  • Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.).

Strong candidates may have some experience with:

  • Low-latency, high-performance applications using both kernel-level and user-space networking stacks.
  • Distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
  • Transformer architectures, particularly Mixture-of-Experts (MoE).
  • Applications with extensive SIMD (Single Instruction, Multiple Data) optimizations on performance-critical paths.

Program details:

  • 12-week paid internship (June - August 2026)
  • Generous housing support for those relocating
  • Daily lunch and dinner in our office
  • Based at our office in San Jose, CA
  • Direct mentorship from industry leaders and world-class engineers
  • Opportunity to work on one of the most important problems of our time
