Etched
San Jose, CA
Architecture Intern - Inference
Job details
- Location: San Jose, CA
- Work type: Onsite
- Posted: Dec 8, 2025
- Apply on: jobs.ashbyhq.com
About this role
The role
We are seeking a talented Architecture intern to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. You will work on cutting-edge architectural problems and performance modeling over the course of your internship.
Key responsibilities
- Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
- Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
- Contribute to optimizing routing and communication layers using Sohu’s collectives.
- Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.
- Develop and leverage a deep understanding of Sohu to co-design both hardware instructions and model architecture operations to maximize model performance.
- Implement high-performance software components for the Model Toolkit.
You may be a good fit if you have
- Progress towards a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, or a related field.
- Proficiency in C++ or Rust.
- Understanding of performance-sensitive or complex distributed software systems, such as Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
- Familiarity with PyTorch or JAX.
- Experience porting applications to non-standard accelerator hardware or hardware platforms.
- Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.).
Strong candidates may have some experience with:
- Low-latency, high-performance applications using both kernel-level and user-space networking stacks.
- Distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
- Transformer architectures, particularly Mixture-of-Experts (MoE).
- Building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.
Program details:
- 12-week paid internship (June - August 2026)
- Generous housing support for those relocating
- Daily lunch and dinner in our office
- Based at our office in San Jose, CA
- Direct mentorship from industry leaders and world-class engineers
- Opportunity to work on one of the most important problems of our time