Etched
Inference Intern - Spring 2027
Internship, On-site, Full-time
Location
San Jose
Salary
Not listed
Experience
No experience required
Posted
Today
Skills
Python, C++, Linux internals, accelerator architectures, GPUs, TPUs, compilers, high-speed interconnects, NVLink, InfiniBand, transformer model architectures, inference serving stacks, Rust, kernel-level networking stacks, user-space networking stacks, distributed systems concepts, consensus protocols, consistency models, communication patterns, SIMD optimizations, PyTorch, JAX
Job Description
Summary: Etched is building the world’s first AI inference system purpose-built for transformers, delivering significantly higher performance and lower costs. They are seeking talented interns to contribute to the design of next-generation AI accelerators, focusing on developing and optimizing compute architectures for transformer workloads.
Responsibilities:
- Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting
- Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling
- Contribute to optimizing routing and communication layers using Sohu’s collectives
- Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues
- Develop and leverage a deep understanding of Sohu to co-design both HW instructions and model architecture operations to maximize model performance
- Implement high-performance software components for the Model Toolkit
Required Qualifications:
- Progress towards a Bachelor's, Master's, or PhD degree in computer science, computer engineering, applied mathematics, or a related field
- Proficiency in Python, C++
- Understanding of performance-sensitive or complex distributed software systems, e.g. Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand)
- Experience porting applications to non-standard accelerator hardware or hardware platforms
- Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.)
Preferred Qualifications:
- Proficiency in Rust
- Experience building low-latency, high-performance applications using both kernel-level and user-space networking stacks
- Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns
- Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE)
- Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths
- Familiarity with PyTorch or JAX
- Participation in math competitions (AIME, AMC, etc.)
Internship Start Date: Spring 2027
Benefits: Generous housing support for those relocating, Daily lunch and dinner in our office, Direct mentorship from industry leaders and world-class engineers