Summary: Etched is building the world’s first AI inference system purpose-built for transformers, delivering significantly higher performance and lower costs. They are seeking talented interns to contribute to the design of next-generation AI accelerators, focusing on developing and optimizing compute architectures for transformer workloads.

Responsibilities:
- Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting
- Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling
- Contribute to optimizing routing and communication layers using Sohu’s collectives
- Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues
- Develop and leverage a deep understanding of Sohu to co-design both HW instructions and model architecture operations to maximize model performance
- Implement high-performance software components for the Model Toolkit

Required Qualifications:
- Progress towards a Bachelor's, Master's, or PhD degree in computer science, computer engineering, applied mathematics, or a related field
- Proficiency in Python and C++
- Understanding of performance-sensitive or complex distributed software systems, e.g. Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand)
- Experience porting applications to non-standard accelerator hardware or hardware platforms
- Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.)

Preferred Qualifications:
- Proficiency in Rust
- Experience building low-latency, high-performance applications using both kernel-level and user-space networking stacks
- Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns
- Solid grasp of transformer architectures, particularly Mixture-of-Experts (MoE)
- Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths
- Familiarity with PyTorch or JAX
- Math competitions (AIME, AMC, etc.)

Required Skills: Python, C++, Linux internals, Accelerator architectures, GPUs, TPUs, Compilers, High-speed interconnects, NVLink, InfiniBand, Transformer model architectures, Inference serving stacks, Rust, Kernel-level networking stacks, User-space networking stacks, Distributed systems concepts, Consensus protocols, Consistency models, Communication patterns, SIMD optimizations, PyTorch, JAX

Internship Start Date: Spring 2027

Benefits:
- Generous housing support for those relocating
- Daily lunch and dinner in our office
- Direct mentorship from industry leaders and world-class engineers

Inference Intern - Spring 2027
