Clipto.AI
AI Engineer – On-Device Speech & Multimodality
Internship, Hybrid
Location
Palo Alto, CA
Salary
Not listed
Experience
Not specified
Posted
Today
Skills
automatic speech recognition (ASR), speaker diarization, model compression, quantization, pruning, knowledge distillation, multimodal fusion, Whisper, RNN-T, CTC-based architectures, Python, C++
Job Description
Summary: Clipto.AI is a company focused on building on-device multimodal intelligence to transform personal media into searchable knowledge systems. The AI Engineer will design and deploy high-performance speech systems, optimize models for mobile deployment, and develop architectures for real-time interaction between audio, text, and visual inputs.
Responsibilities:
- Design and deploy high-performance ASR and Speaker Diarization pipelines specifically for on-device environments
- Apply advanced techniques such as quantization, pruning, and knowledge distillation to foundation models for mobile/embedded deployment
- Develop architectures that enable real-time, low-latency interaction between audio, text, and visual inputs
Required Qualifications:
- Minimum Bachelor's degree in CS, EE, or a related field with a focus on signal processing or machine learning
- Deep experience with Whisper, RNN-T, or CTC-based architectures
- Highly proficient in Python and C++
- Comfortable with ambiguity and able to thrive in a fast-paced 'build-test-learn' cycle
Required Skills: Automatic Speech Recognition (ASR), Speaker Diarization, Model Compression, Quantization, Pruning, Knowledge Distillation, Multimodal Fusion, Whisper, RNN-T, CTC-based Architectures, Python, C++
Benefits: Full Support: Competitive compensation with H-1B and Green Card sponsorship available.