Job Description

Summary:
Clipto.AI builds on-device multimodal intelligence that turns personal media into searchable knowledge systems. The AI Engineer will design and deploy high-performance speech systems, optimize models for mobile deployment, and develop architectures for real-time interaction between audio, text, and visual inputs.

Responsibilities:
- Design and deploy high-performance ASR and Speaker Diarization pipelines specifically for on-device environments
- Apply advanced techniques such as quantization, pruning, and knowledge distillation to foundational models for mobile/embedded deployment
- Develop architectures that enable real-time, low-latency interaction between audio, text, and visual inputs

Required Qualifications:
- Bachelor's degree or higher in CS, EE, or a related field with a focus on signal processing or machine learning
- Deep experience with Whisper, RNN-T, or CTC-based architectures
- High proficiency in Python and C++
- Comfort with ambiguity and the ability to thrive in a fast-paced 'build-test-learn' cycle

Required Skills:
Automatic Speech Recognition (ASR), Speaker Diarization, Model Compression, Quantization, Pruning, Knowledge Distillation, Multimodal Fusion, Whisper, RNN-T, CTC-based Architectures, Python, C++

Benefits:
- Full Support: Competitive compensation with H-1B and Green Card sponsorship available

AI Engineer – On-Device Speech & Multimodality
