Abaka AI
Data Operations Engineer
Entry Level, On-site, Full-time
Location
Mountain View, CA
Salary
$100k–$160k/yr
Experience
1–4 years
Posted
Today
Skills
SQL, Python, data operations, data engineering, dataset quality validation, multimodal datasets, data annotation, AI/ML workflows, Mandarin Chinese
Job Description
Summary: Abaka AI is on a mission to be the world’s most trusted data partner for AI companies, serving over 1,000 industry leaders across various AI domains. The Data Operations Engineer will own and operate the internal dataset library, ensuring fast and scalable access to data while maintaining dataset quality and organization across the company.
Responsibilities:
- Develop and maintain a comprehensive understanding of Abaka AI’s dataset library, including data structure, quality, and applicable use cases across modalities (text, image, video, audio, 3D)
- Serve as the internal point of contact for dataset-related inquiries, providing clear and timely responses to questions from engineering, product, and business teams
- Translate ambiguous or high-level requests into concrete dataset solutions, identifying appropriate data sources or gaps
- Inspect and validate datasets for quality, completeness, and consistency using SQL, Python, or other tools as needed
- Coordinate with global data teams, including teams in China, to resolve data issues, clarify requirements, and ensure timely delivery without unnecessary escalation
- Maintain and improve internal documentation, organization, and accessibility of datasets
- Identify inefficiencies in current workflows and propose improvements to systems, tooling, and processes that support dataset management and usage
- Support cross-functional initiatives by providing dataset insights, technical context, and operational guidance
Required Qualifications:
- Bachelor's degree in Computer Science, Data Engineering, or a related field, or equivalent practical experience
- 1–4 years of experience in data operations, data engineering, or a related role involving direct interaction with datasets
- Professional proficiency in Mandarin Chinese and English is required, as this role involves frequent collaboration with China-based vendors and external partners
- Strong problem-solving skills and ability to operate effectively in ambiguous, fast-paced environments
- Proficiency in SQL and/or Python for data inspection, validation, and basic analysis
- Experience working with real-world datasets, including handling data quality issues, inconsistencies, and edge cases
- Strong communication skills, with the ability to work across technical and non-technical teams
- High level of ownership and accountability, with the ability to manage multiple requests and priorities simultaneously
Preferred Qualifications:
- Experience with multimodal datasets (text, image, video, audio, or 3D)
- Familiarity with data annotation, labeling workflows, or dataset preparation for machine learning
- Experience working with international teams, particularly in cross-border environments
- Exposure to AI/ML workflows, including training, fine-tuning, or evaluation datasets
Required Skills: SQL, Python, Data operations, Data engineering, Dataset quality validation, Multimodal datasets, Data annotation, AI/ML workflows, Mandarin Chinese
Benefits: Equity and a comprehensive benefits package (health, dental, vision, PTO, flexible work schedule)