TikTok
Research Scientist Intern (TikTok-Privacy Innovation Lab-Multimodal Generative Model) - 2026 Start (PhD)
Internship, On-site
Location
San Jose, CA
Salary
$125k/yr
Experience
Not specified
Posted
2 months ago
Skills
diffusion models, DiT architecture, PyTorch, text-to-image models, multimodal modeling, research publications, open-source contributions, debugging skills
Job Description
Summary: TikTok is the leading destination for short-form mobile video, and it is seeking a Research Scientist Intern to join the Privacy Innovation Lab. The role involves developing next-generation generative foundation models for privacy-sensitive environments and collaborating with other teams to optimize model training and architecture.
Responsibilities:
- You will participate in the architecture design and deep optimization of next-generation text-to-image and text-to-video models, including but not limited to:
- Develop a deep understanding of DiT + Flow Matching / Rectified Flow–based generative models, and optimize them
- Lead or contribute to the design and implementation of: Diffusion Transformer (DiT / MM-DiT) architecture improvements; Unified text-to-image / text-to-video model designs; Latent space, tokenization, and conditioning mechanisms
- Perform joint algorithmic and system-level optimization, targeting: Training stability and convergence speed; Memory and compute efficiency; Generation quality and consistency
- Address challenges in long-sequence, high-resolution, and video generation, including: Efficient attention and temporal modeling strategies; Long-context and long-latent modeling
- Collaborate closely with systems and kernel engineers to map model designs to efficient implementations
- Reproduce, analyze, and advance state-of-the-art generative models (beyond simple replication)
Required Qualifications:
- Currently pursuing a PhD in computer science, computer engineering, or a related technical discipline
- Deep understanding of Diffusion / Flow Matching / Rectified Flow
- Strong familiarity with DiT / Transformer-based architectures in generative modeling
- Ability to debug the full pipeline from mathematical formulation → code → training → generated outputs
- Proficiency with PyTorch and hands-on experience training large-scale models
Preferred Qualifications:
- Practical experience with text-to-image or text-to-video models (non-toy systems)
- Familiarity with multimodal modeling (Text / Image / Video / Audio)
- Research publications or open-source contributions
Required Skills: Diffusion models, DiT architecture, PyTorch
Important Skills: Text-to-image models, Multimodal modeling
Nice-to-Have Skills: Research publications, Open-source contributions, Debugging skills
Internship Start Date: 2026
Benefits: Day one access to health insurance, Life insurance, Wellbeing benefits, 10 paid holidays per year, Paid sick time (56 hours if hired in first half of year, 40 if hired in second half of year), Housing allowance