Google
Technical Program Manager III, ML Infrastructure Resource Management, Google Cloud
Entry Level
On-site
Location
New York City, NY
Salary
Not listed
Experience
Not specified
Posted
Today
Job Description
Act as a trusted advisor to Product Area partners, understanding their TPU/GPU requirements and delivering a guided, seamless resource management experience. Collaborate closely with Software Engineering (SWE) and Site Reliability Engineering (SRE) teams to uncover, analyze, and execute on efficiency opportunities across our managed resource footprints. Own the operational execution of capacity allocations and allied workflows using core Google tooling; a technical or engineering background is critical to successfully navigating this significant operational component. Partner cross-functionally to drive tool and process optimizations. Leverage strong data analysis skills to convert fleet metrics into actionable business value and automated scalability. Utilize an understanding of ML fundamentals to inform resourcing decisions; practical experience deploying large-scale ML models is preferred.