National Laboratory of the Rockies
Graduate Summer Intern – I/O optimization for data-intensive workflows on HPC
Internship, On-site
Location
Golden, CO
Salary
$44k–$71k/yr
Experience
No experience required
Posted
2 weeks ago
Skills
Python, Julia, Unix/Linux systems, shell scripting, Conda, Jupyter, PyTorch, TensorFlow, AI/ML frameworks, large data set processing and analysis, HPC profiling tools, Darshan, OSU MPI Test Suite, high-performance computing, distributed computing, workflow development, job scheduler integration, data management of large datasets, data movement, data sharing with remote users and systems, ability to quickly learn new programming languages and frameworks, adaptability to changing research demands
Job Description
Summary: The National Laboratory of the Rockies (NLR) is a leading institution for energy systems research and development located in Golden, Colorado. NLR is seeking a Graduate Summer Intern to work on I/O optimization for data-intensive workflows on HPC systems, assessing and improving the performance of I/O operations within scientific computing workflows.
Responsibilities:
- Familiarize yourself with the NLR flagship datasets (e.g., WTK, NSRDB, Sup3rCC) and the scientific workflows that use them
- Understand and leverage software frameworks to analyze the storage structure in NLR HPC systems: parallel file systems, Lustre, Progressive File Layouts, HDF5 chunking, etc.
- Collaborate with NLR researchers to use profiling tools (e.g., Darshan) to analyze the I/O footprint of scientific workflows: random and contiguous reads/writes, latency and throughput, metadata operations
- Design and create representative I/O benchmarks
- Process and visualize the profiling results to suggest and implement I/O improvements
- Author, present, and assist in the preparation of technical papers, reports, and conference proceedings on topics related to HPC storage and data movement
Required Qualifications:
- Minimum of a 3.0 cumulative grade point average
- Undergraduate: Must be enrolled as a full-time student in a bachelor's degree program from an accredited institution
- Post Undergraduate: Earned a bachelor's degree within the past 12 months. Eligible for an internship period of up to one year
- Graduate: Must be enrolled as a full-time student in a master's degree program from an accredited institution
- Post Graduate: Earned a master's degree within the past 12 months. Eligible for an internship period of up to one year
- Graduate + PhD: Completed a master's degree and enrolled as a PhD student at an accredited institution
- Applicants are responsible for uploading official or unofficial school transcripts as part of the application process
- If selected for the position, a letter of recommendation will be required as part of the hiring process
- Must meet educational requirements prior to employment start date
Preferred Qualifications:
- Demonstrated programming experience in multiple languages such as Python and/or Julia on Unix/Linux systems
- Demonstrated experience in Shell scripting and Conda/Jupyter use
- 2+ years of experience using PyTorch, TensorFlow, or a related AI/ML framework
- Demonstrated experience working with large data sets (e.g., CFD outputs, NSRDB, WTK, etc.) and interfaces for science applications, including data processing and analysis tools
- Familiarity with HPC profiling and benchmarking tools, such as Darshan and OSU MPI Test Suite
- Experience with high-performance computing, distributed computing, and related technologies
- Ability to quickly learn new programming languages and frameworks and adapt to changing research demands in a fast-paced scientific environment
- Demonstrated experience developing workflows, including job scheduler integration, data management of large datasets, data movement, and data availability/sharing with remote users and systems
Required Skills: Python, Julia, Unix/Linux systems, Shell scripting, Conda, Jupyter, PyTorch, TensorFlow, AI/ML frameworks, Large data set processing and analysis, HPC profiling tools, Darshan, OSU MPI Test Suite, High-performance computing, Distributed computing, Workflow development, Job scheduler integration, Data management of large datasets, Data movement, Data sharing with remote users and systems, Ability to quickly learn new programming languages and frameworks, Adaptability to changing research demands
Benefits: Medical, dental, and vision insurance, 403(b) Employee Savings Plan with employer match, Sick leave (where required by law)