Category |
Tasks for Contractor |
HPC Administrator Tasks |
- Maintain an HPC cluster (hardware, image management, local networking, scheduler, backups).
- Troubleshoot the environment when an incident occurs to ensure a quick return to normal operations.
|
HPC Analyst Tasks |
- Meet with scientists and evaluate their requirements for HPC support.
- Develop a task plan to meet scientists' needs and consult the technical authority for approval.
- Build and install applications and troubleshoot runtime issues (GNU, Intel, Fortran, Nvidia).
- Support for open-source and commercial off-the-shelf (COTS) software, including:
  - Python and Anaconda installs.
  - Bash scripts, build/make tools, EasyBuild, and Spack.
  - MPI implementations (MPICH, OpenMPI, IntelMPI, HPMPI).
- Assist with in-house developed applications (compilation and runtime).
|
Other General Tasks |
Management of:
- Operating system (patching schedule, reliability for Linux distributions).
- Accounts (creation, deletion).
- Configuration via Git, MS DevOps, and Ansible playbooks.
- RPM/DEB Packages.
- Environment modules.
- ThinLinc troubleshooting.
|
Troubleshooting & Hardware |
- Troubleshooting jobs on schedulers (PBS Pro/Torque, SLURM, SGE).
- Ensure reliable CUDA installs, and troubleshoot GPU failures and other CUDA software/driver issues.
- Hardware support (memory upgrades, storage arrays, power and network cabling, iLO).
|
Documentation |
- Document the process for every task to ensure continuity of enterprise knowledge.
|