High-Performance Computing
Hopper Cluster
HOPPER is the ORC's new primary compute cluster, with 11,840 cores, providing support for scheduled workloads via Slurm. The cluster has two high-speed networking fabrics: a redundant Ethernet network comprising 100 Gbps spine switches and 25 Gbps leaf switches, and an HDR InfiniBand network providing 100 Gbps to each node. Users access HOPPER via ssh connections to the login nodes or via the new ORC Open OnDemand server, which provides web-based access to interactive applications such as MATLAB and JupyterLab. A flash-based VAST storage cluster provides fast scratch storage on HOPPER.
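For example, a typical first connection from the command line looks like the following sketch; the hostname shown is illustrative, so use the login address provided with your ORC account:

    # Connect to a HOPPER login node over ssh (hostname and username are
    # illustrative; use the address given in your ORC account information).
    ssh NetID@hopper.orc.gmu.edu

    # Once logged in, show a summary of the available Slurm partitions.
    sinfo -s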
HOPPER hardware:
- 2x login nodes, each with 48 Intel Xeon cores and 384 GB RAM.
- 2x login nodes, each with 64 AMD Milan cores, 256 GB RAM, and 1x NVIDIA T4 GPU card.
- 74x compute nodes, each with 48 Intel Xeon cores and 192 GB RAM.
- 48x compute nodes, each with 64 AMD Milan cores and 256 GB RAM.
- 20x compute nodes, each with 64 AMD Milan cores and 512 GB RAM.
- 12x compute nodes, each with 64 AMD Milan cores and 1 TB RAM.
- 8x compute nodes, each with 128 AMD Milan cores and 2 TB RAM.
- 2x compute nodes, each with 128 AMD Milan cores and 4 TB RAM.
- 31x GPU nodes, each with 64 AMD Milan cores, 512 GB RAM, and 4x NVIDIA A100 80GB GPU cards.
- 2x NVIDIA DGX nodes, each with 128 AMD Milan cores, 1 TB RAM, and 8x A100 40GB GPU cards.
- 2x data transfer nodes (DTNs), each with 32 AMD Milan cores and 512 GB RAM.
- 2 PB VAST flash-based fast scratch storage cluster.
Of the 31 GPU nodes, 17 were purchased by contributors, who have preemptive access to their resources. Likewise, 42 of the Intel 48-core nodes were purchased by contributors with the same preemptive access. Several nodes have had their A100 GPUs divided into smaller devices using NVIDIA Multi-Instance GPU (MIG) to provide a range of GPU devices with different memory and compute profiles; more details can be found on our wiki page.
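As a rough illustration, the GPU types and MIG slices configured on each partition can be inspected with Slurm's sinfo, and a MIG slice requested by name; the profile name used below (1g.10gb) is an assumption, so check the wiki for the devices actually offered:

    # List the generic resources (GRES) advertised by each partition,
    # which shows the GPU types and any MIG slices that are configured.
    sinfo -o "%P %G"

    # Illustrative request for a single MIG slice on the gpuq partition;
    # the GRES name (assumed here to be 1g.10gb) is site-specific.
    salloc --partition=gpuq --gres=gpu:1g.10gb:1 --time=01:00:00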
Table of Slurm partitions on HOPPER as of 06/01/2024
PARTITION | TIMELIMIT [Days-Hours:Mins:Secs] | NODES | NODELIST |
interactive* | 0-12:00:00 | 3 | hop[043-045], amd[001-004] |
contrib | 7-00:00:00 | 42 | hop[001-042] |
normal | 7-00:00:00 | 96 | hop[046-073],amd[005-068] |
bigmem | 7-00:00:00 | 22 | amd[069-090] |
gpuq | 5-00:00:00 | 15 | dgx[001-002],gpu[011-024] |
contrib-gpu | 5-00:00:00 | 16 | gpu[001-010],gpu[025-031] |
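A minimal batch script targeting one of the partitions above might look like the following sketch; the module and program names are placeholders:

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --partition=normal         # any partition from the table above
    #SBATCH --time=1-00:00:00          # must fit within the partition time limit
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=16G

    # Load the required software and run it (names are placeholders).
    module load my_application
    srun my_application input.dat

Saved as job.sh, the script is submitted with sbatch job.sh and can be monitored with squeue -u $USER.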
Argo Cluster
The ARGO cluster is a ~2,000-core Linux cluster that also provides a scheduled job environment using Slurm. ARGO has a number of different node configurations, ranging from 16 to 32 cores per node. Each node provides a minimum of 4 GB of RAM per core; some nodes have 512 GB of RAM, and one node has 1.5 TB of RAM. Additionally, there are four nodes each with 4x NVIDIA V100 32GB GPUs, two of which support NVLink, and four nodes with NVIDIA K80 GPUs totaling 20 K80 GPU devices. ARGO supports three workload domains: General, Large Memory, and GPU. The network interconnect is provided by an FDR InfiniBand fabric (56 Gbps).
Table of Slurm partitions on ARGO and their node counts
PARTITION | TIMELIMIT [Days-Hours:Mins:Secs] | NODES |
gpuq | 05-00:00:00 | 10 |
all-LoPri | 05-00:00:00 | 67 |
all-HiPri* | 00-12:00:00 | 67 |
bigmem-HiPri | 00-12:00:00 | 5 |
bigmem-LoPri | 05-00:00:00 | 5 |
all-long | 10-00:00:00 | 67 |
bigmem-long | 10-00:00:00 | 5 |
contrib | 07-00:00:00 | 12 |
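For example, an interactive session on ARGO can be requested through Slurm as sketched below; the GPU GRES name (V100) is an assumption about how the V100 nodes are labelled:

    # Interactive shell on ARGO's default all-HiPri partition (12-hour limit).
    srun --partition=all-HiPri --ntasks=1 --time=01:00:00 --pty /bin/bash

    # Illustrative GPU request on ARGO's gpuq partition; the GRES type name
    # (V100) is an assumption and may differ on the actual system.
    srun --partition=gpuq --gres=gpu:V100:1 --time=01:00:00 --pty /bin/bash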
Storage
Primary file storage for both HOPPER and ARGO is provided by a Quobyte parallel filesystem cluster, which currently has 500 TB of NVMe solid-state high-speed storage and 1 PB of high-performance disk storage. Additionally, 400 TB of high-speed scratch storage is provided by a 1 PB VAST Data all-flash storage cluster. All ORC account holders have a 50 GB quota on the primary file storage system. There is no quota on the scratch file system; however, files are purged from the scratch space every 90 days. An additional 1 TB of project storage may be provided to faculty free of charge on request, and additional storage may be provisioned from the ORC storage cluster at a charge of $50/TB per annum.
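As a sketch of how to stay within these limits (the scratch path used below is an assumption; see the wiki for the actual mount points):

    # Check how much of the 50 GB home quota is currently in use.
    du -sh $HOME

    # List scratch files older than 90 days, i.e. candidates for the purge
    # (the /scratch/$USER path is an assumption; check the wiki for the
    # actual scratch mount point).
    find /scratch/$USER -type f -mtime +90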
Software
A list of software available on the cluster can be found here – Software. We are often able to install software on the cluster on request. If you have specific software that you have purchased and would like to use on the cluster, please contact us, as we will need to review the license terms. Similarly, if you have an unmet software need, please contact us and we can investigate how to accommodate your requirements. – Request Help
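For example, assuming the cluster uses an environment-modules system such as Lmod, which is common on Slurm clusters, installed software can be discovered and loaded as follows:

    # Browse the software stacks that are already installed.
    module avail

    # Search for a particular package and load it (the module name is
    # illustrative; actual names and versions will differ).
    module spider python
    module load python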
How to Get Started
For information on eligibility and how to apply for access to ORC HPC resources, please see our New User Information page. You can also refer to our wiki documentation with additional examples and instructions for running on the cluster.
General Purpose Computing
Virtual Computing
The ORC has resources to provision virtual servers for research projects that either require a web presence or are not suited to running in a scheduled environment.
The ORC can also facilitate access to virtual desktops, as well as external VM hosting through either the NSF XSEDE project or a number of commercial cloud computing providers.
Virtual machine requests are handled on a case-by-case basis; please contact us so we can investigate how to meet your needs. – Request Help.
Containers
The ORC provides support for running containerized workloads. We run containers using Singularity on both the ARGO and HOPPER clusters (see this link for more information about Singularity). Instructions on working with containers and running containerized applications on the HOPPER cluster can be found in the wiki documentation here.
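A minimal workflow, assuming an image pulled from a public registry such as Docker Hub, might look like this sketch:

    # Pull a container image from Docker Hub; this produces python_3.11.sif.
    singularity pull docker://python:3.11

    # Run a command inside the container on a compute node through Slurm
    # (the partition name refers to the HOPPER table above).
    srun --partition=normal --ntasks=1 singularity exec python_3.11.sif python3 --version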