
Releasing New Nodes on Hopper

We are happy to announce that, starting Sep 07, 2022, the new AMD CPU and GPU nodes that have been in beta testing will go into production on Hopper. These nodes will be integrated into the existing partitions on Hopper. The main points to note about the new nodes are:

  • All the new nodes have AMD CPUs with 64 or 128 cores each.
  • The new GPU nodes each have 4 NVIDIA A100 80GB GPUs, 64 CPU cores, and 512GB of RAM.

The new nodes also include a few high-memory nodes (1TB/2TB/4TB) that will be accessible through a new ‘bigmem’ partition for memory-intensive calculations. The table below summarizes the updated partitions. More details can be found on our Resources Page.

Summary of SLURM partition updates

Updated Partitions    Current Partitions
normal                normal, amd-test
contrib               contrib
contrib-gpu           gpu-test
gpuq                  gpuq, gpu-test
bigmem                amd-test

New SLURM Partition Details

PARTITION     TIMELIMIT [Days-Hours:Mins:Secs]   NODES   NODELIST
debug         0-01:00:00                         3       hop[043-045]
interactive   0-12:00:00                         3       hop[043-045]
contrib       7-00:00:00                         42      hop[001-042]
normal*       7-00:00:00                         96      hop[046-073],amd[001-068]
bigmem        7-00:00:00                         22      amd[069-090]
gpuq          5-00:00:00                         15      dgx[001-002],gpu[012-024]
contrib-gpu   5-00:00:00                         11      gpu[001-010]

Beta testers who have been using the ‘amd-test’ and ‘gpu-test’ partitions should update their job scripts to use the new partitions (see the example below), as the *-test partitions will no longer be usable starting Sep 12, 2022.
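For example, switching an existing script over usually means changing only the partition directive, following the mapping table above (the choices shown here are illustrative; pick the updated partition that fits your job):

    # Previously (retired after Sep 12, 2022):
    #SBATCH --partition=amd-test
    # Updated, per the mapping table above (or bigmem for memory-intensive jobs):
    #SBATCH --partition=normal

    # Previously:
    #SBATCH --partition=gpu-test
    # Updated (or contrib-gpu, per the mapping table above):
    #SBATCH --partition=gpuq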

AMD Head Nodes/Login Nodes

You can now access two additional head nodes on Hopper, each of which has dual AMD CPUs and a GPU card. This allows GPU codes to be tested and compiled directly on the head nodes.
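As a rough sketch of that workflow, assuming a CUDA toolkit module is available (the module name and the saxpy.cu source file are placeholders, not confirmed names on Hopper):

    nvidia-smi                   # confirm the head node's GPU is visible
    module load cuda             # placeholder module name; check 'module avail'
    nvcc -o saxpy saxpy.cu       # compile a small test kernel on the head node
    ./saxpy                      # quick functional test on the head-node GPU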

Access to the cluster through ssh will work as normal. With the new additions, it will now be possible to ssh directly into either the AMD login nodes or the Intel login nodes.

Summary of available Login/Head-nodes on Hopper

Name           CPUs        GPU           Memory
hopper-intel   48 Intel    -             384 GB RAM
hopper-amd     64 AMD      1 NVIDIA T4   256 GB RAM

To access the Intel head nodes (hopper1/hopper2) or the AMD head nodes (hop-amd-1/hop-amd-2) directly, ssh to the corresponding hostname.
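The commands below are a sketch: UserID stands in for your cluster username, and the domain shown is an assumption, so confirm the exact hostnames on the Resources Page.

    ssh UserID@hopper1.orc.gmu.edu       # Intel head node (hopper1 or hopper2)
    ssh UserID@hop-amd-1.orc.gmu.edu     # AMD head node (hop-amd-1 or hop-amd-2)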

Correctly Running Distributed Jobs

Since all these nodes are being added to the existing partitions, going forward it is important to use one of the constraints below for distributed multi-node jobs so that they run on a single architecture (see the example script after this list):

  • #SBATCH --constraint=amd for running on the AMD nodes
  • #SBATCH --constraint=intel for running on the Intel nodes
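A minimal batch-script sketch, assuming a 2-node MPI job on the AMD nodes (the job name, task counts, and application binary are illustrative placeholders):

    #!/bin/bash
    #SBATCH --job-name=mpi-test           # illustrative job name
    #SBATCH --partition=normal            # choose a partition from the table above
    #SBATCH --nodes=2                     # multi-node job
    #SBATCH --ntasks-per-node=64          # illustrative; match your node's core count
    #SBATCH --constraint=amd              # keep every node on the same (AMD) architecture
    #SBATCH --time=0-01:00:00

    module load gnu10
    module load openmpi/4.1.2

    srun ./my_mpi_app                     # placeholder for your MPI executable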

Using the Correct Software Stack with the New Nodes

It is also important to make sure that you are using the new software stack, which has been built to run across the different nodes regardless of architecture. To do this, use the software compiled with gnu10 and openmpi/4.1.2 by first loading the necessary modules with

  • module load gnu10
  • module load <package-name>

Loading gnu10 switches the available modules to the newly built stack that has been tested to run on the AMD nodes.
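A quick sketch of that workflow (the list that module avail prints will depend on what is currently installed on Hopper):

    module load gnu10
    module load openmpi/4.1.2     # MPI stack mentioned above
    module avail                  # now shows the gnu10-built software stack
    module list                   # confirm the loaded modules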