
Releasing New Nodes on Hopper

We are happy to announce that, starting Sep 07, 2022, the new AMD CPU and GPU nodes that have been in beta testing will go into production on Hopper. These nodes will be integrated into the existing partitions on Hopper. The main points to note about the new nodes are:

  • All the new nodes have AMD CPUs with 64 or 128 cores each.
  • The new GPU nodes each have 4 NVIDIA A100 80GB GPUs, 64 CPU cores, and 512GB of RAM.

The new nodes also include a few high-memory nodes (1TB/2TB/4TB) that will be accessible through a new ‘bigmem’ partition for memory-intensive calculations. The table below summarizes the updated partitions. More details can be found on our Resources Page.

Summary of SLURM partition updates

Updated Partitions    Current Partitions
normal                normal, amd-test
contrib               contrib
contrib-gpu           gpu-test
gpuq                  gpuq, gpu-test
bigmem                amd-test

New SLURM Partition Details

PARTITION     TIMELIMIT [Days-Hours:Mins:Secs]   NODES   NODELIST
debug         0-01:00:00                         3       hop[043-045]
interactive   0-12:00:00                         3       hop[043-045]
contrib       7-00:00:00                         42      hop[001-042]
normal*       7-00:00:00                         96      hop[046-073],amd[001-068]
bigmem        7-00:00:00                         22      amd[069-090]
gpuq          5-00:00:00                         15      dgx[001-002],gpu[012-024]
contrib-gpu   5-00:00:00                         11      gpu[001-010]

Beta testers who have been using the ‘amd-test’ and ‘gpu-test’ partitions should update their job scripts to use the new partitions (see the example below), as the *-test partitions will no longer be usable starting Sep 12, 2022.
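For example, switching an existing script over usually means changing only the partition directive, following the mapping table above (the choices shown here are illustrative; pick the updated partition that fits your job):

    # Previously (retired after Sep 12, 2022):
    #SBATCH --partition=amd-test
    # Updated, per the mapping table above (or bigmem for memory-intensive jobs):
    #SBATCH --partition=normal

    # Previously:
    #SBATCH --partition=gpu-test
    # Updated (or contrib-gpu, per the mapping table above):
    #SBATCH --partition=gpuq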

AMD Head Nodes/Login Nodes

You can now access two additional head nodes on Hopper, each of which has dual AMD CPUs and a GPU card. This allows GPU codes to be tested and compiled directly on the head nodes.
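As a rough sketch of that workflow, assuming a CUDA toolkit module is available (the module name and the saxpy.cu source file are placeholders, not confirmed names on Hopper):

    nvidia-smi                   # confirm the head node's GPU is visible
    module load cuda             # placeholder module name; check 'module avail'
    nvcc -o saxpy saxpy.cu       # compile a small test kernel on the head node
    ./saxpy                      # quick functional test on the head-node GPU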

Access to the cluster through ssh will work as normal. With the new additions, it will now be possible to ssh directly into either the AMD login nodes or the Intel login nodes.

Summary of available Login/Head-nodes on Hopper

Name           CPUs        GPU           Memory
hopper-intel   48 Intel    -             384 GB RAM
hopper-amd     64 AMD      1 NVIDIA T4   256 GB RAM

To access the Intel head nodes (hopper1/hopper2) or the AMD head nodes (hop-amd-1/hop-amd-2) directly, ssh to the corresponding hostname.
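The commands below are a sketch: UserID stands in for your cluster username, and the domain shown is an assumption, so confirm the exact hostnames on the Resources Page.

    ssh UserID@hopper1.orc.gmu.edu       # Intel head node (hopper1 or hopper2)
    ssh UserID@hop-amd-1.orc.gmu.edu     # AMD head node (hop-amd-1 or hop-amd-2)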

Correctly Running Distributed Jobs

Since all these nodes are being added to the existing partitions, going forward it is important to use one of the constraints below for distributed multi-node jobs so that they run on a single architecture (see the example script after this list):

  • #SBATCH --constraint=amd for running on the AMD nodes
  • #SBATCH --constraint=intel for running on the Intel nodes
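A minimal batch-script sketch, assuming a 2-node MPI job on the AMD nodes (the job name, task counts, and application binary are illustrative placeholders):

    #!/bin/bash
    #SBATCH --job-name=mpi-test           # illustrative job name
    #SBATCH --partition=normal            # choose a partition from the table above
    #SBATCH --nodes=2                     # multi-node job
    #SBATCH --ntasks-per-node=64          # illustrative; match your node's core count
    #SBATCH --constraint=amd              # keep every node on the same (AMD) architecture
    #SBATCH --time=0-01:00:00

    module load gnu10
    module load openmpi/4.1.2

    srun ./my_mpi_app                     # placeholder for your MPI executable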

Using the Correct Software Stack with the New Nodes

It is also important to make sure that you are using the new software stack, which has been built to run across the different nodes regardless of architecture. To do this, use the software compiled with gnu10 and openmpi/4.1.2 by first loading the necessary modules with

  • module load gnu10
  • module load <package-name>

Loading gnu10 switches the available modules to the newly built stack that has been tested to run on the AMD nodes.
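A quick sketch of that workflow (the list that module avail prints will depend on what is currently installed on Hopper):

    module load gnu10
    module load openmpi/4.1.2     # MPI stack mentioned above
    module avail                  # now shows the gnu10-built software stack
    module list                   # confirm the loaded modules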