Office of Research Computing

My Python script doesn’t print anything out until the program ends

Python buffers its standard output, so a script's print output may not appear until the buffer fills or the program ends. To have output written to the screen as it is produced, the buffer needs to be flushed. You can do this in a few ways:

  1. Add “-u” as a command line option to python
  2. Set the environment variable:
PYTHONUNBUFFERED=TRUE
  3. Add the “flush=True” option to your print calls, e.g.:
print(..., flush=True)
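
For example, a minimal sketch of a script that reports progress as it runs (the loop and its messages are illustrative):

import time

for i in range(5):
    # flush=True forces each line to be written immediately,
    # rather than when the buffer fills or the program exits
    print('finished step', i, flush=True)
    time.sleep(1)

Alternatively, leave the script unmodified and run it as “python -u myscript.py” (the script name is hypothetical).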

How do I run graphics applications on the cluster?

The most common reason for getting an error message such as

 _tkinter.TclError: couldn't connect to display "localhost:10.0"

or

 _tkinter.TclError: no display name and no $DISPLAY environment variable

when you attempt to run a graphics application on the cluster is a failure to ssh with the -X option (X11 forwarding). Log out and then log back in with the -X option:

 ssh -X <NetID>@argo.orc.gmu.edu 

If your application uses packages such as Matplotlib (for example, to plot results from TensorFlow or PyTorch), you can use the Agg (Anti-Grain Geometry) rendering backend instead of X11. To do this, include the following code in your Python script:

import matplotlib
matplotlib.use('Agg')

A similar effect can be achieved, selecting Agg only when no display is available, by including the following in your script:

import os
import matplotlib as matpl

if os.environ.get('DISPLAY', '') == '':
    print('Currently no display found. Using the non-interactive Agg backend')
    matpl.use('Agg')

import matplotlib.pyplot as plot
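
Note that Agg is a non-interactive backend: calls such as plot.show() will not open a window, so write figures to a file instead. A minimal sketch, with illustrative data and file name:

import matplotlib
matplotlib.use('Agg')            # must be selected before pyplot is imported
import matplotlib.pyplot as plot

plot.plot([1, 2, 3], [1, 4, 9])  # illustrative data
plot.savefig('myplot.png')       # writes the figure to a file instead of displaying it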

My jobs keep failing with OUT_OF_MEMORY errors

The ARGO cluster now enforces hard memory limits on jobs.  The default limit is 2 GB per core per node, so if you request 4 cores on a single node, your job is limited to 8 GB.  If your job exceeds the amount of memory allocated to it, it will be killed and its state recorded as OUT_OF_MEMORY.  Because memory requirements are hard to estimate precisely in advance, it is best to request slightly more than your job actually needs.  You can request a set amount of memory by specifying

#SBATCH --mem=XX[K|M|G|T]
e.g. to request 8 gigabytes use:
#SBATCH --mem=8G

Where XX represents the amount of memory your job requires plus some small (roughly 10%) padding, and the suffix K|M|G|T denotes kilo-, mega-, giga-, or terabytes respectively.

Alternatively, it may be preferable to specify a per-CPU memory limit:

#SBATCH --mem-per-cpu=XX[K|M|G|T]
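
For example, a job submitted with “--ntasks=4” and “--mem-per-cpu=3G” (the numbers here are illustrative) would be allocated 4 × 3 GB = 12 GB in total.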

The sacct command can display the memory used by prior jobs if you wish to review the actual memory consumption and fine-tune your memory request for future runs.  For example:

> sacct -o JobID,JobName,Start,MaxRSS,State -S mm/dd

where the start date (-S) is specified in month/day form.

Do you have sample scripts?

The script file contains all the options needed for a specific job. A line beginning with “##” is a comment, while a line beginning with “#SBATCH” contains submit options. The first line is always “#!”, which specifies the shell that interprets the script.

#!/bin/bash
#
## Specify Job name if you want
## the short form -J
#SBATCH --job-name=My_Test_Job
##
## Specify a different working directory
## Default working directory is the directory from which you submit your job
## Short form -D
#SBATCH --workdir=/path/to/directory/name
##
## Specify output file name
## If you want output and error to be written to different files
## You will need to provide output and error file names
## short form -o
#SBATCH --output=slurm-output-%N-%j.out
## %N is the name of the node on which it ran
## %j is the job-id
## NOTE this format has to be changed if Array job
## filename-%A-%a.out - where A is job ID and a is the array index
##
## Specify error output file name
## short form -e
#SBATCH --error=slurm-error-%N-%j.out
##
## Specify input file
## short form -i
##
## Send email
#SBATCH --mail-user=
## Email notification for the following types
#SBATCH --mail-type=BEGIN,FAIL,TIME_LIMIT_80
## Some valid types are: NONE,BEGIN,END,FAIL,REQUEUE
##
## Select partition to run this job
## Default partition is all-HiPri - run time limit is 12 hours
## short form -p
#SBATCH --partition=all-LoPri
##
## Quality of Service; Priority
## Contributor's queue needs QoS to be specified for jobs to run
## Everyone is part of the normal QoS, so it does not have to be specified
#SBATCH --qos=normal
##
## Ask for Intel machine using Feature parameter
## short form -C
## Intel Nodes - Proc16, Proc20, Proc24
## AMD nodes - Proc64
#SBATCH --constraint="Proc24"
##
## Ask for 1 node and the number of slots in node
## This can be 16|20|24
## short form -N
#SBATCH --nodes=1
##
## Now ask for number of slots
#SBATCH --tasks-per-node=16
##
## MPI jobs
## If you need to start a 64 slot job, you can ask for 4 nodes with 16 slots each
#SBATCH --nodes 4
#SBATCH --tasks-per-node=16
##
## How much memory the job needs, specified in MB
## Default Memory is 2048MB
## Memory is specified per CPU
#SBATCH --mem-per-cpu=4096
##
## Load the needed modules
module load
.....
## Start the job
java -Xmx<memneeded> -jar test.jar

For MPI, Matlab, and R jobs, please go to the ORC-WIKI.

Can we request multiple cores/slots?

Use the “--ntasks <number>” option either on the command line with sbatch or in the script file to request <number> cores.

NOTE: If your job is inherently multi-threaded (e.g. Java jobs), then you need to use this option to specify the number of cores you want for the job.
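
For example, to reserve 8 cores for a multi-threaded job (the core count and script name here are illustrative), add the option to your script:

#SBATCH --ntasks=8

or pass it on the command line:

$ sbatch --ntasks 8 myjob.sh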

Where can I find more information about the Slurm scheduler?

You can read the man pages of the various commands.  The Slurm documentation is available at http://slurm.schedmd.com.  Please note that the documentation on that website covers the latest release; we are running Slurm version 15.08.6, so the man pages installed on the cluster give the correct description for our version.
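
For example, to see every option accepted by the job-submission command:

$ man sbatch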

How do I delete jobs?

Use the following command to delete submitted jobs:
$ scancel <job id number>

How do I find out information about completed jobs?

$ sacct -j <job id number>

You can use this command after the job has completed. The <job id number> is what you get back when you successfully submit your job via the sbatch command.

How do I check status of jobs?

squeue -u <userID>  (will list your queued and running jobs)

squeue  (will list the jobs of all users)

Job Status:

  • “PD” – Job is pending (queued and waiting to be scheduled).
  • “S” – Job is suspended.
  • “R” – Job is running.
  • “CA” – Job was cancelled (marked for deletion).
  • “F” – Job failed.

See man pages (man squeue) for more details.

Where can I find examples of job scripts?

An example script is in the /cm/shared/apps/slurm/current/examples directory.  Experiment with “sleeper.sh” to get started.
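
For example (copying the script to your working directory first is an illustrative step):

$ cp /cm/shared/apps/slurm/current/examples/sleeper.sh .
$ sbatch sleeper.sh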