Office of Research Computing

What are options one can use with sbatch?

Some of the common options that one might use with sbatch are:

  • -J <name-of-job> (or --job-name=<name-of-job>) – Use <name-of-job> instead of the default job name, which is the script file name.
  • -i /path/to/dir/inputfilename (or --input) – Use "inputfilename" as the input file for this job.
  • -o /path/to/dir/outputfilename (or --output) – Use "outputfilename" as the output file for this job.
  • -e /path/to/dir/errorfilename (or --error) – Use "errorfilename" as the file for errors encountered in this job.
  • -n <number> (or --ntasks=<number>) – The number of tasks to run; this also specifies the number of slots needed.
  • --mem=<MB> – The total memory needed for the job; use this if more than the default is needed.
  • [email protected] – Send mail to your GMU email account.
  • --mail-type=BEGIN,END – Send email at the start and at the end of the job.

Read the man pages (man sbatch) for more sbatch options.
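As a sketch, the options above can also be set with #SBATCH directives inside the submission script itself; the script name, file paths, email address, and resource values below are placeholders, not site-specific settings:

```shell
#!/bin/bash
#SBATCH --job-name=my_analysis          # -J: job name instead of the script file name
#SBATCH --output=/path/to/dir/out.log   # -o: standard output file
#SBATCH --error=/path/to/dir/err.log    # -e: standard error file
#SBATCH --ntasks=4                      # -n: number of tasks (slots)
#SBATCH --mem=2048                      # total memory for the job, in MB
#SBATCH [email protected]  # replace with your GMU address
#SBATCH --mail-type=BEGIN,END           # mail at job start and end

# The commands below stand in for your actual workload.
echo "Running on $(hostname)"
srun ./my_program
```

Command-line options passed to sbatch override the corresponding #SBATCH directives in the script.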

How do I submit jobs?

The command for submitting a batch job is:

$ sbatch <script file name> (The default partition is all-HiPri)

If the command is successful you will see the following:

Submitted batch job <job id number>

You can also supply options (see the previous question for a list) on the command line when submitting; command-line options override the corresponding #SBATCH directives in the script.

$ sbatch [options] <script file name>
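For example, a job could be submitted with options set entirely on the command line; the script name, job name, and resource values here are illustrative:

```shell
# Submit run.sh with options given on the command line
# rather than as #SBATCH directives in the script
sbatch -J my_analysis -n 4 --mem=2048 -o out.log run.sh

# Submit to a non-default partition with -p
sbatch -p all-LoPri run.sh
```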

What are the partition (queue) names?


Partition Name   Nodes in Partition                   Restricted Access
all-HiPri*       [001-039,041-049,051-054,057-070]    no
all-LoPri        [001-039,041-049,051-054,057-070]    no
bigmem-LoPri     [034,035,069,070]                    no
bigmem-HiPri     [034,035,069,070]                    no
gpuq             [040,050]                            no
COS_q            [028-035]                            yes
CS_q             [007-024,056]                        yes
CDS_q            [046-049,051]                        yes

*all-HiPri is the default partition (queue).

all-HiPri and bigmem-HiPri both have a run-time limit of 12 hours; jobs exceeding the time limit will be killed. all-LoPri and bigmem-LoPri both have a 5-day run-time limit. The bigmem-LoPri and bigmem-HiPri partitions are intended for jobs that require large amounts of memory. Access to the partitions marked "restricted access" is limited to members of research groups and departments that have funded nodes in the cluster.
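As an illustration, a memory-heavy job expected to run longer than the 12-hour limit might be directed to bigmem-LoPri; the time and memory values below are placeholders, not recommendations:

```shell
#!/bin/bash
#SBATCH --partition=bigmem-LoPri   # large-memory nodes, 5-day run-time limit
#SBATCH --time=2-00:00:00          # request 2 days (must stay under the 5-day limit)
#SBATCH --mem=64000                # illustrative large-memory request, in MB

srun ./big_memory_job
```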

Can I log into individual nodes to submit jobs?

Users should not log into individual nodes to run jobs. Jobs must be submitted to the scheduler on the head node. Compute-intensive jobs running on nodes outside scheduler control (i.e., started directly on the nodes) will be killed without notice.

Users can ssh into nodes on which their jobs, previously submitted via the scheduler, are currently running. This access is only for checking on the job(s) running on that node. Please note that users who use ssh access to start new jobs on nodes without going through the scheduler will have their ability to ssh into nodes removed.
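To find which node a running job is on before connecting to it, one approach looks like this; the node name shown is hypothetical, and the commands assume a standard Slurm setup:

```shell
# List your running jobs and the nodes they occupy (NODELIST column)
squeue -u $USER

# ssh to a node listed for one of your jobs, e.g. node 034,
# ONLY to inspect a job already running there
ssh node034
top -u $USER
```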

Can I run jobs on the head node?

You can use the head node to develop, compile, and test a small sample of your job before submitting it to the queue. Users cannot run computationally intensive jobs on the head node; any such jobs found running there will be killed without notice.

All jobs must be submitted on the head node via the Slurm scheduler, which schedules them to run on the compute nodes.