Job submission

The Nestum cluster uses the SLURM batch system to manage user jobs. A good introduction to SLURM can be found here. Convenient SLURM commands are listed below. A Postfix client installed on the cluster notifies you of the status of your jobs by email.

Submit interactive job

To start an interactive job, run the following command:

srun -p interact.p --pty /bin/bash

Intel Parallel Studio XE Cluster Edition 2017 compilers are available only via interactive jobs.
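For longer interactive work it can help to state the requested resources explicitly instead of relying on the partition defaults. The sketch below wraps the command above in a small bash function. The function name interactive_shell and the one-hour default are illustrative assumptions, and whether interact.p accepts these limits depends on how the partition is configured.

```shell
# interactive_shell [TIME]: open an interactive bash session on interact.p.
# TIME uses the D-HH:MM format. The 0-1:00 default is an assumption made
# for this example, not a documented cluster setting.
interactive_shell() {
    local limit="${1:-0-1:00}"
    # -n 1 requests a single task; --pty attaches the terminal to the shell.
    srun -p interact.p -n 1 -t "$limit" --pty /bin/bash
}
```

After defining the function, interactive_shell 0-0:30 would request a 30 minute session.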

Submit batch job

Batch jobs in SLURM are submitted with the sbatch command. A batch job is a plain shell script in which additional parameters for sbatch are given on comment lines prefixed with #SBATCH. For example, consider the following bash script, my.job:

#!/bin/bash
#
#SBATCH -p medium.p               # partition (queue)
#SBATCH -N 2                      # number of nodes
#SBATCH -n 64                     # number of cores
#SBATCH -t 0-2:00                 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out        # STDOUT
#SBATCH -e slurm.%N.%j.err        # STDERR
#SBATCH --mail-type=<type>        # notification trigger
#SBATCH --mail-user=<user>        # email address

module load openmpi

mpirun helloworld.x

The command-line parameters in the script above are largely self-explanatory and well documented in the sbatch man page, but a few words on each option follow.

#SBATCH -p medium.p               # partition (queue)

sets the partition (queue) to which the job is submitted. If this option is omitted, the default queue is used. The next option

#SBATCH -N 2                      # number of nodes

sets the number of nodes allocated to the job, 2 in our case. In addition, we need to set the total number of compute cores:

#SBATCH -n 64                     # number of cores

sets the number of tasks to be executed. Since the default number of cpus-per-task is 1 and each of the 2 requested compute nodes has 32 cores, a total of 64 cores will be allocated. The execution time is specified with the -t option:

#SBATCH -t 0-2:00                 # time (D-HH:MM)

If this value is omitted, the default of 10 minutes is used. Finally, the standard output and standard error streams are redirected to the files slurm.%N.%j.out and slurm.%N.%j.err respectively, where %N represents the node name and %j is the job ID.
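To see what the file name patterns turn into, the following sketch imitates the substitution with sed. It is only an illustration: the real expansion is done by SLURM when the job starts, the function name preview_log_name is hypothetical, and node01 and 12345 are made-up values.

```shell
# preview_log_name PATTERN NODE JOBID: print the file name that a pattern
# such as slurm.%N.%j.out would produce. Only the two placeholders used in
# this document (%N and %j) are handled.
preview_log_name() {
    printf '%s\n' "$1" | sed -e "s/%N/$2/g" -e "s/%j/$3/g"
}

# node01 and 12345 are made-up example values.
preview_log_name 'slurm.%N.%j.out' node01 12345   # prints slurm.node01.12345.out
preview_log_name 'slurm.%N.%j.err' node01 12345   # prints slurm.node01.12345.err
```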

#SBATCH --mail-type=<type>
#SBATCH --mail-user=<user>      

These lines set the rules for email notification. Setting them is optional. The <type> can be ALL, BEGIN, END or FAIL, which are self-explanatory. The <user> field should be the desired email address. If it is left blank, notifications are sent to the email address associated with the user account.
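The options described above can be tied together in a short sketch that derives the task count from the node count and prints a matching header. The variable names and the function name print_header are illustrative, the 32 cores per node are taken from the text above, and user@example.com is a placeholder address. The END,FAIL value relies on sbatch accepting a comma-separated list of mail types.

```shell
nodes=2                 # value for -N
cores_per_node=32       # cores per compute node, as stated above
cpus_per_task=1         # sbatch default for --cpus-per-task

# Number of tasks that exactly fills the requested nodes.
ntasks=$(( nodes * cores_per_node / cpus_per_task ))

# print_header: write a job header that matches the values above.
print_header() {
    cat <<EOF
#!/bin/bash
#SBATCH -p medium.p
#SBATCH -N $nodes
#SBATCH -n $ntasks
#SBATCH -t 0-2:00
#SBATCH -o slurm.%N.%j.out
#SBATCH -e slurm.%N.%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.com
EOF
}

print_header
```

With the values shown, the generated header requests 64 tasks on 2 nodes and sends mail only when the job ends or fails.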

Submitting the job is simple:

sbatch -t 0-3:00 my.job

In the example above, the command-line option -t overrides the value set in the job file.
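When a job is submitted from another script, it is often useful to keep the job ID for later squeue or scancel calls. The sketch below is one possible way to do that, using the --parsable option of sbatch, which prints only the job ID (followed by a semicolon and the cluster name, if one is reported). The function name submit_job is an assumption made for this example.

```shell
# submit_job ARGS...: submit with sbatch and print only the numeric job ID.
# With --parsable, sbatch prints "jobid" or "jobid;cluster" instead of the
# usual "Submitted batch job jobid" message.
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    printf '%s\n' "${out%%;*}"   # drop the optional ";cluster" suffix
}
```

A call such as jobid=$(submit_job -t 0-3:00 my.job) would then store the ID in a shell variable.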

List jobs

To list all jobs in the queue (running and pending), use the squeue command without any additional arguments:

squeue

To list only the jobs of a particular user, use the -u switch:

squeue -u user
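For everyday use, the user name can be taken from the environment and the list can be narrowed by job state with the -t option of squeue. The helper below is a minimal sketch; the function name my_jobs is hypothetical.

```shell
# my_jobs [STATE]: list the current user's jobs, optionally filtered by
# state (for example RUNNING or PENDING).
my_jobs() {
    if [ -n "${1:-}" ]; then
        squeue -u "$USER" -t "$1"
    else
        squeue -u "$USER"
    fi
}
```

For instance, my_jobs PENDING would show only the jobs that are still waiting for resources.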

Cancel job

To cancel a running job, first obtain its job ID with the squeue command, then pass it to scancel:

scancel job_id
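Because a mistyped argument is easy to make here, a small wrapper can check the ID before calling scancel. This is a sketch under simple assumptions: the function name cancel_job is hypothetical, and it accepts only plain numeric IDs, so job array IDs such as 123_4 would be rejected.

```shell
# cancel_job JOBID: cancel one job after checking that JOBID is a plain number.
cancel_job() {
    case "${1:-}" in
        ''|*[!0-9]*)
            echo "usage: cancel_job JOBID (numeric ID from squeue)" >&2
            return 2
            ;;
    esac
    scancel "$1"
}
```

To cancel all of your own jobs at once, scancel also accepts a user name, as in scancel -u "$USER".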

Queue status

The state of the compute nodes in the SLURM environment can be reviewed with the command:

sinfo
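Two variations are often handy: restricting the summary to one partition with -p, and listing one line per node with -N -l. The wrappers below are only a sketch, and the function names partition_status and node_status are assumptions made for this example.

```shell
# partition_status [PARTITION]: summarise node states, limited to one
# partition (for example medium.p) if a name is given.
partition_status() {
    if [ -n "${1:-}" ]; then
        sinfo -p "$1"
    else
        sinfo
    fi
}

# node_status: one line per node, in the long output format.
node_status() {
    sinfo -N -l
}
```

Running partition_status medium.p before submitting shows whether the nodes of that partition are idle, allocated or down.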