SLURM Usage

Interactive Jobs

These can be run in two ways: via salloc or via srun. If you just want a single interactive session on a compute node, then using srun to allocate resources for a single task and launch a shell as that one task is probably the way to go. But if you want to run things in parallel, or more than one task at once, in your interactive job, use salloc to allocate resources and then srun or mpirun to start the tasks, since starting multiple copies of an interactive shell at once probably isn't what you want.

# One interactive task. Quit the shell to finish
srun --pty -u bash -i
# Allocate three tasks, followed by running three instances of 'myprog' within the allocation.
# Then start one copy of longprog and two copies of myprog2, then release the allocation
salloc -n3
srun myprog
srun -n1 longprog &
srun -n2 myprog2
exit

Batch Jobs

Scripts for batch jobs must start with the interpreter to be used to execute them (different from PBS/Torque). You can give arguments to sbatch as #SBATCH comments in the script. Example:

#!/bin/bash
# Name of the job
#SBATCH -J testjob
# Partition to use - generally not needed so commented out
##SBATCH -p NONIB
# time limit
#SBATCH --time=10:0:0
# Number of processes
#SBATCH -n1
# Start the program
srun myprogram
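
To actually submit the script you pass it to sbatch, which reads the #SBATCH comments and returns as soon as the job is queued. A minimal sketch, assuming the script above has been saved as testjob.sh (the filename is just an example):

# Submit the batch script; sbatch prints the new job's ID
sbatch testjob.sh
# List your queued and running jobs
squeue -u $USER
# Cancel a job by its ID if necessary (12345 is a placeholder)
scancel 12345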

Asking for resources:

salloc/srun/sbatch support a huge array of options which let you ask for nodes, CPUs, tasks, sockets, threads, memory and so on. If you combine them SLURM will try to work out a sensible allocation, so for example if you ask for 13 tasks and 5 nodes SLURM will cope. Here are the ones that are most likely to be useful; a sketch combining a few of them follows the table:

Option            Meaning
-n                Number of tasks (roughly, processes)
-N                Number of nodes to assign. If you're using this, you might also be interested in --tasks-per-node
--tasks-per-node  Maximum number of tasks to assign per node when using -N
--cpus-per-task   Assign tasks containing more than one CPU. Useful for jobs with shared-memory parallelization
-C                Features the assigned nodes must have
-w                Names of nodes that must be included, for selecting a particular node or nodes
--mem-per-cpu     Use this to make SLURM assign you more memory than the default amount available per CPU. The units are MB. It works by automatically assigning enough extra CPUs to the job to give it access to sufficient memory.
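
For instance, here is a sketch of a job header that combines several of these options: four nodes with at most two tasks on each, nodes that must have a particular feature, and extra memory per CPU. The feature name 'skylake' and the numbers are only examples; check which features and limits your cluster actually defines.

#!/bin/bash
#SBATCH -J resource-demo        # job name (example)
#SBATCH -N 4                    # four nodes
#SBATCH --tasks-per-node=2      # at most two tasks on each node
#SBATCH -C skylake              # only nodes with the 'skylake' feature (hypothetical)
#SBATCH --mem-per-cpu=8000      # 8000 MB of memory per CPU
srun myprogram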

MPI jobs

Inside a batch script you should just be able to call mpirun, which will communicate with SLURM and launch the job over the appropriate set of nodes for you:

#!/bin/bash
# 13 tasks over 5 nodes
#SBATCH -n13 -N5
echo Hosts are
srun -l hostname
mpirun /home/cen1001/src/mpi_hello

To run MPI jobs interactively you can assign some nodes using salloc, and then call mpirun from inside that allocation. Unlike PBS/Torque, the shell you launch with salloc runs on the same machine you ran salloc on, not on the first node of the allocation. But mpirun will do the right thing.

salloc -n12 bash
mpirun /home/cen1001/src/mpi_hello

You can even use srun to launch MPI jobs interactively without mpirun's intervention. The --mpi option here tells srun which method the MPI library uses for launching tasks; pmi2 is the correct one for use with our OpenMPI installations.

srun --mpi=pmi2 -n13 ./mpi_hello

OpenMP jobs

The Nestum cluster is a homogeneous Intel Xeon based parallel machine in which each node has 32 compute cores with shared memory. Users can therefore run shared-memory (OpenMP) parallel jobs by requesting a single task with -n 1 and up to 32 cores per task with -c NUMBER_OF_CORES_PER_TASK.

#!/bin/bash

...
#SBATCH -n 1
#SBATCH -c NUMBER_OF_CORES_PER_TASK
...

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myApp
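
For example, here is a sketch of a script that runs myApp on all 32 cores of a single node; the job name and time limit are illustrative:

#!/bin/bash
#SBATCH -J omp-job          # job name (example)
#SBATCH --time=1:0:0        # time limit (example)
#SBATCH -n 1                # one task
#SBATCH -c 32               # 32 cores for that task

# Use as many OpenMP threads as CPUs allocated to the task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myApp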

Hybrid parallel jobs

If you need to run hybrid MPI and OpenMP parallel jobs, you again specify the number of tasks with -n NUMBER_OF_MPI_PROCESS and the number of OpenMP threads per MPI process (the number of cores per task) with -c NUMBER_OF_CORES_PER_TASK:

#!/bin/bash

...
#SBATCH -n NUMBER_OF_MPI_PROCESS
#SBATCH -c NUMBER_OF_CORES_PER_TASK
...

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myApp

For example, if you want to run a hybrid parallel job on three compute nodes, using all cores on each node for OpenMP threads, then NUMBER_OF_MPI_PROCESS = 3 and NUMBER_OF_CORES_PER_TASK = 32:

#!/bin/bash

...
#SBATCH -n 3
#SBATCH -c 32
...

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myApp

The total number of threads will then be 96: three MPI processes, each with 32 OpenMP threads.

Non-MPI Parallel jobs

In a parallel job which doesn't use MPI you can find out which hosts you have, and how many, by running srun -l hostname inside the job script. The -l option prints the SLURM task number next to the hostname assigned to each task; skip it if you just want the list of hostnames.

You can then use srun inside the job to start individual tasks.
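
A minimal sketch of this pattern, assuming a hypothetical serial program called worker and made-up input names:

#!/bin/bash
#SBATCH -n 4                # four tasks

# Show which hosts were assigned, one line per task
srun -l hostname

# Start one copy of 'worker' per task, all in parallel, then wait for them all
srun -n1 worker input1 &
srun -n1 worker input2 &
srun -n1 worker input3 &
srun -n1 worker input4 &
wait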