Interactive jobs can be run in two ways: via srun or via salloc. If you just want a single interactive session on a compute node, then using srun to allocate resources for a single task and launch a shell as that one task is probably the way to go. But if you want to run things in parallel, or more than one task at once, within your interactive job, use salloc to allocate resources and then mpirun to start the tasks, since starting multiple copies of an interactive shell at once probably isn't what you want.
```
# One interactive task. Quit the shell to finish
srun --pty -u bash -i

# Allocate three tasks, followed by running three instances of 'myprog' within the allocation.
# Then start one copy of longprog and two copies of myprog2, then release the allocation
salloc -n3
srun myprog
srun -n1 longprog &
srun -n2 myprog2
exit
```
Scripts for batch jobs must start with a line naming the interpreter to be used to execute them (different from PBS/Torque). You can give arguments to sbatch as comments in the script. Example:
```
#!/bin/bash
# Name of the job
#SBATCH -J testjob
# Partition to use - generally not needed so commented out
##SBATCH -p NONIB
# time limit
#SBATCH --time=10:0:0
# Number of processes
#SBATCH -n1
# Start the program
srun myprogram
```
Asking for resources:
sbatch supports a huge array of options which let you ask for nodes, CPUs, tasks, sockets, threads, memory, etc. If you combine them, SLURM will try to work out a sensible allocation; for example, if you ask for 13 tasks and 5 nodes, SLURM will cope. Here are the options that are most likely to be useful:
|Option|Purpose|
|---|---|
|`-n`|Number of tasks (roughly, processes)|
|`-N`|Number of nodes to assign. If you're using this, you might also be interested in `--tasks-per-node`|
|`--tasks-per-node`|Maximum tasks to assign per node if using `-N`|
|`-c`|Assign tasks containing more than one CPU. Useful for jobs with shared memory parallelization|
|`-C`|Features the nodes assigned must have|
|`-w`|Names of nodes that must be included - for selecting a particular node or nodes|
|`--mem-per-cpu`|Use this to make SLURM assign you more memory than the default amount available per CPU. The units are MB. Works by automatically assigning sufficient extra CPUs to the job to ensure it gets access to enough memory.|
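For instance, several of these options can be combined on the sbatch command line (here `myjob.sh` is a hypothetical job script; the flags themselves are standard SLURM options):

```shell
# Ask for 4 tasks spread over 2 nodes, at most 2 tasks per node,
# and 8000 MB of memory per CPU. 'myjob.sh' is a placeholder script.
sbatch -n4 -N2 --tasks-per-node=2 --mem-per-cpu=8000 myjob.sh
```

SLURM reconciles the options into one allocation, so the same request could equally be written as `#SBATCH` comments inside `myjob.sh`.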
Inside a batch script you should just be able to call
mpirun, which will communicate with SLURM and launch the job over the appropriate set of nodes for you:
```
#!/bin/bash
# 13 tasks over 5 nodes
#SBATCH -n13 -N5
echo Hosts are
srun -l hostname
mpirun /home/cen1001/src/mpi_hello
```
To run MPI jobs interactively you can assign some nodes using
salloc, and then call
mpirun from inside that allocation. Unlike PBS/Torque, the shell you launch with
salloc runs on the same machine you ran
salloc on, not on the first node of the allocation. But
mpirun will do the right thing.
```
salloc -n12 bash
mpirun /home/cen1001/src/mpi_hello
```
You can even use
srun to launch MPI jobs interactively without
mpirun's intervention. The
--mpi option here is to tell
srun which method the MPI library uses for launching tasks. This is the correct one for use with our OpenMPI installations.
```
srun --mpi=pmi2 -n13 ./mpi_hello
```
The Nestum cluster is a homogeneous Intel Xeon based parallel machine in which each node has 32 compute cores with shared memory. Users can therefore run shared-memory parallel (OpenMP) jobs by specifying a single task (`-n 1`) with a maximum of 32 cores per task:
```
#!/bin/bash
...
#SBATCH -n 1
#SBATCH -c NUMBER_OF_CORES_PER_TASK
...
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myApp
```
Hybrid parallel jobs
If you need to run hybrid MPI and OpenMP parallel jobs, you should again specify the number of tasks (`-n NUMBER_OF_MPI_PROCESS`) and the number of OpenMP threads per MPI process, i.e. the number of cores per task:
```
#!/bin/bash
...
#SBATCH -n NUMBER_OF_MPI_PROCESS
#SBATCH -c NUMBER_OF_CORES_PER_TASK
...
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myApp
```
For example, if you want to run a hybrid parallel job on three compute nodes, utilizing all cores on each node with OpenMP threads, then NUMBER_OF_MPI_PROCESS = 3 and NUMBER_OF_CORES_PER_TASK = 32:
```
#!/bin/bash
...
#SBATCH -n 3
#SBATCH -c 32
...
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myApp
```
The total number of threads will then be 96: 3 (three) MPI processes, each with 32 OpenMP threads.
Non-MPI Parallel jobs
In a parallel job which doesn’t use MPI you can find out which hosts you have and how many by running
srun -l hostname inside the job script. The
-l option prints the SLURM task number next to the hostname assigned to each task; skip it if you want just the list of hostnames.
You can then use
srun inside the job to start individual tasks.
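As a sketch of this pattern (the program name `myApp` and its inputs are placeholders, not part of the cluster setup):

```shell
#!/bin/bash
#SBATCH -n 4

# Show which hosts were assigned, with the task number for each
srun -l hostname

# Launch two independent sub-jobs within the allocation, each as
# two tasks, then wait for both to finish before the job ends
srun -n2 ./myApp inputA &
srun -n2 ./myApp inputB &
wait
```

Each `srun` inside the script starts its tasks on the already-allocated resources, so backgrounding them with `&` and collecting them with `wait` lets independent non-MPI tasks run concurrently.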