The Nestum cluster uses the SLURM batch system to control user jobs. A good introduction to SLURM can be found here. Convenient SLURM commands are listed below. A Postfix client installed on the cluster will notify you of the status of your jobs via email.
Submit interactive job
To start an interactive job, run the following command:
srun -p interact.p --pty /bin/bash
Intel Parallel Studio XE Cluster Edition 2017 compilers are available only via interactive jobs.
Submit batch job
Batch jobs in SLURM are submitted with the sbatch command. A batch job is a simple shell script file in which additional parameters for sbatch are given on lines prefixed with #SBATCH. For example, consider the following bash shell script my.job:
#!/bin/bash
#
#SBATCH -p medium.p                  # partition (queue)
#SBATCH -N 2                         # number of nodes
#SBATCH -n 64                        # number of cores
#SBATCH -t 0-2:00                    # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out           # STDOUT
#SBATCH -e slurm.%N.%j.err           # STDERR
#SBATCH --mail-type=<type>           # notification trigger
#SBATCH --mail-user=<user>           # email address

module load openmpi
mpirun helloworld.x
The command line parameters in the above script are largely self-explanatory and well documented in the sbatch man page, but a few words on each option follow.
#SBATCH -p medium.p # partition (queue)
sets the partition (queue) to which the job will be submitted. If this option is omitted, the default queue is used. The next option
#SBATCH -N 2 # number of nodes
sets the number of nodes to be allocated for the job, 2 in our case. In addition, we need to set the total number of compute cores:
#SBATCH -n 64 # number of cores
sets the number of tasks to be executed. Since the default number of cpus-per-task is 1 and each of the 2 requested compute nodes has 32 cores, a total of 64 cores will be allocated. The execution time is specified with the -t option
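The value passed to -n follows from simple arithmetic. As a minimal sketch, assuming 32 cores per node and one CPU per task as described above, a hypothetical helper total_tasks (not part of SLURM) computes it:

```shell
#!/bin/bash
# Hypothetical helper, not part of SLURM: number of tasks needed to fill
# a given number of nodes, assuming one CPU per task.
total_tasks() {
    local nodes=$1
    local cores_per_node=$2
    echo $(( nodes * cores_per_node ))
}

# Two nodes with 32 cores each, as in my.job
total_tasks 2 32   # prints 64
```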
#SBATCH -t 0-2:00 # time (D-HH:MM)
If this value is omitted, the default is 10 minutes. Finally, the standard output and standard error streams can be redirected into the files slurm.%N.%j.out and slurm.%N.%j.err respectively, where %N represents the node name and %j is the job ID.
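To make the file name pattern concrete, the sketch below mimics the substitution of %N and %j. SLURM performs this replacement itself; the helper expand_pattern and the sample values node01 and 12345 are hypothetical and serve only as an illustration:

```shell
#!/bin/bash
# Illustration only: mimics how %N and %j in an output file name are
# replaced. The helper name and the sample values are hypothetical.
expand_pattern() {
    local pattern=$1
    local node=$2
    local jobid=$3
    # Replace every %N with the node name and every %j with the job ID.
    echo "$pattern" | sed -e "s/%N/$node/g" -e "s/%j/$jobid/g"
}

expand_pattern "slurm.%N.%j.out" "node01" "12345"   # prints slurm.node01.12345.out
```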
#SBATCH --mail-type=<type>
#SBATCH --mail-user=<user>
These lines set the rules for email notification. Setting them is not compulsory. The <type> can be ALL, BEGIN, END or FAIL, which are self-explanatory. The <user> field should be the desired email address. If it is left blank, notifications will be sent to the email address associated with the user account.
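A mistyped notification type is easy to overlook in a job file. As a small sketch, the hypothetical helper is_valid_mail_type below checks a value against the four types named above before the directive is written; it is not part of SLURM:

```shell
#!/bin/bash
# Hypothetical helper: checks a value against the four notification
# types named above (ALL, BEGIN, END, FAIL).
is_valid_mail_type() {
    case "$1" in
        ALL|BEGIN|END|FAIL) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the directive only if the chosen type is one of the four.
if is_valid_mail_type "END"; then
    echo "#SBATCH --mail-type=END"
fi
```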
Submitting the job is quite simple:
sbatch -t 0-3:00 my.job
In the above example, the command line option -t overrides the corresponding option in the job file.
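On success, sbatch prints a line of the form "Submitted batch job <id>". The sketch below extracts the job ID from such a line so it can be reused later; the helper name parse_job_id and the sample ID 12345 are assumptions made for illustration:

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from the line printed by sbatch.
parse_job_id() {
    # The job ID is the last whitespace-separated field of the line.
    echo "$1" | awk '{ print $NF }'
}

# Example with a sample line. On the cluster one could instead write:
#   jobid=$(parse_job_id "$(sbatch my.job)")
parse_job_id "Submitted batch job 12345"   # prints 12345
```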
To list all queued and running jobs, use the squeue command without any additional arguments.
To see only the jobs of a particular user, use the -u switch:
squeue -u user
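For everyday use, the per-user query can be wrapped in a small function. This is only a sketch: the function name jobs_of and the user name alice are hypothetical, and the function defaults to the current login name when no argument is given:

```shell
#!/bin/bash
# Hypothetical wrapper: list the jobs of a given user, defaulting to
# the current login name as reported by id -un.
jobs_of() {
    local user=${1:-$(id -un)}
    squeue -u "$user"
}

# Usage (on a machine where SLURM is installed):
#   jobs_of          # your own jobs
#   jobs_of alice    # jobs of the hypothetical user "alice"
```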
To cancel a running job, first obtain its job ID with the squeue command, then pass that ID to the scancel command.
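In standard SLURM, a job is cancelled by passing its ID to scancel. The sketch below wraps that call in a hypothetical function cancel_job that first checks for a plain numeric ID, so that an empty or mistyped argument is rejected before scancel is called. Job array IDs, which contain an underscore, are deliberately not handled here:

```shell
#!/bin/bash
# Hypothetical wrapper around scancel: accepts only a plain numeric job ID.
cancel_job() {
    local jobid=$1
    case "$jobid" in
        ''|*[!0-9]*)
            echo "usage: cancel_job <numeric job id>" >&2
            return 1
            ;;
    esac
    scancel "$jobid"
}

# Usage (on a machine where SLURM is installed), with a sample ID:
#   cancel_job 12345
```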
The state of the compute nodes in the SLURM environment can be reviewed with the sinfo command.
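In standard SLURM, node and partition state is reported by sinfo, and its -p option restricts the report to one partition. A minimal sketch follows; the function name partition_state is hypothetical, and medium.p is the partition used in my.job above:

```shell
#!/bin/bash
# Hypothetical wrapper: show the state of one partition, or of all
# partitions when no argument is given.
partition_state() {
    if [ -n "$1" ]; then
        sinfo -p "$1"
    else
        sinfo
    fi
}

# Usage (on a machine where SLURM is installed):
#   partition_state             # all partitions
#   partition_state medium.p    # only the medium.p partition
```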