Running programs on Heracles by using SLURM

The Heracles cluster has 16 compute nodes (node 2 to node 17), one master node (node 1), and one GPU node (node 18, which has four GPUs). User programs MUST NOT be run on the master node; run them on the compute nodes instead.

Before you run your programs, check the cluster status and select a compute node that is not busy. Running programs on a busy compute node may produce misleading performance benchmarks.
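One way to check which nodes are free is to ask SLURM itself. The following is a minimal sketch, not an official Heracles tool: it assumes the standard sinfo command is installed and that the master node is known to SLURM as node1 (a guess based on the node naming used on this page). The helper name idle_nodes is hypothetical.

```shell
#!/bin/sh
# Read "name state" lines on stdin and print the idle nodes, one per line.
# The master node (assumed to be named node1) is always skipped.
idle_nodes() {
    awk '$2 == "idle" && $1 != "node1" { print $1 }' | sort -u
}

# Query the scheduler only when sinfo is actually available.
if command -v sinfo >/dev/null 2>&1; then
    # -N: one line per node, -h: no header, -o: node name and state
    sinfo -N -h -o "%N %T" | idle_nodes
fi
```

Any node printed by this sketch is reported as idle at the moment of the query; another user may still start a job on it afterwards.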

SLURM
SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

Visit this link to learn more about SLURM:

https://slurm.schedmd.com/quickstart.html

Commands


srun is used to submit a job for execution or to initiate job steps in real time.

Syntax:

srun -n<# tasks to run> -w "node[node-list]" <path>/myProgram

where,

-n<# tasks to run>: defines the number of tasks to run

-w "node[node-list]": defines the specific list of nodes where the application will run


Example:

srun -n1 -w "node[10]" ./cge 2 1

This command executes one instance of the cge program on node 10.

srun -n2 -w "node[10,11]" ./cge 2 1

This command executes two instances of the cge program: one instance runs on node 10 and the other on node 11.
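To confirm where the tasks actually run, the same srun options can be tried with the standard hostname program in place of cge. The sketch below assumes nodes are named as in the examples above (node10, node11); the helper hostlist is a hypothetical convenience for building the -w argument from node numbers.

```shell
#!/bin/sh
# Build a node list such as node[10,11] from node numbers.
# The body runs in a subshell so the IFS change does not leak out.
hostlist() (
    IFS=,
    printf 'node[%s]\n' "$*"
)

# Run only where srun is installed.
if command -v srun >/dev/null 2>&1; then
    # Each task prints the name of the node it ran on.
    srun -n2 -w "$(hostlist 10 11)" hostname
fi
```

If the node list is honored, the output should contain one host name per task.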

sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.

Learn more about sbatch at: https://slurm.schedmd.com/sbatch.html

Example script

Create your script with a text editor. Both vi and nano are available on Heracles.

nano -c myFirst.script

#!/bin/sh
#SBATCH --job-name=MyJob               # Job name
#SBATCH --begin=<time>                 # Time may be of the form HH:MM:SS to run a job at a specific time of day
#SBATCH --mail-type=ALL                # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<email_address>    # Where to send mail
#SBATCH --nodes=1                      # Number of nodes to be allocated for the job
#SBATCH --nodelist=node[2-5,6,7,n]     # Request a specific list of nodes
#SBATCH --cpus-per-task=4              # Number of CPU cores per task - use only for parallel programs
#SBATCH --output=Myjob.out             # Standard output and error log
srun ./Myprogram <arg1> <argn>

To submit the job, execute the sbatch command:

sbatch myFirst.script
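As a concrete illustration of the steps above, the following sketch writes a small job script and submits it. All names in it (hello.script, Hello, hello.out) are hypothetical, and it runs the standard hostname program rather than a user program. It assumes sbatch is on the PATH; if it is not, the script is only written.

```shell
#!/bin/sh
# Write a minimal job script; every name in it is an illustrative placeholder.
cat > hello.script <<'EOF'
#!/bin/sh
#SBATCH --job-name=Hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output=hello.out
srun hostname
EOF

# Submit the script only when sbatch is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello.script
else
    echo "sbatch not found; hello.script was written but not submitted"
fi
```

After the job finishes, hello.out should contain the name of the node that ran it.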

Other commands: