Submitting a job to a SLURM queue

From HPC Guide

SLURM (Simple Linux Utility for Resource Management) is a job scheduler used on many high-performance computing systems. It manages and allocates resources such as compute nodes and controls job execution.

Accessing the System

To submit jobs to the SLURM scheduler at Tel Aviv University, you must access the system through one of the designated login nodes. These nodes act as the gateway for submitting and managing your SLURM jobs. The available login nodes are:

  • powerslurm-login.tau.ac.il
  • powerslurm-login2.tau.ac.il

Login Requirements:

  1. Membership in the "power" group: You must be a member of the "power" group, which grants the permissions needed to access the HPC resources.
  2. University Credentials: Log in with your Tel Aviv University credentials. This keeps access secure and ensures that your jobs are accounted to your own user.

Remember, these login nodes are the initial point of contact for all your job management tasks, including job submission, monitoring, and other SLURM-related operations.
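Once you are logged in, the standard id command can confirm that your user is in the "power" group. A minimal sketch, where is_in_group is a hypothetical helper name and your-username in the ssh comment is a placeholder for your TAU username:

```shell
#!/bin/bash
# Log in first (your-username is a placeholder), e.g.:
#   ssh your-username@powerslurm-login.tau.ac.il

# is_in_group GROUP (hypothetical helper): exit status 0 if the current
# user belongs to GROUP, 1 otherwise. `id -nG` prints the names of all
# of the current user's groups, separated by spaces.
is_in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

# Check the group required for the SLURM login nodes.
if is_in_group power; then
    echo "OK: you are a member of the power group"
else
    echo "You are not a member of the power group" >&2
fi
```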

Basic Job Submission Commands

Finding your account and partition

To submit jobs to SLURM, you need to know which accounts and partitions you belong to. Each account may be associated with one or more partitions.

To see the accounts you belong to, run the following ($USER expands to your own username):

sacctmgr show associations where user=$USER format=Account%20
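For use in scripts, the same query can be made machine-readable with sacctmgr's -n (no header) and -P (fields separated by "|") options. A sketch, assuming a hypothetical helper named my_associations; it also requests the Partition field, which is typically empty when an association is not tied to one specific partition:

```shell
#!/bin/bash
# my_associations [USER] (hypothetical helper): print "account<TAB>partition"
# for every association of USER (default: the current user).
# -n suppresses the header line, -P separates fields with "|".
my_associations() {
    local user=${1:-$USER}
    sacctmgr -n -P show associations where user="$user" format=Account,Partition |
        while IFS='|' read -r account partition; do
            # An empty Partition field is shown as "(any)".
            printf '%s\t%s\n' "$account" "${partition:-(any)}"
        done | sort -u
}

# Usage on a login node (uncomment):
# my_associations
```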

If you already know your partition and want to find which account to use with it, run the following on powerslurm-login:

check_allowed_account -p <partition>

For example:

check_allowed_account -p power-general

Example for submitting jobs

  1. sbatch: Submit a batch job script.
    • Example: sbatch --ntasks=1 --time=10 -p power-general -A power-general-users pre_process.bash
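    • This submits pre_process.bash as a single task with a time limit of 10 minutes.
    • Example of chaining jobs: sbatch --ntasks=128 --time=60 -p power-general -A power-general-users --dependency=45001 do_work.bash
    • This job starts only after job 45001 has ended, with any exit status (use --dependency=afterok:45001 to require that it completed successfully).
    • Example with GPUs: sbatch --ntasks=1 --time=10 --gres=gpu:2 -p gpu-general -A gpu-general-users pre_process.bash requests 2 GPUs, and sbatch --gres=gpu:1 -p gpu-general -A gpu-general-users gpu_job.sh requests 1.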
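  2. salloc: Obtain a resource allocation for interactive work and run a command (here bash) once the allocation is granted. Depending on the cluster configuration, that shell may run on the login node; use srun inside it to run commands on the allocated nodes.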
    • Example: salloc --ntasks=8 --time=10 -p power-general -A power-general-users bash
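  3. srun: Launch a parallel job, such as an MPI (Message Passing Interface) program, interactively. Each srun invocation creates a "job step"; when run outside an existing allocation, srun first requests one.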
    • Example: srun --ntasks=2 -p power-general -A power-general-users --label hostname
    • With MPI: replace hostname with your MPI program, e.g. srun --ntasks=2 -p power-general -A power-general-users --label ./my_mpi_program (my_mpi_program is a placeholder name)
  4. sattach: Attach stdin/out/err to an existing job or job step.
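Instead of copying a job ID such as 45001 by hand, a script can capture it: sbatch --parsable prints only the job ID (followed by ";cluster" on multi-cluster setups), which can then be passed to --dependency. A sketch with two hypothetical helpers, submit_job and submit_after; here afterok makes the second job wait until the first finishes successfully, whereas a bare job ID waits for it to end with any status.

```shell
#!/bin/bash
# submit_job ARGS... (hypothetical helper): run sbatch with ARGS and print
# only the numeric job ID.
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    printf '%s\n' "${out%%;*}"    # drop an optional ";cluster" suffix
}

# submit_after JOBID ARGS... (hypothetical helper): like submit_job, but the
# new job may start only after JOBID has completed successfully.
submit_after() {
    local dep=$1
    shift
    submit_job --dependency=afterok:"$dep" "$@"
}

# Example chain using the scripts from the sbatch examples above
# (uncomment on a login node):
# pre_id=$(submit_job --ntasks=1 --time=10 -p power-general -A power-general-users pre_process.bash)
# work_id=$(submit_after "$pre_id" --ntasks=128 --time=60 -p power-general -A power-general-users do_work.bash)
# echo "pre_process: $pre_id, do_work: $work_id"
```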

Interactive Job Examples

  • Opening a bash shell: srun --ntasks=56 -p power-general -A power-general-users --pty bash
  • Specifying compute nodes: srun --ntasks=56 -p power-general -A power-general-users --nodelist="compute-0-12" --pty bash
  • Using GPUs (4 in this example): salloc --ntasks=8 --time=10 --gres=gpu:4 -p gpu-general -A gpu-general-users bash
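Before pinning a job to a node with --nodelist, it can help to see which nodes of a partition are currently idle. A sketch using sinfo in node-oriented mode (-N), without a header (-h), filtered to idle nodes (-t idle) and printing only node names (-o '%N'); idle_nodes is a hypothetical helper name. A node that is idle now may be busy by the time the job is scheduled.

```shell
#!/bin/bash
# idle_nodes PARTITION (hypothetical helper): print the idle nodes of
# PARTITION, one per line, sorted and without duplicates.
idle_nodes() {
    sinfo -N -h -p "$1" -t idle -o '%N' | sort -u
}

# Example: open an interactive shell on the first idle node (uncomment to use):
# node=$(idle_nodes power-general | head -n 1)
# srun --ntasks=56 -p power-general -A power-general-users --nodelist="$node" --pty bash
```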

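Script Example

Save a script like the following to a file (for example my_job.sh, a placeholder name) and submit it with sbatch my_job.sh. The #SBATCH lines are read by sbatch as if they were command-line options.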

#!/bin/bash

#SBATCH --job-name=my_job             # Job name
#SBATCH --account=power-general-users # Account name for billing
#SBATCH --partition=power-general     # Partition name
#SBATCH --time=02:00:00               # Time allotted for the job (hh:mm:ss)
#SBATCH --ntasks=4                    # Number of tasks (processes)
#SBATCH --cpus-per-task=1             # Number of CPU cores per task
#SBATCH --mem-per-cpu=4G              # Memory per CPU core
#SBATCH --output=my_job_%j.out        # Standard output log (%j expands to the job ID)
#SBATCH --error=my_job_%j.err         # Separate file for standard error

# Load modules or software if required
# module load python/3.8

# Print some information about the job
echo "Starting my SLURM job"
echo "Job ID: $SLURM_JOB_ID"
echo "Running on nodes: $SLURM_JOB_NODELIST"
echo "Allocated CPUs per node: $SLURM_JOB_CPUS_PER_NODE"

# Run your application, this could be anything from a custom script to standard applications
# ./my_program
# python my_script.py

# End of script
echo "Job completed"
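Inside a batch script, the requested layout is available through environment variables set by SLURM: SLURM_NTASKS for --ntasks and SLURM_CPUS_PER_TASK for --cpus-per-task (the latter only when that option is given). A sketch of lines that could be added to the script above before the application starts, for example to size a multithreaded (OpenMP) program; total_cpus is a hypothetical helper, and treating unset variables as 1 is an assumption for illustration.

```shell
#!/bin/bash
# total_cpus (hypothetical helper): CPU cores requested by the job,
# computed as tasks x CPUs per task; unset variables count as 1.
total_cpus() {
    echo $(( ${SLURM_NTASKS:-1} * ${SLURM_CPUS_PER_TASK:-1} ))
}

# Let each task use as many OpenMP threads as it has cores.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

echo "Tasks: ${SLURM_NTASKS:-1}, CPUs per task: ${SLURM_CPUS_PER_TASK:-1}"
echo "Total CPUs requested: $(total_cpus)"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With the #SBATCH settings above (--ntasks=4, --cpus-per-task=1), total_cpus would report 4.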

Script example with GPU

#!/bin/bash

#SBATCH --job-name=my_job             # Job name
#SBATCH --account=gpu-general-users   # Account name for billing (replace with your own GPU account)
#SBATCH --partition=gpu-general       # GPU partition name
#SBATCH --time=02:00:00               # Time allotted for the job (hh:mm:ss)
#SBATCH --ntasks=4                    # Number of tasks (processes)
#SBATCH --cpus-per-task=1             # Number of CPU cores per task
#SBATCH --gres=gpu:NUMBER_OF_GPUS     # Number of GPUs for the job (replace NUMBER_OF_GPUS, e.g. 1)
#SBATCH --mem-per-cpu=4G              # Memory per CPU core
#SBATCH --output=my_job_%j.out        # Standard output log (%j expands to the job ID)
#SBATCH --error=my_job_%j.err         # Separate file for standard error

# Load modules or software if required
module load python/3.8

# Print some information about the job
echo "Starting my SLURM job"
echo "Job ID: $SLURM_JOB_ID"
echo "Running on nodes: $SLURM_JOB_NODELIST"
echo "Allocated CPUs per node: $SLURM_JOB_CPUS_PER_NODE"

# Run your application, this could be anything from a custom script to standard applications
# ./my_program
# python my_script.py

# End of script
echo "Job completed"
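In a GPU job, SLURM usually exposes the allocated GPUs through the CUDA_VISIBLE_DEVICES variable (a comma-separated list of device indices), so counting its entries is a quick sanity check that the --gres request took effect. A sketch; count_visible_gpus and require_gpus are hypothetical helpers, and whether the variable is set depends on the cluster's GPU configuration.

```shell
#!/bin/bash
# count_visible_gpus (hypothetical helper): number of entries in
# CUDA_VISIBLE_DEVICES, or 0 if the variable is unset or empty.
count_visible_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
    else
        printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
    fi
}

# require_gpus N (hypothetical helper): print a message and return 1 if
# fewer than N GPUs are visible to the job.
require_gpus() {
    local have
    have=$(count_visible_gpus)
    if [ "$have" -lt "$1" ]; then
        echo "Expected at least $1 GPU(s) but found $have; check --gres and the partition" >&2
        return 1
    fi
}

# In the GPU script above, before starting the application (uncomment):
# require_gpus 1 || exit 1
```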

Error Handling

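  • On clusters without a system default partition, you must specify a partition (and usually the resources) explicitly; otherwise the request is rejected.
    • Example error: srun: error: Unable to allocate resources: No partition specified or system default partition
    • Correct usage: srun --pty -c 1 --mem=2G -p power-yoren /bin/bash (power-yoren is an example partition; use one you have access to)
  • Jobs that need GPUs must request them explicitly with --gres=gpu:<N> on a GPU partition such as gpu-general; a job that does not request GPUs is not allocated any.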
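The partition error above appears when no partition is given and the cluster has no default. To check in advance, look at sinfo: in its %P output field, the default partition is marked with a trailing "*". A sketch, where default_partition is a hypothetical helper; if it prints nothing, no default exists and every srun, salloc and sbatch command needs an explicit -p.

```shell
#!/bin/bash
# default_partition (hypothetical helper): print the name of the default
# partition, or nothing if none is configured. `sinfo -h -o '%P'` prints
# partition names without a header, with "*" appended to the default one.
default_partition() {
    sinfo -h -o '%P' | sed -n 's/\*$//p' | head -n 1
}

# Example (uncomment on a login node):
# part=$(default_partition)
# if [ -z "$part" ]; then echo "No default partition: always pass -p"; else echo "Default partition: $part"; fi
```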

SLURM Information Commands

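  • sinfo: View partitions (queues) and the state of their nodes.
  • squeue: View jobs in the queue (add -u $USER to list only your own).
  • scontrol show partition: View the detailed configuration of all partitions.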
  • scontrol show job <job_number>: View a job's attributes.
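scontrol show job prints a job's attributes as Key=Value pairs; with the -o option (one line per record), a single attribute such as JobState is easy to extract in a script. A sketch; job_field is a hypothetical helper, and values that contain spaces are not handled.

```shell
#!/bin/bash
# job_field JOBID KEY (hypothetical helper): print the value of KEY from
# `scontrol -o show job JOBID`, e.g. JobState or Partition.
job_field() {
    scontrol -o show job "$1" | tr ' ' '\n' | grep -m 1 "^$2=" | cut -d= -f2-
}

# Example (uncomment on a login node):
# job_field 45001 JobState     # e.g. PENDING, RUNNING or COMPLETED
```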

Tips for Managing SLURM Jobs

  • Chain jobs with the --dependency option of sbatch.
  • Use salloc for interactive jobs that require specific resources for a limited time.
  • srun works both interactively and inside batch scripts, where it launches parallel tasks such as MPI processes.
  • Always specify necessary resources in clusters where defaults are not set.