Difference between revisions of "Submitting a job to a queue"


Revision as of 09:50, 13 May 2021

Power is a Linux cluster running CentOS (version 7.3-8). The cluster consists of a single head node (power9) and more than 400 compute nodes with 16 to 96 cores each; some nodes have 16GB of memory, others 36GB, and some 600GB or more. Users belonging to the netgroup 'power' can log in and run their batch jobs on it.

The Faculty Computer Coordinators can change a user's netgroup from general to power.

Users’ jobs are executed on the compute nodes (compute-0-0 – compute-0-249) under the control of a queueing system (PBSPRO). Users can log on to the head node, power, via ssh (their home directory is mounted there from the CC filer, the same as on the other CC servers) and submit their jobs to the batch system.

Power cluster and pbspro queueing system

PBSPRO main commands

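The public queues are: short, inf, hugemem, parallel, pub-interactive, gpu.

short – best used for jobs that do not require a lot of memory and do not last more than 24 hours

inf – best used for long jobs (may last weeks)

hugemem – best used when your job requires a relatively large amount of memory

parallel – best used when your job needs more than one core

pub-interactive – used for interactive jobs

gpu – for running jobs that require GPU processing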

A good reference is the PBS Pro User Guide: http://www.pbsworks.com/documentation/support/PBSProUserGuide10.4.pdf

Start by logging in with one of the commands below:

ssh <username>@powerlogin.tau.ac.il

ssh powerlogin.tau.ac.il -l <username>

Create a batch job script, for example a file named script that contains the following lines:

#!/bin/bash
cd executables
./a.out

Submit the script to one of the existing queues, for example to the queue ‘short’:

qsub -q short script

The number which is returned from this command is the job id that was assigned to the new job:

6770818.power.tau.ac.il
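
In a wrapper script it can be convenient to keep the job id for later qstat or qdel calls. A minimal sketch, assuming qsub prints an id of the form <number>.<server> as in the example above; the variable names are illustrative, and the id is hard-coded so that the snippet runs without a scheduler:

```shell
#!/bin/bash
# On the cluster this would be: job_id=$(qsub -q short script)
# The value is hard-coded (taken from the example above) so the snippet runs anywhere.
job_id="6770818.power.tau.ac.il"

# Strip everything from the first dot onward to keep only the numeric part.
job_num="${job_id%%.*}"

echo "full id: $job_id"
echo "job number: $job_num"
```

Either form of the id can then be passed to the qstat and qdel commands described on this page.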

To see the status of your jobs, run:

qstat -u <username>

This lists all running and queued jobs of the specified user.

The job status is usually one of the following:

Q – queued (waiting for its run)

R – running

You can see the status of all jobs in the system by running:

qstat

To see the current available queues and their cputime and memory limits, execute:

qstat -q

To see the status of a specific job, you may run:

qstat -f <job number>
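
The full listing printed by qstat -f is long; when only the state is needed, it can be filtered. A sketch, assuming the listing contains a line of the form "job_state = R" (this is how PBS Pro normally prints it, but check the output on your system). The function name job_state_of and the sample text are illustrative only:

```shell
#!/bin/bash
# Read a "qstat -f" listing on stdin and print the value of its job_state line.
job_state_of() {
    awk -F' = ' '$1 ~ /job_state/ { print $2; exit }'
}

# Sample input so the snippet runs without a scheduler (illustrative, not real output).
sample='Job Id: 6770818.power.tau.ac.il
    Job_Name = script
    job_state = Q
    queue = short'

state=$(printf '%s\n' "$sample" | job_state_of)
echo "state: $state"
# On the cluster: qstat -f <job number> | job_state_of
```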

Some of the queues are private, accessible to a predefined group of users; others are public, open to all users of power. More detailed information on the limits of any queue may be viewed with:

qmgr -c "list queue <queuename>"

For example:

qmgr -c "list queue short"

Default queue limits are enforced unless other limits (up to the queue's maximum values) are specified on the 'qsub' command line with the ‘-l’ flag (a lowercase ‘L’), in the following format:

qsub -q <queue> -l<attribute>=<limit>,<attribute>=<limit>,... <script>

For example:

qsub -q hugemem -lpmem=2000mb,pvmem=3000mb <script>

qsub -q hugemem -lmem=14gb,pmem=5gb,vmem=20gb,pvmem=20gb <script>

qsub -q gpu -lngpus=1 <script>

qsub -q parallel -lnodes=1:ppn=4 <script>

Where:

mem - refers to maximum amount of memory to be allocated

pmem - refers to maximum amount of memory to be allocated per process

vmem - refers to maximum amount of virtual memory to be allocated

pvmem - refers to maximum amount of virtual memory to be allocated per process

nodes - number of required nodes (servers)

ppn - number of required cores (within a node)

ngpus - number of required gpus (exists only for queue gpu)
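
When the limits are chosen by a script, the comma-separated list can be assembled from separate attribute=limit pairs. A sketch only: the helper name build_qsub is hypothetical, and it prints the command (a dry run) instead of submitting anything:

```shell
#!/bin/bash
# Join attribute=limit pairs with commas and print the resulting qsub command.
# Usage: build_qsub <queue> <script> <attribute=limit>...
build_qsub() {
    local queue="$1" script="$2"
    shift 2
    local IFS=,   # "$*" joins the remaining arguments with the first character of IFS
    echo qsub -q "$queue" -l "$*" "$script"
}

cmd=$(build_qsub hugemem script mem=14gb pmem=5gb vmem=20gb pvmem=20gb)
echo "$cmd"
```

Printing the command first makes it easy to check the request against the queue limits before running it by hand.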

The standard output and standard error files will be written by default at the end of the execution to files in your home directory: script.o#n and script.e#n (where #n is the job number given to your job by the batch queueing system).
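
As an illustration of this naming rule, the sketch below builds the two file names for a job submitted from a file named script. The job number is the example value used earlier on this page, and the variable names are illustrative:

```shell
#!/bin/bash
script_name="script"   # name of the submitted script file
job_num="6770818"      # numeric part of the job id (example value)

out_file="$HOME/${script_name}.o${job_num}"   # standard output
err_file="$HOME/${script_name}.e${job_num}"   # standard error

echo "stdout: $out_file"
echo "stderr: $err_file"
```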

To delete a job, use the qdel command:

qdel <job number>


PBSPRO file parameters

Instead of adding parameters to the qsub command line, the script to be run may contain additional lines that are directives to the scheduler.

Explanations regarding PBS script directives can be found at: https://www.osc.edu/supercomputing/batch-processing-at-osc/pbs-directives-summary

For example, instead of specifying ‘qsub -q hugemem …’, one may add the line ‘#PBS -q hugemem’ to the script to be executed. The script below, named ‘script.sh’, uses such directives and can be run with the command ‘qsub script.sh’:

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=4,mem=400mb

./my_application
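
A slightly fuller sketch of such a script, with the queue name given as a directive. The job name example_job (set with qsub's -N option) and the final echo line are placeholders. PBS_O_WORKDIR is the variable PBS sets to the directory from which qsub was called; the fallback to the current directory is an addition so that the script can also be tried outside the batch system:

```shell
#!/bin/bash
#PBS -q hugemem
#PBS -N example_job
#PBS -l walltime=1:00:00
#PBS -l mem=5000mb

# Under PBS, start in the directory the job was submitted from.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "running on $(hostname) in $workdir"
# The real program would be started here.
```

Outside PBS the #PBS lines are ordinary shell comments, so the same file can be run directly for a quick check.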


Running matlab example

In this example there are 3 files:

myTable.m ⇒ This matlab file calculates something

function [] = myTable()
fprintf('=======================================\n');
fprintf(' a b c d \n');
fprintf('=======================================\n');
% Endless loop: the job keeps printing rows until it is deleted or stopped
% by a queue limit, so the fprintf after the loop is never reached.
while 1
    for j = 1:10
        a = sin(10*j);
        b = a*cos(10*j);
        c = a + b;
        d = a - b;
        fprintf('%+6.5f   %+6.5f   %+6.5f   %+6.5f   \n',a,b,c,d);
    end
end
fprintf('=======================================\n');


my_table_script.sh ⇒ This script executes the matlab program. To submit it, just run qsub with this script.

#!/bin/bash
#PBS -e /tmp/dvory/matlab/output
#PBS -o /tmp/dvory/matlab/output
#PBS -l mem=5000mb
#PBS -q hugemem

hostname

cd /a/home/cc/tree/taucc/staff/dvory/matlab

matlab -nodisplay -r "myTable()"


run_in_loop.sh ⇒ This optional script submits the same job many times (here 100 times)

#!/bin/bash
for i in {1..100}
do
    qsub my_table_script.sh
done

Submit the jobs with the command:

./run_in_loop.sh
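
If each job should work on a different input, the loop can pass a value to every job. A sketch using qsub's -v option, which exports variables into the job's environment. RUN_INDEX is a hypothetical variable name that my_table_script.sh would have to read itself, and submit_all is an illustrative helper. The function only prints the commands unless the environment variable SUBMIT is set to qsub:

```shell
#!/bin/bash
# Print (or, with SUBMIT=qsub, run) one qsub command per index from 1 to $1.
submit_all() {
    local count="$1" i
    for ((i = 1; i <= count; i++)); do
        ${SUBMIT:-echo qsub} -v "RUN_INDEX=$i" my_table_script.sh
    done
}

submit_all 3
```

Running it as a dry run first shows exactly which commands would be submitted.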


Interactive session

Interactive sessions (line mode) are enabled using the flag ‘-I’ (a capital ‘i’):

qsub -I <command>

(without adding a script name)


Interactive sessions with X window

To open an X window (such as a matlab window or a math window) from an interactive session, use the commands below.

Log in to power.tau.ac.il with ‘-X’:

ssh -X -l <username> power.tau.ac.il

Then use the qsub command with ‘-X’ (without adding a script name):

qsub -I -X -q <queue>

Keep in mind that running matlab via an X window slows the matlab execution.

Matlab needs more memory than the default limits of the public queues allow, so request at least the following:

qsub -q hugemem -lmem=60gb,pmem=60gb,vmem=60gb,pvmem=60gb -I -X


Parallelism

Parallel jobs can be executed in the cluster, using up to 8 cores (the ppn value) for a job. For example, jobs compiled with mpich can be submitted with the following command:

qsub -l nodes=2:ppn=8 -q parallel <script-filename>

Multithreaded matlab jobs can be submitted with the following command:

qsub -l nodes=1:ppn=8 -q parallel <matlab-script>

Here ‘-l’ is a lowercase ‘L’.
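
Since ppn counts cores within a node, the nodes and ppn values multiply. A small dry-run sketch of that arithmetic for the mpich example above; nothing is submitted, and the variable names are illustrative:

```shell
#!/bin/bash
nodes=2   # number of requested nodes
ppn=8     # cores requested on each node
total_cores=$((nodes * ppn))

# Dry run: print the submission command and the core count it implies.
echo "qsub -l nodes=${nodes}:ppn=${ppn} -q parallel <script-filename>"
echo "total cores requested: $total_cores"
```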

Environment modules

The Environment Modules package provides for the dynamic modification of a user’s environment via modulefiles.

Typically, modulefiles instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. Modules are useful for managing different versions of applications.

Useful commands:

module avail (⇒ lists the available modules on the system)

module load <module>

e.g.:

module load intel/ifort10 (⇒ loads the module, so that ifort version 10 can be used without specifying the path to its binaries and libraries)

module list (⇒ lists the loaded modules)

module unload intel/ifort10 (⇒ unloads the loaded module)