New Slurm QoS usage
Each partition (or “pool”) now has several QoS tiers that determine job priority and preemption behavior.
| QOS | Purpose | Preempts | Can be preempted by |
|---|---|---|---|
| Share-type QoS (e.g. 0.125_48c_8g, 0.75_48c_8g) | For multi-owner pools; defines each owner’s guaranteed slice (CPU/GPU portion). | owner, public | -- |
| owner | Used on your lab’s pool to run above your guaranteed slice (higher than public). | public | share-type QoS |
| public (partition: power-general-shared-pool) | Used on cluster-wide shared pools for friendly or opportunistic runs | -- | owner, share-type QoS |
| public (partition: power-general-public-pool) | Used on a small, cluster-wide group of shared nodes; jobs here are not preemptable. | -- | -- |
Preemption rule summary: share-type QoS > owner > public
This means:
• A share-type QoS job can preempt owner or public jobs on the same pool.
• An owner job can preempt public jobs.
• Public jobs cannot preempt any other jobs.
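The ordering above can be sketched as a small shell helper. This is only an illustrative model, not part of Slurm: the function names `qos_rank` and `can_preempt` are hypothetical, and any QoS name other than `owner` or `public` is assumed to be a share-type QoS. It also ignores the partition, so it does not capture the exception that public jobs on power-general-public-pool are never preempted.

```shell
#!/usr/bin/env bash
# Illustrative model of the preemption ordering; not a Slurm tool.

# qos_rank (hypothetical helper): map a QoS name to a rank.
# A higher rank preempts a lower one. Any name other than
# "owner" or "public" is assumed to be a share-type QoS.
qos_rank() {
  case "$1" in
    public) echo 0 ;;
    owner)  echo 1 ;;
    *)      echo 2 ;;
  esac
}

# can_preempt (hypothetical helper): print "yes" if a job running
# with QoS $1 can preempt a job running with QoS $2, else "no".
can_preempt() {
  if [ "$(qos_rank "$1")" -gt "$(qos_rank "$2")" ]; then
    echo yes
  else
    echo no
  fi
}

can_preempt 0.125_48c_8g owner   # prints "yes"
can_preempt owner public         # prints "yes"
can_preempt public owner         # prints "no"
```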
How to Submit Jobs with the Correct QoS
Below are examples of how to use the new QoS tiers with your account:
Owner QoS (on your lab’s pool)
sbatch -A UIDHERE-users_v2 -p UIDHERE-pool --qos=owner --time=02:00:00 run.sh
Share-type QoS (on a multi-owner pool, for your guaranteed slice)
sbatch -A UIDHERE-users_v2 -p gpu-dudu-tzach-yoav-pool --qos=0.125_48c_8g --gres=gpu:A100:1 run.sh
Public QoS (friendly, cluster-wide)
sbatch -A UIDHERE-users_v2 -p power-general-shared-pool --qos=public --time=01:00:00 run.sh
For the small, protected CPU pool
sbatch -A UIDHERE-users_v2 -p power-general-public-pool --qos=public --time=01:00:00 run.sh
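The same options can also be placed inside the batch script as `#SBATCH` directives, so the submit command stays short. The sketch below assumes the owner example above; the job name `qos-demo` is a hypothetical placeholder, and `UIDHERE` must still be replaced with your own ID.

```shell
#!/bin/bash
#SBATCH --account=UIDHERE-users_v2
#SBATCH --partition=UIDHERE-pool
#SBATCH --qos=owner
#SBATCH --time=02:00:00
#SBATCH --job-name=qos-demo   # hypothetical job name

# Record which QoS and partition the job actually landed on.
# Slurm sets SLURM_JOB_QOS and SLURM_JOB_PARTITION inside a job;
# the "unset" fallbacks only apply when the script runs outside Slurm.
summary="qos=${SLURM_JOB_QOS:-unset} partition=${SLURM_JOB_PARTITION:-unset}"
echo "$summary"

# Your workload goes here.
```

With the directives in place, `sbatch run.sh` is enough. Options given on the command line take precedence over `#SBATCH` directives, so `sbatch --qos=public -p power-general-shared-pool run.sh` would still let you redirect the same script to another pool.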
Handy Checks During Usage
You can monitor your jobs and see their QoS, state, and reason, and display the configured priority weights:
squeue --me -O "JOBID,ACCOUNT,PARTITION,QOS,STATE,REASON"
sprio -w
If your job was preempted, check:
sacct -j <jobid> --format=JobID,State,Reason
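If you do not know which job was preempted, a small wrapper around `sacct` can list recent candidates. This is a sketch under assumptions: the function name `list_recent_preempted` and the seven-day default window are hypothetical choices, and it presumes job accounting is enabled on the cluster.

```shell
#!/usr/bin/env bash
# list_recent_preempted (hypothetical helper): list your jobs that ended
# in the PREEMPTED state during the last N days (default: 7).
list_recent_preempted() {
  local days="${1:-7}"

  # Fail with a clear message when run on a machine without Slurm.
  if ! command -v sacct >/dev/null 2>&1; then
    echo "sacct not found; run this on a cluster login node" >&2
    return 1
  fi

  # -X reports one line per job allocation instead of one per job step.
  sacct -X --state=PREEMPTED --starttime="now-${days}days" \
        --format=JobID,JobName,Partition,QOS,State,End
}

# Example: list_recent_preempted 3
```

The job IDs it prints can then be passed to the `sacct -j <jobid>` check above.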