New slurm qos usage
Latest revision as of 14:41, 4 December 2025
We have a ChatGPT page that explains all of this: HPC-helper-toolkit (https://chatgpt.com/g/g-68be7f9acfb88191978615c1693e2cff-hpc-helper-toolkit)
Use the script check_my_partitions to find out which partitions, accounts, and QoS are available to you.
QOS
Each partition (or “pool”) now has several QoS tiers that determine job priority and preemption behavior.
| QOS | Purpose | Preempts | Can be preempted by |
|---|---|---|---|
| Share-type QoS (e.g. 0.125_48c_8g, 0.75_48c_8g) | For multi-owner pools; defines each owner’s guaranteed slice (CPU/GPU portion). | owner,public | -- |
| owner | Used on your lab’s pool to run above your guaranteed slice (higher than public). | public | share-type QoS |
| public (partition: power-general-shared-pool) | Used on cluster-wide shared pools for friendly or opportunistic runs. | -- | owner, share-type QoS |
| public (partition: power-general-public-pool) | Used on a small, cluster-wide shared group of nodes; jobs here are not preemptable. | -- | -- |
Billing
Each user has a "billing" parameter, which is derived from the amount of resources he or she requests.
Therefore, when you request more memory, your billing increases. The billing parameter eventually affects priority, so users who requested fewer resources in the past get higher priority in the future.
Preemption rule summary
share-type QoS > owner > public
This means:
• A share-type QoS job can preempt owner or public jobs on the same pool.
• An owner job can preempt public jobs.
• Public jobs cannot preempt any other jobs.
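The ordering above can be sketched as a small shell check. This is only an illustration of the rule, not something Slurm provides: qos_tier and can_preempt are hypothetical helper names, any QoS other than owner or public is assumed to be a share-type QoS, and the sketch ignores the non-preemptable power-general-public-pool.

```shell
#!/bin/bash
# Hypothetical helper: map a QoS name to a preemption tier (higher wins).
qos_tier() {
  case "$1" in
    public) echo 0 ;;
    owner)  echo 1 ;;
    *)      echo 2 ;;  # assumption: anything else is a share-type QoS, e.g. 0.125_48c_8g
  esac
}

# Hypothetical helper: exit status 0 when a job with QoS $1 may preempt
# a job with QoS $2 on the same pool, non-zero otherwise.
can_preempt() {
  [ "$(qos_tier "$1")" -gt "$(qos_tier "$2")" ]
}

can_preempt 0.125_48c_8g owner && echo "share-type preempts owner"
can_preempt owner public && echo "owner preempts public"
can_preempt public owner || echo "public preempts nothing"
```

Jobs in the same tier do not preempt each other in this sketch, since the comparison is strict.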
How to Submit Jobs with the Correct QoS
Below are examples of how to use the new QoS tiers with your account:
Owner QoS (on your lab’s pool)
sbatch -A UIDHERE-users_v2 -p UIDHERE-pool --qos=owner --time=02:00:00 run.sh
Share-type QoS (on a multi-owner pool, for your guaranteed slice)
sbatch -A UIDHERE-users_v2 -p gpu-dudu-tzach-yoav-pool --qos=0.125_48c_8g --gres=gpu:A100:1 run.sh
Public QoS (friendly, cluster-wide)
sbatch -A UIDHERE-users_v2 -p power-general-shared-pool --qos=public --time=01:00:00 run.sh
For the small, protected CPU pool
sbatch -A UIDHERE-users_v2 -p power-general-public-pool --qos=public --time=01:00:00 run.sh
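The same options can also be placed inside the job script as #SBATCH directives, so that a plain "sbatch run.sh" is enough. The following is a minimal sketch of such a run.sh for the owner QoS example; the account and partition values are the same placeholders used above, and the job_info variable and the body of the script are illustrative only.

```shell
#!/bin/bash
#SBATCH --account=UIDHERE-users_v2
#SBATCH --partition=UIDHERE-pool
#SBATCH --qos=owner
#SBATCH --time=02:00:00

# Report where the job landed. The SLURM_* variables are set by Slurm inside
# a job; outside Slurm each one falls back to "none".
job_info="job=${SLURM_JOB_ID:-none} qos=${SLURM_JOB_QOS:-none} partition=${SLURM_JOB_PARTITION:-none}"
echo "$job_info"
```

Options given on the sbatch command line take precedence over the #SBATCH lines, so the same script can still be sent to another pool by passing -p and --qos explicitly.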
Handy Checks During Usage
You can monitor your jobs and see their QoS and reasons:
squeue --me -O "JOBID,ACCOUNT,PARTITION,QOS,STATE,REASON"
sprio -w
If your job was preempted, check:
sacct -j <jobid> --format=JobID,State,Reason
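When checking several jobs at once, the State column can be filtered mechanically. The sketch below assumes pipe-delimited lines in the order JobID|State|Reason, which is what sacct prints when --parsable2 --noheader is added to the command above; preempted_ids is a hypothetical helper name and the sample lines are written by hand, not real accounting output.

```shell
#!/bin/bash
# Hypothetical helper: read "JobID|State|Reason" lines on stdin and print
# the JobID of every line whose State starts with PREEMPTED.
preempted_ids() {
  awk -F'|' '$2 ~ /^PREEMPTED/ { print $1 }'
}

# Hand-written sample input for illustration only.
sample='1001|COMPLETED|None
1002|PREEMPTED|None
1003|RUNNING|None'

printf '%s\n' "$sample" | preempted_ids   # prints 1002
```

On the cluster the helper would be fed from sacct directly:
sacct -j <jobid> --format=JobID,State,Reason --parsable2 --noheader | preempted_ids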