There is also a ChatGPT page that explains all of this: [https://chatgpt.com/g/g-68be7f9acfb88191978615c1693e2cff-hpc-helper-toolkit HPC-helper-toolkit].

Each partition (or “pool”) now has several QoS tiers that determine job priority and preemption behavior.

{| class="wikitable"
|+ QOS types for each pool
|-
! QOS !! Purpose !! Preempts !! Can be preempted by
|-
| Share-type QoS (e.g. 0.125_48c_8g, 0.75_48c_8g) || For multi-owner pools; defines each owner's guaranteed slice (CPU/GPU portion). || owner, public || --
|-
| owner || Used on your lab's pool to run above your guaranteed slice (higher priority than public). || public || share-type QoS
|-
| public (partition: power-general-shared-pool) || Used on cluster-wide shared pools for friendly or opportunistic runs. || -- || owner, share-type QoS
|-
| public (partition: power-general-public-pool) || Used on a small, cluster-wide shared group of nodes; jobs here cannot be preempted. || -- || --
|}

Preemption rule summary: share-type QoS > owner > public

This means:

* A share-type QoS job can preempt owner or public jobs on the same pool.
* An owner job can preempt public jobs.
* Public jobs cannot preempt any other jobs.
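
To verify how these tiers are configured on the cluster, Slurm's accounting database can list each QoS with its priority and preemption targets. A minimal check, assuming sacctmgr output is readable by regular users:

<pre>
# List every QoS with its priority and which QoS it is allowed to preempt
sacctmgr show qos format=Name,Priority,Preempt,PreemptMode
</pre>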


== How to Submit Jobs with the Correct QoS ==

Below are examples of how to use the new QoS tiers with your account.

Owner QoS (on your lab's pool)

sbatch -A UIDHERE-users_v2 -p UIDHERE-pool --qos=owner --time=02:00:00 run.sh
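
The same options can also live inside the batch script itself. A minimal sketch of what run.sh might contain, using the placeholder account and partition from the example above (replace them with your own values):

<pre>
#!/bin/bash
#SBATCH --account=UIDHERE-users_v2   # your lab's Slurm account
#SBATCH --partition=UIDHERE-pool     # your lab's pool
#SBATCH --qos=owner                  # run above your guaranteed slice
#SBATCH --time=02:00:00

# your actual workload goes here
srun hostname
</pre>

With such a header the job can be submitted with a plain "sbatch run.sh".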

Share-type QoS (on a multi-owner pool, for your guaranteed slice)

sbatch -A UIDHERE-users_v2 -p gpu-dudu-tzach-yoav-pool --qos=0.125_48c_8g --gres=gpu:A100:1 run.sh
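
If you are unsure which share-type QoS your account is allowed to request, your association record lists them. A quick check (the exact output depends on how accounting is set up on this cluster):

<pre>
# Show the QoS values attached to your account/partition associations
sacctmgr show assoc user=$USER format=Account,Partition,QOS
</pre>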

Public QoS (friendly, cluster-wide)

sbatch -A UIDHERE-users_v2 -p power-general-shared-pool --qos=public --time=01:00:00 run.sh
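
Because public jobs on the shared pool can be preempted by owner and share-type jobs, it can be worth asking Slurm to requeue the job rather than lose it. A sketch, assuming the cluster's preemption mode permits requeueing:

<pre>
# --requeue lets a preempted job go back to the queue and start again later
sbatch -A UIDHERE-users_v2 -p power-general-shared-pool --qos=public --time=01:00:00 --requeue run.sh
</pre>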

For the small, protected CPU pool

sbatch -A UIDHERE-users_v2 -p power-general-public-pool --qos=public --time=01:00:00 run.sh
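
Since this pool is small, it can help to check for idle nodes before submitting (the column layout below is just one possible sinfo format):

<pre>
# Partition, availability, time limit, node count, and node state
sinfo -p power-general-public-pool -o "%P %a %l %D %t"
</pre>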

== Handy Checks During Usage ==

You can monitor your jobs and see their QoS and reasons:

squeue --me -O "JOBID,ACCOUNT,PARTITION,QOS,STATE,REASON"
sprio -w
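
To follow one specific pending job, both commands can be narrowed down to it (replace <jobid> with your job's ID):

<pre>
# Why is this job pending, and how is its priority composed?
squeue -j <jobid> -O "JOBID,PARTITION,QOS,STATE,REASON"
sprio -j <jobid> -l
</pre>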

If your job was preempted, check:

sacct -j <jobid> --format=JobID,State,Reason
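
If the job was requeued after preemption, sacct keeps one record per attempt; listing duplicates shows the full history:

<pre>
# Show every attempt of the job, including the preempted one
sacct -j <jobid> --duplicates --format=JobID,Partition,QOS,State,Elapsed,Reason
</pre>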