Troubleshooting

Common errors and solutions for job submission and cluster usage.

Job Submission Errors

No partition specified

srun: error: Unable to allocate resources: No partition specified or system default partition

The cluster has no default partition, so every job must name one with --partition (or -p). Run check_my_partitions to see which partitions you can use.

Invalid account or partition

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

The --account and --partition values you passed are not a valid pair. Run check_my_partitions and use an account and partition that are listed together.

QOS not permitted

sbatch: error: Batch job submission failed: Job violates accounting/QOS policy

The QOS you specified doesn't match your account/partition. Run check_my_partitions to see valid combinations.
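All three submission errors above come down to a missing or mismatched partition, account, or QOS. As a rough pre-submit check, the sketch below reports which of those options a batch script does not set. The function name missing_sbatch_options is hypothetical, and whether all three options are required on this cluster is an assumption. It only inspects #SBATCH lines, so options given on the sbatch command line are not seen, and it does not verify that the values form a valid combination (check_my_partitions does that).

```shell
# Sketch: print the options (--partition, --account, --qos) that a batch
# script, read from stdin, does not set in any #SBATCH line.
missing_sbatch_options() {
    local script missing="" entry long short
    script=$(cat)
    # Each entry is "long-name:short-letter" as accepted by sbatch
    for entry in partition:p account:A qos:q; do
        long=${entry%%:*}
        short=${entry##*:}
        # Match "--long=" / "--long " or the short form "-x", but only when
        # the option starts a word on a #SBATCH line
        if ! printf '%s\n' "$script" |
            grep -Eq "^#SBATCH[[:space:]](.*[[:space:]])?(--${long}[=[:space:]]|-${short})"; then
            missing="$missing --$long"
        fi
    done
    # Empty output means all three options were found
    echo "${missing# }"
}
```

For example, `missing_sbatch_options < job.sh` (job.sh being a placeholder for your own script) prints the options still to be added, or an empty line if none are missing.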

Job Failures

Out of Memory (OOM)

Check the job state:

sacct -j JOBID -o JobID,JobName,State%20

JobID    JobName              State
-------- -------------------- --------------------
71       my_job               OUT_OF_MEMORY

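Your job used more memory than it was allocated. Resubmit with a higher --mem or --mem-per-cpu. To estimate how much it needs, check the peak memory use (MaxRSS) recorded for each job step. Because the job was killed at its limit, treat this value as a lower bound: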

sacct -j JOBID --format=JobID,JobName,MaxRSS,Elapsed
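The MaxRSS column is usually printed with a unit suffix, for example 3145728K. A minimal sketch for turning such a value into a --mem request with some headroom follows. The function name suggest_mem_mb and the 20% default headroom are illustrative assumptions, not cluster policy.

```shell
# Sketch: convert a MaxRSS value such as 3145728K into a --mem value in MB,
# adding a percentage of headroom (second argument, default 20).
suggest_mem_mb() {
    awk -v v="$1" -v pct="${2:-20}" 'BEGIN {
        unit = toupper(substr(v, length(v)))   # last character is the unit
        num = v + 0                            # awk reads the numeric prefix
        if      (unit == "K") mb = num / 1024
        else if (unit == "M") mb = num
        else if (unit == "G") mb = num * 1024
        else if (unit == "T") mb = num * 1024 * 1024
        else {
            print "unrecognized MaxRSS value: " v > "/dev/stderr"
            exit 1
        }
        mb = mb * (100 + pct) / 100
        rounded = int(mb)
        if (rounded < mb) rounded++            # round up to a whole MB
        printf "%dM\n", rounded
    }'
}
```

For example, `suggest_mem_mb 3145728K` prints 3687M, which can be passed as `--mem=3687M` when resubmitting.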

Timeout

A job state of TIMEOUT means the job ran past its time limit and was stopped. Resubmit with a longer --time value.
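One way to pick the new value is to scale the old limit by a fixed factor. The sketch below assumes the old limit is written as HH:MM:SS or D-HH:MM:SS. The function name scale_time_limit and the default factor of 150% are illustrative assumptions, and the partition may still cap the limit you can request.

```shell
# Sketch: scale a duration in [D-]HH:MM:SS form by a percentage
# (second argument, default 150) and print it as D-HH:MM:SS.
scale_time_limit() {
    awk -v t="$1" -v pct="${2:-150}" 'BEGIN {
        days = 0
        if (index(t, "-") > 0) {               # optional leading "D-"
            days = substr(t, 1, index(t, "-") - 1)
            t = substr(t, index(t, "-") + 1)
        }
        split(t, f, ":")                       # f[1]=HH, f[2]=MM, f[3]=SS
        total = ((days * 24 + f[1]) * 60 + f[2]) * 60 + f[3]
        total = int((total * pct + 99) / 100)  # scale, rounding up to a whole second
        printf "%d-%02d:%02d:%02d\n", int(total / 86400),
            int(total % 86400 / 3600), int(total % 3600 / 60), total % 60
    }'
}
```

For example, `scale_time_limit 02:00:00` prints 0-03:00:00, which can be passed as `--time=0-03:00:00`.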

Job stuck in Pending (PD)

Check the reason:

squeue -u username -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"

Common reasons:

  • Resources: the cluster is busy; wait for nodes to free up
  • QOSMaxCpuPerUserLimit: you have reached your CPU quota; wait for your running jobs to finish
  • InvalidQOS: the QOS is not valid for your account or partition; run check_my_partitions
  • ReqNodeNotAvail: a requested node is down; remove the --nodelist constraint
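If you have many pending jobs, the reasons listed above can be matched automatically. The sketch below reads "JOBID REASON" pairs, as produced by squeue with the format "%i %r", and prints the corresponding advice. The function name explain_pending is hypothetical, and reasons not listed in this guide fall through to a generic message.

```shell
# Sketch: read "JOBID REASON" lines on stdin and print the advice from the
# list above for each pending job.
explain_pending() {
    local jobid reason advice
    while read -r jobid reason; do
        case "$reason" in
            Resources)             advice="cluster is busy, wait for nodes to free up" ;;
            QOSMaxCpuPerUserLimit) advice="CPU quota reached, wait for running jobs to finish" ;;
            InvalidQOS)            advice="wrong QOS, run check_my_partitions" ;;
            # squeue may append details after this reason, hence the wildcard
            ReqNodeNotAvail*)      advice="requested node is down, remove --nodelist" ;;
            *)                     advice="not covered in this guide" ;;
        esac
        echo "$jobid: $reason ($advice)"
    done
}
```

Typical use would be `squeue -u "$USER" -t PD -h -o "%i %r" | explain_pending`, where -t PD restricts the output to pending jobs and -h drops the header line.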

NFS / Storage Issues

Job hangs or freezes on file operations

This may indicate an NFS mount problem. Check whether your home directory is accessible:

ls ~

If the command hangs, contact HPC support. Do not kill the job manually, as this may cause further issues.
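Running ls directly on an unresponsive mount can block your terminal. A minimal sketch of the same check with a time limit is shown below: it runs ls in the background and gives up waiting after a number of seconds. The function name check_dir_responsive and the 10-second default are illustrative assumptions. If the check reports no response, the background ls may stay stuck until the mount recovers.

```shell
# Sketch: list a directory in the background and report whether it answered
# within a time limit (second argument, default 10 seconds).
# The body runs in a subshell so an interactive shell prints no job messages.
check_dir_responsive() (
    dir="${1:-$HOME}"
    limit="${2:-10}"
    waited=0
    ls "$dir" >/dev/null 2>&1 &
    pid=$!
    # kill -0 sends no signal; it only tests whether the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "no response from $dir after ${limit}s"
            exit 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$dir responded"
)
```

Called without arguments, it checks your home directory. Pass another path, such as a scratch directory, to test a different mount.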

Disk quota exceeded

bash: cannot create temp file: Disk quota exceeded

Your home directory is full. Move large files to scratch space or contact HPC support for a quota increase.
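To decide what to move, it helps to see which files and directories take the most space. The sketch below lists the largest entries directly under a directory, including hidden ones such as .cache. The function name largest_entries is hypothetical. Sizes are in kilobytes as reported by du, and the scan can take a while on large directory trees.

```shell
# Sketch: print the N largest entries (second argument, default 10) directly
# under a directory (first argument, default $HOME), largest first, in KB.
largest_entries() {
    local dir="${1:-$HOME}" count="${2:-10}"
    # The second pattern picks up hidden entries; errors from unmatched
    # patterns or unreadable files are discarded
    du -sk "$dir"/* "$dir"/.[!.]* 2>/dev/null |
        sort -rn | head -n "$count"
}
```

For example, `largest_entries ~ 5` shows the five largest items in your home directory.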

Module Issues

Module not found

Check the exact module name:

module avail MODULE_NAME

For a broader search that includes partial matches, use module spider MODULE_NAME.

Getting Help

If you can't resolve an issue, contact HPC support at hpc@tauex.tau.ac.il. Include:

  • Your username
  • Job ID
  • The command you ran
  • The full error message
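To make sure none of these items is left out, the details can be collected into one block of text before writing the email. This is only a convenience sketch. The function name support_report and the layout of its output are assumptions, not a format required by HPC support.

```shell
# Sketch: print the four items from the list above as plain text.
# Arguments: job ID, the command you ran, the full error message.
support_report() {
    local jobid="$1" cmd="$2" error="$3"
    # Fall back to id -un if USER is not set
    printf 'Username: %s\n' "${USER:-$(id -un)}"
    printf 'Job ID: %s\n' "$jobid"
    printf 'Command: %s\n' "$cmd"
    printf 'Error message:\n%s\n' "$error"
}
```

For example, `support_report 71 "sbatch job.sh" "sbatch: error: Batch job submission failed: Job violates accounting/QOS policy"` prints a block you can paste into the message.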