Troubleshooting
Common errors and solutions for job submission and cluster usage.
Job Submission Errors
No partition specified
srun: error: Unable to allocate resources: No partition specified or system default partition
Always specify a partition with --partition (or -p). Run check_my_partitions to find yours.
Invalid account or partition
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Your account and partition combination is incorrect. Run check_my_partitions and make sure both match.
QOS not permitted
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy
The QOS you specified doesn't match your account/partition. Run check_my_partitions to see valid combinations.
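The three errors above usually come down to one of the partition, account, or QOS options being missing or mismatched. As a minimal sketch, a batch script can set all three explicitly so they are easy to compare against the output of check_my_partitions. PARTITION_NAME, ACCOUNT_NAME, and QOS_NAME are placeholders, not real names on this cluster, and the time and memory values are arbitrary.

```shell
#!/bin/bash
# Placeholders: replace PARTITION_NAME, ACCOUNT_NAME and QOS_NAME with one
# valid combination reported by check_my_partitions.
#SBATCH --job-name=my_job
#SBATCH --partition=PARTITION_NAME
#SBATCH --account=ACCOUNT_NAME
#SBATCH --qos=QOS_NAME
# The time and memory requests below are arbitrary example values.
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Slurm sets these variables inside a job; the "none" fallbacks only apply
# when the script is run outside Slurm.
job_info="job ${SLURM_JOB_ID:-none} on partition ${SLURM_JOB_PARTITION:-none}"
echo "Running $job_info"
```

Saved as, say, my_job.sh (a hypothetical file name), it would be submitted with sbatch my_job.sh.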
Job Failures
Out of Memory (OOM)
sacct -j JOBID -o JobID,JobName,State%20
JobID    JobName              State
-------- -------------------- --------------------
71       my_job               OUT_OF_MEMORY
Your job used more RAM than it was allocated. Resubmit with a higher --mem or --mem-per-cpu. To estimate the memory needed, check MaxRSS (the peak resident memory of each job step) from a previous run:
sacct -j JOBID --format=JobID,JobName,MaxRSS,Elapsed
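One way to turn a MaxRSS value into a new --mem request is to add some headroom on top of the observed peak. The helper below is a sketch, not a site-provided tool: suggest_mem is a hypothetical name, the 20% default headroom is an assumption, and it only understands values with a K, M, or G suffix. If the job has several steps, use the largest MaxRSS reported.

```shell
# suggest_mem MAXRSS [HEADROOM_PERCENT]
# Hypothetical helper: converts a MaxRSS value such as 1048576K into a
# --mem value in megabytes, padded by HEADROOM_PERCENT (default 20, an
# assumed safety margin) and rounded up to a whole megabyte.
suggest_mem() {
    awk -v v="$1" -v h="${2:-20}" 'BEGIN {
        unit = toupper(substr(v, length(v), 1))
        num = v + 0    # awk reads the leading number and ignores the suffix
        if (unit == "K") mb = num / 1024
        else if (unit == "M") mb = num
        else if (unit == "G") mb = num * 1024
        else { print "unrecognized unit in: " v > "/dev/stderr"; exit 1 }
        need = mb * (1 + h / 100)
        # Round up so the request is never below the padded value.
        printf "%dM\n", (need == int(need)) ? need : int(need) + 1
    }'
}
```

For example, suggest_mem 1048576K prints 1229M, which could then be passed as --mem=1229M.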
Timeout
Job state shows TIMEOUT — your job exceeded the time limit. Resubmit with a longer --time value.
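When picking a longer limit, a simple starting point is to multiply the old one. The sketch below assumes the old limit is written as HH:MM:SS or D-HH:MM:SS (Slurm accepts other forms that this helper does not handle); scale_time_limit is a hypothetical name and the default factor of 2 is arbitrary. The new limit must still fit within the maximum allowed for your partition and QOS.

```shell
# scale_time_limit LIMIT [FACTOR]
# Hypothetical helper: multiplies a time limit given as HH:MM:SS or
# D-HH:MM:SS by FACTOR (default 2) and prints it as D-HH:MM:SS.
scale_time_limit() {
    awk -v t="$1" -v f="${2:-2}" 'BEGIN {
        days = 0
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n != 3) {
            print "expected HH:MM:SS or D-HH:MM:SS" > "/dev/stderr"
            exit 1
        }
        total = int((days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]) * f)
        printf "%d-%02d:%02d:%02d\n", int(total / 86400),
            int(total % 86400 / 3600), int(total % 3600 / 60), total % 60
    }'
}
```

For example, scale_time_limit 01:30:00 prints 0-03:00:00, which could be passed as --time=0-03:00:00.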
Job stuck in Pending (PD)
Check the reason:
squeue -u username -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"
Common reasons:
- Resources: cluster is busy, wait for nodes to free up
- QOSMaxCpuPerUserLimit: you've hit your CPU quota, wait for running jobs to finish
- InvalidQOS: wrong QOS, check check_my_partitions
- ReqNodeNotAvail: requested node is down, remove the --nodelist constraint
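With many pending jobs it can help to count how often each reason appears. The sketch below uses the squeue options -t PENDING, -h (no header), and the %r reason field; pending_reasons and summarize_reasons are hypothetical helper names, and the user defaults to $USER.

```shell
# summarize_reasons: reads one reason per line on stdin and prints
# "COUNT<TAB>REASON", most frequent first.
summarize_reasons() {
    sort | uniq -c | sort -rn |
        awk '{ count = $1; $1 = ""; sub(/^ /, ""); print count "\t" $0 }'
}

# pending_reasons [USERNAME]: counts the pending reasons for one user's jobs.
pending_reasons() {
    squeue -u "${1:-$USER}" -t PENDING -h -o "%r" | summarize_reasons
}
```

Running pending_reasons with no arguments summarizes your own pending jobs.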
NFS / Storage Issues
Job hangs or freezes on file operations
This may indicate an NFS mount issue. Check whether your home directory is accessible:
ls ~
If it hangs, contact HPC support — do not kill the job manually as it may cause further issues.
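To avoid tying up your terminal while testing, the same check can be wrapped in a time limit. This is a sketch that assumes the timeout command from GNU coreutils is available; check_dir is a hypothetical name and the 10 second default is arbitrary. On a badly stuck mount even this wrapper may not return promptly, so treat a slow answer as a problem too.

```shell
# check_dir [DIRECTORY] [SECONDS]
# Hypothetical helper: lists DIRECTORY (default: home) and gives up after
# SECONDS (default: 10), reporting whether the listing completed.
check_dir() {
    if timeout -s KILL "${2:-10}" ls "${1:-$HOME}" >/dev/null 2>&1; then
        echo "OK: ${1:-$HOME} responded"
    else
        echo "PROBLEM: ${1:-$HOME} did not respond within ${2:-10}s or is not readable"
        return 1
    fi
}
```

If check_dir reports a problem, include its output when you contact HPC support.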
Disk quota exceeded
bash: cannot create temp file: Disk quota exceeded
Your home directory is full. Move large files to scratch space or contact HPC support for a quota increase.
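Before moving files, it helps to see which directories take the most space. The sketch below lists the largest first-level directories, including hidden ones such as caches; largest_dirs is a hypothetical name, and sizes are printed in kilobytes. The first line is normally the total for the scanned directory itself.

```shell
# largest_dirs [DIRECTORY] [COUNT]
# Hypothetical helper: prints the COUNT (default: 10) largest entries one
# level below DIRECTORY (default: home), as "SIZE_IN_KB<TAB>PATH".
largest_dirs() {
    du -k -d 1 "${1:-$HOME}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Running largest_dirs with no arguments scans your home directory.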
Module Issues
Module not found
module avail MODULE_NAME
Check the exact module name. Use module spider MODULE_NAME for a broader search including partial matches.
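If the name may differ only in capitalization, a case-insensitive filter over the terse module list is another option. This sketch assumes module -t avail prints one module per line and that the listing may go to standard error, which depends on the installed module system and version; find_module is a hypothetical name.

```shell
# find_module PATTERN
# Hypothetical helper: case-insensitive search of the terse module list.
# 2>&1 is needed because some module systems print the list on stderr.
find_module() {
    module -t avail 2>&1 | grep -i -- "$1"
}
```

For example, find_module python would match modules whose names contain Python, python, or PYTHON.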
Getting Help
If you can't resolve an issue, contact HPC support at hpc@tauex.tau.ac.il. Include:
- Your username
- Job ID
- The command you ran
- The full error message
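Part of this information can be gathered in one step. The sketch below prints the username, the job ID, and the accounting record for the job; support_report is a hypothetical name, and the sacct field list is only a suggestion. The command you ran and the full error message still need to be added by hand.

```shell
# support_report JOBID
# Hypothetical helper: collects basic details to paste into a support email.
support_report() {
    echo "Username: $(id -un)"
    echo "Job ID:   $1"
    echo "Job accounting:"
    sacct -j "$1" --format=JobID,JobName,Partition,State,ExitCode,Elapsed,MaxRSS
}
```

For example, support_report 71 > report.txt saves the details for job 71 to a file.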