# Running Jobs

How to submit, monitor, and manage jobs on the TAU HPC cluster.

# Slurm Basics

Core Slurm concepts — batch jobs, interactive sessions, monitoring, and memory.

# Submitting a Batch Job

Use `sbatch` to submit a script for batch processing. Your script runs when resources become available — you don't need to stay connected.

## Basic Job Script

```bash
#!/bin/bash
#SBATCH --job-name=my_job                        # Job name
#SBATCH --account=public-users_v2                # Account name
#SBATCH --partition=power-general-shared-pool    # Partition name
#SBATCH --qos=public                             # QOS type
#SBATCH --time=02:00:00                          # Max run time (hh:mm:ss)
#SBATCH --ntasks=1                               # Number of tasks
#SBATCH --nodes=1                                # Number of nodes
#SBATCH --cpus-per-task=10                       # CPU cores required
#SBATCH --mem-per-cpu=4G                         # Memory per CPU
#SBATCH --output=my_job_%j.out                   # Output file (%j = job ID)
#SBATCH --error=my_job_%j.err                    # Error file
#SBATCH --mail-user=your@email.tau.ac.il         # Email address
#SBATCH --mail-type=END,FAIL                     # Notify on completion or failure

module purge
module load gcc/gcc-12.1.0

# Your commands here
./my_program
```

Submit it:

```
sbatch my_job.sh
```

## Job Arrays

If you need to submit many similar jobs, use a job array instead of separate `sbatch` calls. Arrays reduce scheduler load and are easier to manage.

```bash
#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --account=public-users_v2
#SBATCH --partition=power-general-shared-pool
#SBATCH --qos=public
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --array=1-100                            # 100 tasks
#SBATCH --output=array_job_%A_%a.out             # %A = job ID, %a = task ID
#SBATCH --error=array_job_%A_%a.err

# Use SLURM_ARRAY_TASK_ID to differentiate tasks
./my_program input_${SLURM_ARRAY_TASK_ID}.txt
```

Each task runs independently with a different `SLURM_ARRAY_TASK_ID`. Output and error logs are separated per task.

## Excluding Nodes

To exclude specific nodes from your job:

```bash
#SBATCH --exclude=compute-0-[100-103],compute-0-67
```

## Chain Jobs

Use `--depend` to run a job only after another completes:

```bash
sbatch --depend=afterok:45001 next_step.sh
```

# Interactive Jobs

Interactive jobs give you a shell directly on a compute node — useful for testing, debugging, or running applications that need direct input.

## Basic Interactive Session

```bash
srun --ntasks=1 -p power-general-shared-pool -A public-users_v2 --qos=public --pty bash
```

## On a Specific Node

```bash
srun --ntasks=1 -p power-general-shared-pool -A public-users_v2 --qos=public --nodelist=compute-0-12 --pty bash
```

## With GUI (X11 Forwarding)

Make sure you connected to the login node with `ssh -X` first:

```bash
srun --ntasks=1 -p power-general-shared-pool -A public-users_v2 --qos=public --x11 --pty bash
```

## With Multiple CPUs

```bash
srun --ntasks=1 --cpus-per-task=4 -p power-general-shared-pool -A public-users_v2 --qos=public --nodes=1 --pty bash
```

## Important

- Interactive jobs consume resources for as long as your session is open — exit when done
- If your connection drops, the interactive job will terminate
- For long-running work, use `sbatch` instead

# Managing & Monitoring Jobs

Once your job is submitted, use these commands to track and manage it.

## Checking Job Status

```bash
# Your jobs only
squeue -u your_username

# All jobs in the cluster
squeue

# Specific job
squeue -j job_id
```

## Job States

<table id="bkmrk-state-meaning-pd-pen"><thead><tr><th>State</th><th>Meaning</th></tr></thead><tbody><tr><td>PD</td><td>Pending — waiting for resources</td></tr><tr><td>R</td><td>Running</td></tr><tr><td>CG</td><td>Completing</td></tr><tr><td>CD</td><td>Completed</td></tr><tr><td>F</td><td>Failed</td></tr><tr><td>OOM</td><td>Out of memory</td></tr><tr><td>TO</td><td>Timed out</td></tr></tbody></table>

## Job Details

```bash
# Full details of a job
scontrol show job job_id

# Job accounting and resource usage
sacct -j job_id --format=JobID,JobName,State,MaxRSS,Elapsed
```

## Cancelling Jobs

```bash
# Cancel a specific job
scancel job_id

# Cancel all your jobs
scancel -u your_username

# Cancel a specific array task
scancel job_id_task_id
```

## Attaching to a Running Job

Use `sattach` to attach to a running job's input/output streams in real time:

```bash
# Attach to a job
sattach job_id

# Attach to a specific job step
sattach job_id.step_id
```

For non-interactive jobs this behaves like `tail -f`. For interactive jobs you can provide input directly.

To find job steps:

```bash
scontrol show job job_id
```

Look for **StepId** in the output.

# Memory & Resource Estimation

Specifying the right amount of memory is important. Too little and your job fails with OOM. Too much and you block resources from other users.

## Memory Directives

```bash
#SBATCH --mem=4G           # Total memory for the entire job
#SBATCH --mem-per-cpu=2G   # Memory per CPU core (use one or the other, not both)
```

Prefer `--mem-per-cpu` when your job scales with CPU count. Use `--mem` for fixed memory requirements.

## Checking Memory Usage of Past Jobs

```bash
sacct -j job_id --format=JobID,JobName,MaxRSS,Elapsed
```

`MaxRSS` is the peak memory used. Use this to tune future submissions.

## Monitoring a Running Job

```bash
#!/bin/bash
#SBATCH --job-name=memory_test
#SBATCH --account=public-users_v2
#SBATCH --partition=power-general-shared-pool
#SBATCH --qos=public
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --output=memory_test.out

echo "Memory before:"
free -m

./your_application

echo "Memory after:"
free -m
```

## Tips for Estimating Memory

- Start conservative, check `MaxRSS` via `sacct`, tune upward
- Check application documentation for memory recommendations
- Run a small test job first before scaling up
- Use `free -m`, `top`, or `htop` inside an interactive job to observe live usage
- Plan for peak usage — memory spikes during data loading or processing

## OOM Error

If your job fails with out-of-memory, you'll see:

```bash
sacct -j job_id -o JobID,JobName,State%20

JobID    JobName               State
-------- -------------------- --------------------
71       my_job        OUT_OF_MEMORY
```

Resubmit with a higher `--mem` or `--mem-per-cpu` value.

# GPUs, Constraints & Features

Beyond standard CPU/memory resources, the cluster provides special resources you can request via `--gres` and `--constraint`.

## GPU Jobs

To request a GPU, use the `gpu-general-pool` partition and add `--gres=gpu:1`:

```bash
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --account=my_account
#SBATCH --partition=gpu-general-pool
#SBATCH --qos=my_qos
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=4G
#SBATCH --output=gpu_job_%j.out
#SBATCH --error=gpu_job_%j.err

module purge
module load python/python-3.8

# Your GPU commands here
```

## Constraints

Constraints let you target nodes with specific hardware. Add `--constraint=<feature>` to your job script:

```bash
#SBATCH --constraint=localscratch,amd
```

Multiple constraints are comma-separated — the job will only run on nodes matching all of them.

## Available Features

Run `features` on the login node to see the current list. Current features:

<table id="bkmrk-feature-description-"><thead><tr><th>Feature</th><th>Description</th></tr></thead><tbody><tr><td>`Af3`</td><td>Nodes that can run AlphaFold3</td></tr><tr><td>`AMD`</td><td>Nodes with AMD CPU family</td></tr><tr><td>`avx`</td><td>Nodes with AVX CPU capabilities</td></tr><tr><td>`localscratch`</td><td>Nodes with at least 700GB local `/localscratch`</td></tr><tr><td>`ib_bh`</td><td>Nodes with InfiniBand network #1</td></tr><tr><td>`ib_dj`</td><td>Nodes with InfiniBand network #2</td></tr><tr><td>`Intel`</td><td>Nodes with Intel CPU family</td></tr></tbody></table>

## Using Local Scratch

If your job uses `/localscratch`, you must clean up after yourself — space is shared across all jobs on that node. Add this to your script:

```bash
export CACHEDIR=/localscratch/${USER}_${SLURM_JOB_ID}
mkdir -p $CACHEDIR

cleanup() {
  rm -rf -- "$CACHEDIR" || true
}
trap cleanup EXIT INT TERM HUP
```

The `trap` ensures cleanup runs even if the job fails or is cancelled.

## Available GRES Types

```bash
GresTypes=gpu,amd,af3,intel,localscratch
```