PowerIDE User Guide

From HPC Guide

PowerIDE provides interactive access to the HPC cluster through a web browser. You can run Jupyter notebooks and VS Code directly on compute nodes without needing SSH access.

---

Getting Started

1. Access PowerIDE

Open your web browser and navigate to:

https://poweride.tau.ac.il/jupyter

2. Login

Log in with your TAU university credentials:

  • Username: Your TAU username
  • Password: Your TAU password

This is the same login you use for email and other university services.

---

Starting Your Server

After logging in, you'll see a page with a large orange "Start My Server" button. Click it.

You will then be presented with a Server Options form where you configure your compute resources.

How It Works

When you start your server, PowerIDE submits a Slurm job to the PowerSlurm cluster. Your Jupyter session runs on a compute node, not on the PowerIDE server itself.

This means:

  • You get dedicated resources (CPUs, memory, GPUs) on a compute node
  • Your job runs through the same Slurm scheduler as other HPC jobs
  • The PowerIDE server is only the web interface - all computation happens on cluster nodes
  • Your session will queue if the cluster is busy (just like regular batch jobs)
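
To confirm that a session really runs inside a Slurm job on a compute node, you can print a few of the environment variables that Slurm sets for every job. This is a minimal sketch to paste into a JupyterLab terminal; `session_info` is a hypothetical helper name, and the fallback text is shown whenever a variable is unset (for example on a login node).

```shell
# Print where this shell is running. The SLURM_* variables are set by Slurm
# inside a job; outside a job the fallback text is printed instead.
session_info() {
    echo "Host:      $(uname -n)"
    echo "Job ID:    ${SLURM_JOB_ID:-not inside a Slurm job}"
    echo "Partition: ${SLURM_JOB_PARTITION:-unknown}"
    echo "Nodes:     ${SLURM_JOB_NODELIST:-unknown}"
}

session_info
```

If the host name is a compute node and a job ID is shown, the terminal is running inside your PowerIDE job.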

---

Configuring Resources

Figure: Server Options form showing resource selection

The form includes the following fields:

Partition

Select which partition (queue) to run on. The dropdown will only show partitions you have access to based on your Slurm account permissions.

Common partitions:

  • `power-general-shared-pool` - General purpose computing
  • `gpu-general-pool` - GPU-enabled partition (if available)

Check with your PI or HPC admin to find out which partitions you should use.

QOS (Quality of Service)

Select the QOS for your job. This controls priority and resource limits.

  • Default (owner) - Usually the best choice (uses your group's default QOS)
  • Other options may be available based on your partition selection

The form will automatically show only valid QOS options for your selected partition.
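
The partition and QOS dropdowns reflect your Slurm associations. As a rough sketch, assuming the `sacctmgr` client is available to regular users on the node you are on (an assumption about this cluster's setup), you can list the accounts, partitions and QOS values tied to your username from a terminal. `list_my_associations` is a hypothetical helper name.

```shell
# List the Slurm account/partition/QOS associations for a user.
# Defaults to the current user; prints a hint if sacctmgr is unavailable.
list_my_associations() {
    user="${1:-$(id -un)}"
    if command -v sacctmgr >/dev/null 2>&1; then
        sacctmgr show associations user="$user" format=Account,Partition,QOS
    else
        echo "sacctmgr not found: run this on a cluster node"
    fi
}

list_my_associations
```

An empty result would be consistent with an empty QOS dropdown, in which case HPC support needs to configure your associations.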

GPUs

If you select a GPU-enabled partition, a GPUs field will appear. Specify how many GPUs you need (0 if none).

The maximum number of GPUs is automatically limited based on the partition's capabilities.
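
Once the session is running, you may want to check that the GPUs you requested are actually visible. The sketch below assumes NVIDIA GPUs with the `nvidia-smi` tool installed on the node, which this guide does not state; `gpu_summary` is a hypothetical helper name.

```shell
# Show the GPUs visible to this session (assumes NVIDIA hardware).
gpu_summary() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=index,name,memory.total --format=csv
    else
        echo "nvidia-smi not found (no NVIDIA GPU on this node, or drivers missing)"
    fi
    # Slurm typically sets this variable when GPUs are allocated to a job.
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
}

gpu_summary
```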

Time (D-HH:MM:SS)

Specify how long your session should run. Default is `04:00:00` (4 hours).

Formats accepted:

  • `HH:MM:SS` - e.g., `02:30:00` for 2.5 hours
  • `D-HH:MM:SS` - e.g., `1-12:00:00` for 1 day and 12 hours

Important: Your session will be terminated when time runs out. Save your work regularly!
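
To sanity-check a time value before submitting the form, the two formats listed above can be converted to seconds. This is a small illustrative sketch, not part of PowerIDE; `slurm_time_to_seconds` is a hypothetical helper that handles only the `HH:MM:SS` and `D-HH:MM:SS` forms.

```shell
# Convert the two time formats accepted by the form into seconds.
# Fields are split on "-" and ":"; any other shape makes awk exit with status 1.
slurm_time_to_seconds() {
    echo "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }  # D-HH:MM:SS
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }               # HH:MM:SS
        { exit 1 }                                                     # anything else
    '
}

slurm_time_to_seconds 02:30:00     # prints 9000   (2.5 hours)
slurm_time_to_seconds 1-12:00:00   # prints 129600 (1 day and 12 hours)
```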

CPUs per task

Number of CPU cores for your session. Default is `1`.

Increase this if you're running multi-threaded code.
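
Requesting more cores only helps if your program actually uses them. As a sketch, assuming the form's value reaches the job as Slurm's `SLURM_CPUS_PER_TASK` variable (likely, but not confirmed by this guide), you can match the thread count of OpenMP-based libraries to your allocation from a JupyterLab terminal. `allocated_cpus` is a hypothetical helper name.

```shell
# Number of cores granted to this session. Falls back to nproc (or 1)
# when SLURM_CPUS_PER_TASK is not set, e.g. outside a Slurm job.
allocated_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-$(nproc 2>/dev/null || echo 1)}"
}

# Many multi-threaded libraries (OpenMP, OpenBLAS, MKL) honor this variable.
export OMP_NUM_THREADS="$(allocated_cpus)"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

The export affects only programs launched from the same terminal; notebook kernels that are already running do not pick it up.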

Memory

Amount of RAM to allocate. Default is `1G`.

Examples:

  • `2G` - 2 gigabytes
  • `8G` - 8 gigabytes
  • `500M` - 500 megabytes

Tip: Start with less and increase if needed. Over-requesting resources may delay job start.
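
To compare a memory request with the size of your data, it can help to express the value in megabytes. The sketch below assumes Slurm's usual binary convention (1G = 1024M) and handles only the `M` and `G` suffixes shown above; `mem_to_mb` is a hypothetical helper name.

```shell
# Convert a memory value such as 2G or 500M into megabytes.
# Assumes 1G = 1024M; other suffixes are rejected with status 1.
mem_to_mb() {
    num=${1%[MmGg]}
    case $1 in
        *[Gg]) echo $(( $num * 1024 )) ;;
        *[Mm]) echo "$num" ;;
        *)     echo "unsupported memory value: $1" >&2; return 1 ;;
    esac
}

mem_to_mb 2G     # prints 2048
mem_to_mb 500M   # prints 500
```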

Working directory

Default: Your LDAP home directory (e.g., `/a/home/cc/staff/yourusername`)

This is where your Jupyter session starts.

Recommendation: If you're working on a specific project located elsewhere, change this to your project directory. For example:

  • `/a/home/cc/students/yourgroup/project1`
  • `/scratch/yourusername/analysis`

This saves time navigating to your files after launch.

Stdout directory

Where to write job output logs. Default: your home directory.

Recommendation: Usually fine to leave as default, but you can change it to organize logs better (e.g., `~/logs/` or your project directory).

Stderr directory

Where to write job error logs. Default: your home directory.

Recommendation: Same as stdout - usually fine to keep default.
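
If you point the stdout or stderr field at a custom directory such as `~/logs/`, it is safest to create that directory before starting the server. Slurm itself does not create missing output directories for batch jobs; whether PowerIDE does so on your behalf is not stated here, so this sketch assumes it does not. `LOG_DIR` is a hypothetical variable name.

```shell
# Create the log directory ahead of time (no error if it already exists).
LOG_DIR="${LOG_DIR:-$HOME/logs}"
mkdir -p "$LOG_DIR"
ls -ld "$LOG_DIR"
```

Enter the same path in both the Stdout directory and Stderr directory fields if you want the two logs side by side.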

---

Starting Your Session

After filling out the form, click the orange Start button at the bottom.

What happens next:

  1. PowerIDE submits a Slurm job with your requested resources
  2. You'll see a progress page saying "Your server is starting up..."
  3. Wait for a compute node to become available (usually 10-60 seconds)
  4. Once started, you'll automatically be redirected to JupyterLab

Note: If the cluster is busy, it may take longer. You can close the browser and come back - your session will start when resources are available.
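
If you also have shell access to a cluster node, you can watch the pending job from there while the progress page is waiting. This is a minimal sketch using Slurm's `squeue` command; `show_my_jobs` is a hypothetical helper name, and the job name PowerIDE assigns to your session is not specified in this guide.

```shell
# List your own jobs with ID, partition, name, state, elapsed time,
# node count, and either the node list (running) or the reason (pending).
show_my_jobs() {
    user="${1:-$(id -un)}"
    if command -v squeue >/dev/null 2>&1; then
        squeue -u "$user" -o "%.10i %.25P %.15j %.10T %.10M %.6D %R"
    else
        echo "squeue not found: run this on a cluster node"
    fi
}

show_my_jobs
```

A job in state `PENDING` is still waiting for resources; the last column gives Slurm's reason, such as `Resources` or `Priority`.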

---

Using JupyterLab

Once your server starts, you'll land in JupyterLab - a web-based development environment.

JupyterLab Interface

  • Left sidebar: File browser, running kernels, extensions
  • Main area: Notebooks, text files, terminals
  • Launcher: Click the + button to see available tools

Common Tasks

Create a new notebook:

  1. Click the + button (or File → New Launcher)
  2. Click on a kernel (e.g., "Python 3")
  3. Start coding!

Open a terminal:

  1. Click the + button
  2. Click "Terminal" in the launcher
  3. You now have a bash shell on the compute node

Upload files:

  • Drag and drop files into the file browser, OR
  • Click the upload button (↑ icon) in the file browser

Download files:

  • Right-click file → Download

---

Using VS Code

PowerIDE includes VS Code (Visual Studio Code) running in your browser!

Starting VS Code

  1. From JupyterLab, click the + button to open the launcher
  2. Look for the VS Code icon in the launcher
  3. Click it - VS Code will open in a new tab/window

You now have a full VS Code environment running on the compute node with all your files accessible.

VS Code Features

  • Full code editor with syntax highlighting
  • Integrated terminal
  • Extensions support
  • Git integration
  • File explorer

Tip: VS Code runs in the same job as JupyterLab, so it has access to all the same resources (CPUs, memory, GPUs) you requested.

---

Python Environments

What is a Python Kernel?

A kernel is the process that executes your notebook's code; for Python notebooks it is a Python interpreter from a specific environment. When you create a notebook and select "Python 3.12 (Base)", you're choosing which Python environment runs your code.

Think of it like choosing which Python installation to run: `/usr/bin/python3` vs `/path/to/my-env/bin/python`

Default Kernel

PowerIDE provides one default kernel:

  • Python 3.12 (Base) - Standard Python with JupyterLab and common packages

Creating Your Own Kernels

You can register your own conda/mamba environments as kernels! This lets you:

  • Use different Python versions (3.9, 3.10, 3.11, etc.)
  • Install custom packages without affecting others
  • Have multiple project-specific environments

Steps to register your own environment:

  1. Create your conda/mamba environment (wherever you normally keep them)
  2. Activate it and make sure `ipykernel` is installed
  3. Register it as a kernel
  4. Refresh your browser - it will appear in the JupyterLab launcher!

Example:

# From a JupyterLab terminal (or any cluster node):
module load mamba/mamba-2.1.1
mamba create -n my-project python=3.11 pandas matplotlib
mamba activate my-project
mamba install ipykernel

# Register as kernel (--user means only you will see it)
python -m ipykernel install --user --name my-project --display-name "My Project (Python 3.11)"

# Done! Refresh your browser and look for "My Project (Python 3.11)" in the launcher

Need help?

If you need assistance with:

  • Installing `ipykernel` in your environment
  • Registering your kernel
  • Troubleshooting kernel issues

Contact us at hpc@tauex.tau.ac.il - we're happy to help!

Important notes:

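  • Kernel registrations are just small config files (~1 KB), so they use a negligible amount of disk quota; the conda/mamba environment itself still counts toward the quota of wherever you store it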
  • Each user only sees their own kernels (plus system defaults)
  • You can have as many kernels as you want
  • Remove a kernel: `jupyter kernelspec uninstall kernel-name`
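
To see which kernels you have registered, you can run `jupyter kernelspec list`, or inspect the registration files directly. The sketch below assumes the default per-user location on Linux, `~/.local/share/jupyter/kernels`, which is where `ipykernel install --user` normally writes; `list_user_kernels` is a hypothetical helper name.

```shell
# Print each user-registered kernel name with the path of its kernel.json.
# Takes an optional directory; defaults to the usual per-user location on Linux.
list_user_kernels() {
    kernel_dir="${1:-$HOME/.local/share/jupyter/kernels}"
    if [ -d "$kernel_dir" ]; then
        for spec in "$kernel_dir"/*/kernel.json; do
            [ -f "$spec" ] || continue
            echo "$(basename "$(dirname "$spec")"): $spec"
        done
    else
        echo "no user kernels found in $kernel_dir"
    fi
}

list_user_kernels
```

Each `kernel.json` records the full path of the Python interpreter the kernel starts, which is useful when a kernel launches the wrong environment.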

---

Stopping Your Server

Important: Always stop your server when you're done to free up resources for others!

There are two ways to stop:

Method 1: From JupyterLab

  1. Go to File → Hub Control Panel
  2. Click the red Stop My Server button

Method 2: From PowerIDE home

  1. Navigate to https://poweride.tau.ac.il/jupyter/hub/home
  2. Click the red Stop My Server button

Your job will be terminated and the compute node will be freed.

---

Best Practices

Resource Allocation

  • Start small: Request fewer resources initially. You can always restart with more.
  • Be realistic: Only request what you actually need
  • Time limits: Set a reasonable time limit. You can always restart if you need more time.
  • GPU usage: Only request GPUs if your code actually uses them

File Management

  • Working directory: Set it to your project folder to save navigation time
  • Save frequently: Your session will end when time runs out
  • Large files: Store large datasets in scratch space, not your home directory

Data and Code

  • Home directory: Your LDAP home directory - personal files, small projects
  • Scratch space: Large temporary datasets
  • Project directories: Shared group work (varies by group)

Tip: Use Git to version control your code, not for large data files.
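
To find out what is taking up space before moving data to scratch, a quick size listing from a JupyterLab terminal is often enough. This is a minimal sketch using standard tools; `largest_items` is a hypothetical helper name, and hidden files (names starting with a dot) are not included.

```shell
# Show the largest entries in a directory, sizes in kilobytes, biggest first.
# Usage: largest_items DIRECTORY [COUNT]
largest_items() {
    dir="${1:-.}"
    count="${2:-5}"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

largest_items "${HOME:-.}" 5
```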

---

Troubleshooting

My server won't start

Possible reasons:

  • Cluster is full: Wait a few minutes and try again
  • Invalid partition: Make sure you selected a partition you have access to
  • Too many resources: Try requesting fewer CPUs/memory
  • No QOS access: You may not have any QOS configured for your account

What to do:

  1. Wait 2-3 minutes
  2. If still pending, go to Hub Control Panel and click "Stop My Server"
  3. Try again with fewer resources or a different partition
  4. If the QOS dropdown is empty, contact HPC support - you may need Slurm associations configured

I see "404: Not Found"

This usually means your job didn't start successfully.

Check:

  1. Open a shell on a login node and go to the stderr directory you set on the form (your home directory by default)
  2. Look for files named `jupyterhub-JOBID.err` (where JOBID is a number)
  3. Check the file for error messages
  4. Contact HPC support if you can't resolve it
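
The check above can be sketched as a short shell snippet that finds the most recent error log and prints its last lines. It assumes the logs are in your home directory (the default) unless you pass another directory; `latest_err_log` is a hypothetical helper name.

```shell
# Print the path of the newest jupyterhub-*.err file in a directory
# (default: home directory). Prints nothing if no such file exists.
latest_err_log() {
    log_dir="${1:-$HOME}"
    ls -t "$log_dir"/jupyterhub-*.err 2>/dev/null | head -n 1
}

log=$(latest_err_log)
if [ -n "$log" ]; then
    echo "Most recent error log: $log"
    tail -n 30 "$log"
else
    echo "No jupyterhub-*.err files found in $HOME"
fi
```

If you changed the Stderr directory on the form, pass that directory as the argument, for example `latest_err_log ~/logs`.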

VS Code icon doesn't appear

This is rare - if it happens:

  1. Try refreshing your browser
  2. If still missing, contact HPC support

My session was killed

Common reasons:

  • Time limit reached: Your session ran for the full time you requested
  • Out of memory: Your code used more RAM than allocated
  • Node failure: Rare, but compute nodes can crash

Solution:

  • Save your work frequently
  • Request more time/memory next time
  • Check `jupyterhub-JOBID.err` file for clues
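
Slurm's accounting records can often tell these cases apart. The sketch below assumes job accounting is enabled and that `sacct` is available to regular users, which this guide does not confirm. `job_report` and `explain_state` are hypothetical helper names, and the mapping covers only the three causes listed above.

```shell
# Show the final state, run time, and memory figures for a finished job.
job_report() {
    if command -v sacct >/dev/null 2>&1; then
        sacct -j "$1" --format=JobID,State,Elapsed,Timelimit,ReqMem,MaxRSS
    else
        echo "sacct not found: run this on a cluster node"
    fi
}

# Translate a Slurm job state into the likely cause and remedy.
explain_state() {
    case $1 in
        TIMEOUT*)       echo "time limit reached: request more time" ;;
        OUT_OF_MEMORY*) echo "out of memory: request more memory" ;;
        NODE_FAIL*)     echo "node failure: restart the server" ;;
        *)              echo "see the error log for details (state: $1)" ;;
    esac
}

explain_state TIMEOUT   # prints: time limit reached: request more time
```

Call `job_report` with the job ID taken from the name of the error file, then pass the value in the State column to `explain_state`.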

---

Getting Help

Support

For technical issues:

  • Email: `hpc@tauex.tau.ac.il`

When asking for help, include:

  • Your username
  • What you were trying to do
  • Error messages (copy/paste or screenshot)
  • Job ID if available (from error file name)


---

FAQ

Q: Can I run multiple servers at once?

A: No, you can only have one server running at a time per user.

Q: How long can my session run?

A: It varies by partition - most partitions allow up to 7 days.

Q: Can I install Python packages?

A: Yes! Create your own conda/mamba environment, install whatever packages you need, and register it as a kernel (see the "Python Environments" section). You have full control over your own environments.

Q: Why is my QOS dropdown empty?

A: This means you don't have any QOS associations configured in Slurm. Contact HPC support - they need to add you to a Slurm account with appropriate QOS access.

Q: Do I need to use the terminal for everything?

A: No! JupyterLab notebooks are great for interactive work. Use the terminal only when needed.

Q: What happens to my files when I stop my server?

A: Your files are safe! Only the running session is terminated. All files in your home directory and project directories remain intact.

Q: Can I share my session with a colleague?

A: No, sessions are personal. However, you can share notebooks and code files through the filesystem or Git.

Q: Is PowerIDE the same as the login nodes?

A: No! PowerIDE sessions run on compute nodes through the Slurm scheduler, giving you dedicated resources. Login nodes are shared by everyone.

Q: How do I get access to different partitions?

A: Partition access is controlled by Slurm account associations. Contact your PI or HPC admin to request access to specific partitions.

---

Quick Reference Card

| Action | How-To |
|---|---|
| Access PowerIDE | https://poweride.tau.ac.il/jupyter |
| Start server | Click "Start My Server" → Fill form → Click "Start" |
| Open notebook | Click + → Choose Python kernel |
| Open terminal | Click + → Click "Terminal" |
| Start VS Code | Click + → Click "VS Code" icon |
| Stop server | File → Hub Control Panel → "Stop My Server" |
| Upload files | Drag & drop into file browser |
| Download files | Right-click file → Download |
| Request custom environment | Email hpc@tauex.tau.ac.il with requirements |
| Get help | Email hpc@tauex.tau.ac.il |

---

Happy computing! 🚀