Alphafold

From HPC Guide
Revision as of 10:40, 19 September 2024 by Dvory

AlphaFold is an artificial intelligence (AI) program developed by DeepMind (part of Alphabet/Google) that predicts protein structures.

Databases

The necessary databases are mounted on nodes with GPUs and are located at `/alphafold_storage/alphafold_db`.

Usage

To run AlphaFold, use the `run_alphafold.sh` script located at `/powerapps/share/centos7/alphafold/alphafold-2.3.1/run_alphafold.sh`.

Script Reference

Required Parameters:
  • `-d <data_dir>`: Path to the directory of supporting data.
  • `-o <output_dir>`: Path to a directory that will store the results.
  • `-f <fasta_paths>`: Path to one or more FASTA files containing the sequences to fold. If a single file contains multiple sequences, they are folded together as a multimer. To fold several targets one after another, pass multiple files separated by commas.
  • `-t <max_template_date>`: Maximum template release date to consider (ISO-8601 format, i.e., YYYY-MM-DD). This parameter helps in folding historical test sets.
Optional Parameters:
  • `-g <use_gpu>`: Enable NVIDIA runtime to run with GPUs (default: true).
  • `-r <run_relax>`: Whether to run the final relaxation step on the predicted models (default: true).
  • `-e <enable_gpu_relax>`: Run relax on GPU if GPU is enabled (default: true).
  • `-n <openmm_threads>`: OpenMM threads (default: all available cores).
  • `-a <gpu_devices>`: Comma-separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0).
  • `-m <model_preset>`: Choose preset model configuration: 'monomer', 'monomer_casp14', 'monomer_ptm', or 'multimer' (default: 'monomer').
  • `-c <db_preset>`: Choose preset MSA database configuration ('reduced_dbs' or 'full_dbs', default: 'full_dbs').
  • `-p <use_precomputed_msas>`: Whether to read MSAs written to disk (default: 'false').
  • `-l <num_multimer_predictions_per_model>`: Number of predictions per model when using `model_preset=multimer` (default: 5).
  • `-b <benchmark>`: Run multiple JAX model evaluations to obtain a timing that excludes compilation time (default: 'false').
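
As an illustration of how the flags above combine, the following sketch assembles a multimer run over two FASTA files. It assumes the installed script accepts the flags exactly as listed in this reference. The FASTA file names, the output directory, and the `join_fasta_paths` helper are hypothetical. The command is only printed, not executed, so it can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helper: join FASTA paths with commas, the form `-f` expects
# when several targets should be folded one after another.
join_fasta_paths() {
    local IFS=,
    echo "$*"
}

# Script and database paths are taken from this page.
ALPHAFOLD_SCRIPT="/powerapps/share/centos7/alphafold/alphafold-2.3.1/run_alphafold.sh"
DATA_DIR="/alphafold_storage/alphafold_db"

# Hypothetical output directory and input files.
OUTPUT_DIR="alphafold_out"
FASTA_LIST="$(join_fasta_paths complex_a.fasta complex_b.fasta)"

# Build the command as an array so every argument stays correctly quoted.
cmd=(bash "$ALPHAFOLD_SCRIPT"
     -d "$DATA_DIR"
     -o "$OUTPUT_DIR"
     -f "$FASTA_LIST"
     -t 2021-12-31
     -m multimer
     -l 1)

# Dry run: print the command for review.
# Replace these two lines with "${cmd[@]}" to execute it on a GPU node.
printf '%q ' "${cmd[@]}"
echo
```

Keeping the arguments in an array rather than a single string avoids word-splitting problems if a path contains spaces.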

Example Slurm Script

The following script submits an AlphaFold job through Slurm:

#!/bin/bash
#SBATCH --job-name=AlphaFold-Multimer     # Job name
#SBATCH --partition=gpu2                  # Specify GPU partition
#SBATCH --nodes=1                         # Number of nodes
#SBATCH --ntasks=1                        # Number of tasks (processes)
#SBATCH --cpus-per-task=4                 # Number of CPU cores per task
#SBATCH --gres=gpu:1                      # Request 1 GPU
#SBATCH --output=alphafold_%j.out         # Standard output (with job ID)
#SBATCH --error=alphafold_%j.err          # Standard error (with job ID)

# Description: AlphaFold-Multimer (Non-Docker) with auto-GPU selection

# Load the required module/environment
module load alphafold/alphafold_non_docker_2.3.1

# Run the AlphaFold script
# Variables are quoted so that paths containing spaces are passed intact
bash "$ALPHAFOLD_SCRIPT_PATH/run_alphafold.sh" \
    -d "$ALPHAFOLD_DB_PATH" \
    -o "$HOME/output_dir" \
    -f "$ALPHAFOLD_SCRIPT_PATH/examples/query.fasta" \
    -t "$(date +%Y-%m-%d)"
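
Because the databases are mounted only on GPU nodes, a job that lands somewhere unexpected, or that points at a mistyped path, may fail only after it has waited in the queue. The sketch below shows one way to fail fast inside the job script. The helper names `require_dir` and `require_file` are hypothetical and are not part of the AlphaFold module.

```shell
#!/bin/bash
# Hypothetical helpers: report a missing path on stderr and return non-zero.
require_dir() {
    [ -d "$1" ] || { echo "Missing directory: $1" >&2; return 1; }
}

require_file() {
    [ -f "$1" ] || { echo "Missing file: $1" >&2; return 1; }
}

# Possible use in the job script, after `module load`:
#   require_dir  "$ALPHAFOLD_DB_PATH" || exit 1
#   require_file "$ALPHAFOLD_SCRIPT_PATH/run_alphafold.sh" || exit 1
```

With these checks in place, a missing path produces a clear message in the `alphafold_%j.err` file as soon as the job starts.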

Important Notes

  • Output directory: The `-o` parameter sets where the results are stored. Any directory you can write to works.
  • Template date: The `-t` (max_template_date) parameter restricts the template search to structures released on or before the given date, written as `YYYY-MM-DD`. This matters when folding historical test sets. Use the current date with `$(date +%Y-%m-%d)` to allow every available template, or a fixed historical date such as `-t 2021-12-31`.
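
Since the date passed to `-t` must follow the `YYYY-MM-DD` pattern, a small check before submission can catch typos. The following is a minimal sketch. The function name `is_valid_template_date` is hypothetical, and the check covers only the shape of the string (month 01 to 12, day 01 to 31), not month lengths or leap years.

```shell
#!/bin/bash
# Hypothetical helper: succeed only for strings shaped like YYYY-MM-DD.
is_valid_template_date() {
    local pattern='^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
    [[ "$1" =~ $pattern ]]
}

MAX_TEMPLATE_DATE="2021-12-31"   # example historical cutoff

if is_valid_template_date "$MAX_TEMPLATE_DATE"; then
    echo "Using -t $MAX_TEMPLATE_DATE"
else
    echo "Invalid date: $MAX_TEMPLATE_DATE (expected YYYY-MM-DD)" >&2
fi
```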

Additional Resources

  • You can download the `dummy_test` folder, which contains sample output, from the GitHub repository.
  • For sample data, you can use `/home/alphafold_folder/alphafold_multimer_non_docker/example/query.fasta` or provide your own data for queries.