Revision as of 14:20, 3 September 2023

==== '''AlphaFold''' ====

AlphaFold is an artificial intelligence (AI) program developed by Alphabet's/Google's DeepMind that predicts protein structures.

==== '''Databases:''' ====

The databases are mounted on the GPU nodes at /alphafold_storage/alphafold_db.

===== Usage: =====

Use the '''run_alphafold.sh''' script located at /powerapps/share/centos7/alphafold/alphafold-2.3.1/run_alphafold.sh.
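A minimal invocation with the required flags might look like the sketch below. The script and database paths are the ones documented on this page; query.fasta and the template date are placeholders for your own input:

```shell
# Sketch: assemble a minimal run_alphafold.sh command using this guide's paths.
# query.fasta is a placeholder input; -t caps the template release date.
ALPHAFOLD_SCRIPT="/powerapps/share/centos7/alphafold/alphafold-2.3.1/run_alphafold.sh"
ALPHAFOLD_DB="/alphafold_storage/alphafold_db"
CMD="bash $ALPHAFOLD_SCRIPT -d $ALPHAFOLD_DB -o $HOME/alphafold_output -f query.fasta -t 2023-01-01"
echo "$CMD"   # dry run: print the command; drop the echo to actually execute it
```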

Script reference:

<pre>
Required Parameters:
-d <data_dir>         Path to directory of supporting data
-o <output_dir>       Path to a directory that will store the results
-f <fasta_paths>      Path to FASTA files containing sequences. If a FASTA file contains multiple sequences, it will be folded as a multimer. To fold several inputs one after another, separate the file paths with commas
-t <max_template_date> Maximum template release date to consider (ISO-8601 format, i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu>          Enable NVIDIA runtime to run with GPUs (default: true)
-r <run_relax>        Whether to run the final relaxation step on the predicted models. Turning relax off might result in predictions with distracting stereochemical violations but might help if you are having issues with the relaxation stage (default: true)
-e <enable_gpu_relax> Run relax on GPU if GPU is enabled (default: true)
-n <openmm_threads>   OpenMM threads (default: all available cores)
-a <gpu_devices>      Comma-separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset>     Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model (default: 'monomer')
-c <db_preset>        Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: this will not check whether the sequence, database or configuration have changed (default: 'false')
-l <num_multimer_predictions_per_model> How many predictions (each with a different random seed) will be generated per model. E.g. if this is 2 and there are 5 models, there will be 10 predictions per input. Note: this flag only applies if model_preset=multimer (default: 5)
-b <benchmark>        Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')
</pre>
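As an illustration of the optional flags, a multimer run with two seeds per model could be assembled as follows. multimer_query.fasta is a placeholder; the flag meanings are taken from the reference above:

```shell
# Sketch: multimer preset with -l 2 -> 2 seeds x 5 models = 10 predictions per input.
# multimer_query.fasta is a placeholder FASTA containing several sequences (chains).
ALPHAFOLD_SCRIPT="/powerapps/share/centos7/alphafold/alphafold-2.3.1/run_alphafold.sh"
CMD="bash $ALPHAFOLD_SCRIPT -d /alphafold_storage/alphafold_db -o $HOME/alphafold_output"
CMD="$CMD -f multimer_query.fasta -t 2023-01-01 -m multimer -l 2 -c full_dbs"
echo "$CMD"   # print the command first; remove the echo to actually run it
```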
===== Sample Qsub Script: =====

Create a folder for the output in your home directory (<code>mkdir ~/alphafold_output</code>), then run the script.

* You may also download the dummy_test folder from this GitHub repository to use as example output:

https://github.com/kalininalab/alphafold_non_docker

* /home/alphafold_folder/alphafold_multimer_non_docker/example/query.fasta is sample data; point the -f flag to the data you want to query.
* The line '''export CUDA_VISIBLE_DEVICES=$(python3 /powerapps/scripts/check_avail_gpu.py)''' and the flag '''-a $CUDA_VISIBLE_DEVICES''' let the job use the next free GPU on the server; please leave them as they are.
* $ALPHAFOLD_SCRIPT_PATH = /powerapps/share/centos7/alphafold/alphafold-2.3.1/
* $ALPHAFOLD_DB_PATH = /alphafold_storage/alphafold_db
<syntaxhighlight lang="bash">
#!/bin/bash
#PBS -l select=1:ncpus=4:ngpus=1
## choose any gpu queue: gpu/gpu2
#PBS -q gpu2

# Description: AlphaFold-Multimer (Non-Docker) with auto-gpu selection
# Original Author: Lev Arie Krapivner

# load conda env
module load alphafold/alphafold_non_docker_2.3.1

# check_avail_gpu.py returns the device id for CUDA_VISIBLE_DEVICES,
# which the run_alphafold.sh script picks up via the -a flag
export CUDA_VISIBLE_DEVICES=$(python3 /powerapps/scripts/check_avail_gpu.py)
# echo "CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES"

bash $ALPHAFOLD_SCRIPT_PATH/run_alphafold.sh -d $ALPHAFOLD_DB_PATH -o ~/alphafold_output -f $ALPHAFOLD_SCRIPT_PATH/examples/query.fasta -t 2020-05-14 -a $CUDA_VISIBLE_DEVICES
</syntaxhighlight>

[https://github.com/amorehead/alphafold_non_docker Alphafold - non_docker source]
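To submit the job, save the script above to a file and hand it to PBS. The sketch below writes a shortened version of the script to alphafold_job.sh (a file name chosen here for illustration; query.fasta is a placeholder input) and shows the standard submit/monitor commands:

```shell
# Sketch: write a shortened copy of the job script to a file, then submit it with PBS.
# The quoted 'EOF' keeps $VARIABLES unexpanded so they resolve at job runtime.
cat > alphafold_job.sh <<'EOF'
#!/bin/bash
#PBS -l select=1:ncpus=4:ngpus=1
#PBS -q gpu2
module load alphafold/alphafold_non_docker_2.3.1
export CUDA_VISIBLE_DEVICES=$(python3 /powerapps/scripts/check_avail_gpu.py)
bash $ALPHAFOLD_SCRIPT_PATH/run_alphafold.sh -d $ALPHAFOLD_DB_PATH -o ~/alphafold_output -f query.fasta -t 2023-01-01 -a $CUDA_VISIBLE_DEVICES
EOF
# qsub alphafold_job.sh    # submit; prints the job ID
# qstat -u $USER           # monitor your queued/running jobs
echo "wrote alphafold_job.sh"
```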