Difference between revisions of "Alphafold"

From HPC Guide

Revision as of 15:18, 22 February 2022

Alphafold

AlphaFold is an artificial intelligence (AI) program developed by DeepMind (part of Alphabet/Google) that predicts protein structure.


How to use

Use the run_alphafold.sh script located at /home/alphafold_folder/alphafold_multimer_non_docker

Script reference:

Usage: run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir>     Path to directory with supporting data: AlphaFold parameters and genetic and template databases. Set to the target of download_all_databases.sh.
-o <output_dir>   Path to a directory that will store the results.
-f <fasta_path>   Path to a FASTA file containing a single sequence.
-t <max_template_date> Maximum template release date to consider (ISO-8601 format: YYYY-MM-DD). Important if folding historical test sets.
Optional Parameters:
-n <openmm_threads>   OpenMM threads (default: all available cores)
-b <benchmark>    Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: false)
-g <use_gpu>      Enable NVIDIA runtime to run with GPUs (default: true)
-a <gpu_devices>  Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset>  Choose preset model configuration - the monomer model (monomer), the monomer model with extra ensembling (monomer_casp14), monomer model with pTM head (monomer_ptm), or multimer model (multimer) (default: monomer)
-p <db_preset>       Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: full_dbs)
-u <use_precomputed_msas>       Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed. (default: false)
-r <remove_msas_after_use>       Whether, after structure prediction(s), to delete MSAs that have been written to disk to significantly free up storage space. (default: false)
-i <is_prokaryote>   Optional for multimer system, not used by the single chain system. This should contain a boolean specifying true where the target complex is from a prokaryote, and false where it is not, or where the origin is unknown. These values determine the pairing method for the MSA (default: false)
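
As an illustration of how the flags above combine, here is a minimal sketch of a multimer run. It uses only options listed in the script reference; the output directory and FASTA file name are hypothetical placeholders, and the data directory is assumed to be the shared copy described under Databases. By default the sketch only prints the command (dry run) so it can be checked before anything is launched.

```shell
#!/bin/bash
# Sketch: assemble a multimer invocation from the documented flags.
# OUT_DIR and FASTA are hypothetical placeholders; replace them with your own paths.
AF_DIR=/home/alphafold_folder/alphafold_multimer_non_docker
DATA_DIR=/home/alphafold_folder/alphafold_data   # assumed: shared database copy
OUT_DIR="$HOME/alphafold_out"                    # hypothetical
FASTA="$HOME/my_complex.fasta"                   # hypothetical

args=(
  -d "$DATA_DIR"      # supporting data (parameters and databases)
  -o "$OUT_DIR"       # where results are stored
  -f "$FASTA"         # input FASTA file
  -t 2020-05-14       # maximum template release date (YYYY-MM-DD)
  -m multimer         # model preset
  -p full_dbs         # database preset
  -i false            # is_prokaryote, only used by the multimer system
)

# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0 to run it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo bash "$AF_DIR/run_alphafold.sh" "${args[@]}"
else
  bash "$AF_DIR/run_alphafold.sh" "${args[@]}"
fi
```

Keeping the flags in an array makes it easy to comment each one and to switch presets (for example -m monomer_ptm or -p reduced_dbs) without editing one long command line.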

Databases

We downloaded the databases to /home/alphafold_folder/alphafold_data on compute-0-300. You may use them from there, or copy them to your own storage and point to the copy with the -d flag of the run script. The download_all_data.sh script in /home/alphafold_folder/alphafold_multimer_non_docker/scripts/ can also download the databases directly to your own storage.
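
Because the run script depends on the directory passed with -d, it can help to confirm that the directory is present before submitting a job. The following is a minimal sketch of such a check; the function name check_data_dir is hypothetical, and it only tests that the directory exists, is readable, and is not empty (it does not validate the database contents).

```shell
#!/bin/bash
# Sketch: pre-flight check for the database directory passed to -d.
# check_data_dir is a hypothetical helper, not part of the AlphaFold scripts.

# Return 0 if the given directory exists, is readable, and is not empty.
check_data_dir() {
  local dir="$1"
  [ -d "$dir" ] && [ -r "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Default to the shared copy; override DATA_DIR to check your own copy.
DATA_DIR="${DATA_DIR:-/home/alphafold_folder/alphafold_data}"

if check_data_dir "$DATA_DIR"; then
  echo "using databases in $DATA_DIR"
else
  echo "database directory $DATA_DIR is missing or empty" >&2
fi
```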


Sample qsub script

#!/bin/bash
#PBS -l select=1:ncpus=4:ngpus=1
#PBS -q gpu
# load miniconda
module load miniconda/miniconda3-4.7.12-environmentally
# activate relevant venv
conda activate /powerapps/share/centos7/miniconda/miniconda3-4.7.12-environmentally/envs/alphafold_non_docker
# run alphafold
bash /home/alphafold_folder/alphafold_multimer_non_docker/run_alphafold.sh -d /tzachi_storage/evgenyf/alphafold/alphafold_data -o /home/alphafold_folder/alphafold_multimer_non_docker/dummy_test/ -f /home/alphafold_folder/alphafold_multimer_non_docker/example/query.fasta  -t 2020-05-14
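
To adapt the sample above to your own input, one option is to generate the job script from a few variables and then submit it with qsub. This is only a sketch: FASTA, OUT_DIR and JOB_SCRIPT are hypothetical placeholders, and the -d path is assumed to be the shared database copy described under Databases rather than the path used in the sample.

```shell
#!/bin/bash
# Sketch: write a PBS job script for your own FASTA file.
# FASTA, OUT_DIR and JOB_SCRIPT are hypothetical placeholders.
FASTA="${FASTA:-$HOME/query.fasta}"
OUT_DIR="${OUT_DIR:-$HOME/alphafold_out}"
JOB_SCRIPT="${JOB_SCRIPT:-alphafold_job.sh}"

# The heredoc expands $OUT_DIR and $FASTA now, so the job script contains fixed paths.
cat > "$JOB_SCRIPT" <<EOF
#!/bin/bash
#PBS -l select=1:ncpus=4:ngpus=1
#PBS -q gpu
# load miniconda
module load miniconda/miniconda3-4.7.12-environmentally
# activate relevant venv
conda activate /powerapps/share/centos7/miniconda/miniconda3-4.7.12-environmentally/envs/alphafold_non_docker
# run alphafold
bash /home/alphafold_folder/alphafold_multimer_non_docker/run_alphafold.sh -d /home/alphafold_folder/alphafold_data -o "$OUT_DIR" -f "$FASTA" -t 2020-05-14
EOF

echo "wrote $JOB_SCRIPT; submit it with: qsub $JOB_SCRIPT"
```

After submission, qstat -u $USER lists your queued and running jobs.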

Alphafold - non_docker source: https://github.com/amorehead/alphafold_non_docker