gpu_isac
The most recent version of this page is a draft. This version is outdated by a newer approved version. This version (2020/08/04 17:50) was approved by fschoenfeld. The previously approved version (2020/08/04 17:41) is available.

This is an old revision of the document!


Overview

ISAC (Iterative Stable Alignment and Clustering) is a 2D classification algorithm. It sorts a given stack of cryo-EM particles into classes whose members share the same view of a target protein. ISAC is built around iterations that alternate equal-size k-means clustering with repeated 2D alignment routines.

You can find the details of the ISAC algorithm in this paper. To cite ISAC, use the following: Yang, Z., Fang, J., Chittuluru, J., Asturias, F. J. and Penczek, P. A. (2012) Iterative stable alignment and clustering of 2D transmission electron microscope images. Structure 20, 237–247.

ISAC versions

  • ISAC is the initial version as described in the original paper. At this point this implementation is obsolete and has been replaced by ISAC2 and GPU ISAC (see below).
  • ISAC2 is an improved version of ISAC and is the default tool used to produce 2D class averages in the SPHIRE (git) software package and the TranSPHIRE automated pipeline for processing cryo-EM data. ISAC2 is a CPU-only implementation and was developed to run on a computer cluster.
  • GPU ISAC was developed to run ISAC2 on a single workstation by outsourcing its computationally expensive bottleneck calculations to any available GPUs, while simultaneously keeping its MPI-based CPU parallelization otherwise intact. GPU ISAC is currently only provided as an add-on to SPHIRE that can be installed manually (see below).
Chimera Beta: The previously distributed GPU ISAC “Chimera” Beta version is no longer supported and we recommend using the current version of GPU ISAC. You can still find the “Chimera” documentation here, however.

Download & Installation

Before you start, please note the following system requirements:
  • CUDA: These installation instructions assume that CUDA is already installed on your system. You can confirm this by running nvcc --version in your terminal; the resulting output should list the version of your installed CUDA compilation tools.
  • SPHIRE: In order to use GPU ISAC, SPHIRE needs to be installed. You can find the SPHIRE download and installation instructions here. You can confirm a working SPHIRE version by running which sphire in your terminal; the resulting output should give you the path to your SPHIRE installation (the path should indicate a version number of 1.3 or higher).
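Both checks above can be combined into a small pre-flight script. The following is a minimal sketch, assuming a POSIX shell; the helper name check_tool is hypothetical and not part of SPHIRE or GPU ISAC. It only verifies that a command is on your PATH and does not validate version numbers.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not part of SPHIRE or GPU ISAC.
# It reports whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1 -> $(command -v "$1")"
        return 0
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage (commented out so the snippet has no side effects when sourced):
#   check_tool nvcc   && nvcc --version   # should list your CUDA compilation tools
#   check_tool sphire && which sphire     # path should indicate version 1.3 or higher
```

If either tool is reported as missing, install it before continuing with the steps below.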

Download

  • GPU ISAC is currently developed as a manually installed add-on for SPHIRE and distributed as a .zip file that can be found here: GPU ISAC download link.

Installation

Before you start, make sure your SPHIRE environment is activated.

How to activate your SPHIRE environment:

  • During the SPHIRE installation, an Anaconda environment for SPHIRE was created. You can list your available Anaconda environments using:
    conda env list
  • Look for your SPHIRE environment and activate it using either:
    conda activate NAME_OF_YOUR_ENVIRONMENT

    or

    source activate NAME_OF_YOUR_ENVIRONMENT

    It will depend on your system and Anaconda installation which one of these you will have to use.
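If you have many environments, the listing can be filtered. The sketch below assumes that your SPHIRE environment has "sphire" somewhere in its name, which may not hold for your installation; find_sphire_envs is a hypothetical helper that reads the output of conda env list from standard input.

```shell
#!/bin/sh
# find_sphire_envs is a hypothetical helper. It reads the output of
# `conda env list` on stdin, skips comment and empty lines, and prints
# the first column of every entry containing "sphire" (case-insensitive).
# That the environment name contains "sphire" is an assumption.
find_sphire_envs() {
    awk '!/^#/ && NF > 0 && tolower($1) ~ /sphire/ { print $1 }'
}

# Usage:
#   conda env list | find_sphire_envs
#   conda activate NAME_OF_YOUR_ENVIRONMENT
```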


GPU ISAC comes with a handy installation script that can be used as follows:

  1. Extract the archive to your chosen GPU ISAC installation folder.
  2. Open a terminal and navigate to your installation folder.
  3. Run the installation script:
    ./install.sh

All done!
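For repeated installs, steps 2 and 3 can be wrapped in a small function. This is only a sketch: run_installer is a hypothetical name, and it assumes that install.sh sits directly in the folder you pass in and is executable, as the steps above imply.

```shell
#!/bin/sh
# run_installer is a hypothetical wrapper around steps 2 and 3 above.
# $1: folder into which the GPU ISAC archive was extracted.
run_installer() {
    dir=$1
    if [ ! -f "$dir/install.sh" ]; then
        echo "no install.sh found in $dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$dir" && ./install.sh )
}

# Usage (with your SPHIRE environment activated):
#   run_installer /path/to/your/installation/folder
```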

Running GPU ISAC

An example call to use GPU ISAC looks as follows:

mpirun /path/to/sxisac2_gpu.py bdb:path/to/stack path/to/output --CTF --radius=160 --target_radius=29 --target_nx=76 --img_per_grp=100 --minimum_grp_size=60 --thld_err=0.7 --center_method=0 --gpu_devices=0,1

This call uses the following mix of mandatory and optional parameters (see below to learn which is which):

mpirun /path/to/sxisac2_gpu.py \
bdb:path/to/stack \
path/to/output \
--CTF \
--radius=160 \
--target_radius=29 \
--target_nx=76 \
--img_per_grp=100 \
--minimum_grp_size=60 \
--thld_err=0.7 \
--center_method=0 \
--gpu_devices=0,1

[!] - Mandatory parameters in the GPU ISAC call:

  • mpirun is not a GPU ISAC parameter, but is required to launch GPU ISAC using MPI parallelization (GPU ISAC uses both CPU/MPI and GPU/CUDA parallelization).
  • /path/to/sxisac2_gpu.py is the path to your sxisac2_gpu.py file. If you followed these instructions it should be your/installation/path/gpu_isac_2.2/bin/sxisac2_gpu.py.
  • path/to/stack is the path to your input .bdb stack. If you prefer to use an .hdf stack, simply remove the bdb: prefix.
  • path/to/output is the path to your preferred output directory.
  • --radius=160 is the radius of your target particle (in pixels) and has to be set accordingly.
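The mandatory part of the call can be assembled from a few values. The sketch below only prints the command instead of running it; stack_arg and isac_cmd are hypothetical helper names, and only the bdb: prefix rule and the --radius flag are taken from the list above.

```shell
#!/bin/sh
# Both helpers are hypothetical; only the bdb: prefix and the --radius
# flag come from the GPU ISAC example call.

# stack_arg: pass .hdf stacks through unchanged, prefix anything else
# with "bdb:" (i.e. treat it as a .bdb stack).
stack_arg() {
    case $1 in
        *.hdf) printf '%s\n' "$1" ;;
        *)     printf 'bdb:%s\n' "$1" ;;
    esac
}

# isac_cmd: print the mandatory part of the call without executing it.
# $1: path to sxisac2_gpu.py  $2: input stack
# $3: output directory        $4: particle radius in pixels
isac_cmd() {
    printf 'mpirun %s %s %s --radius=%s\n' "$1" "$(stack_arg "$2")" "$3" "$4"
}

# Usage:
#   isac_cmd /path/to/sxisac2_gpu.py path/to/stack path/to/output 160
```

Printing the command first makes it easy to review the paths and the radius before starting a long run.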

[?] - Optional parameters in the GPU ISAC call:

  • Using --gpu_devices you can set which GPUs to use. This example uses two GPUs with id values 0 and 1, respectively. You can check the id values of your available GPUs by executing nvidia-smi in your terminal (GPUs are sorted by capability, with 0 being your strongest GPU).
  • You can also use --img_per_grp to limit the maximum size of individual classes. Empirically, a class size of 100-200 particles (30-50 for negative stain) has proven successful when dealing with around 100,000 particles.
  • Similarly, you can use --minimum_grp_size to set the minimum size of individual classes. In general, this value should be around 50-60% of your maximum class size.
  • The full list of GPU ISAC / ISAC2 parameters can be found here.
  • Additional utilities that are helpful when using any version of ISAC can be found here.
  • More information about using ISAC for 2D classification can also be found in the ISAC chapter of the official SPHIRE tutorial (link to .pdf file).
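Values for --gpu_devices and --minimum_grp_size can be derived rather than typed by hand. The following sketch uses two hypothetical helpers: join_ids turns one GPU index per line into the comma-separated form used above, and min_grp_size applies the upper end (60%) of the 50-60% rule of thumb with integer arithmetic. Choosing 60% is an assumption that matches the example call (100 and 60).

```shell
#!/bin/sh
# Both helpers are hypothetical conveniences, not GPU ISAC features.

# join_ids: read one GPU index per line on stdin, print "0,1,...".
join_ids() {
    paste -s -d, -
}

# min_grp_size: 60% of the maximum class size (integer arithmetic).
# $1: value you plan to pass as --img_per_grp
min_grp_size() {
    echo $(( $1 * 60 / 100 ))
}

# Usage:
#   nvidia-smi --query-gpu=index --format=csv,noheader | join_ids
#   min_grp_size 100
```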

GPU ISAC output files

GPU ISAC produces a multitude of output files that can be used to assess how well a run is going, even while it is still in progress. These include the following:

  • Main iteration folders: As GPU ISAC is running, it performs multiple “main iterations” and “generations” that are stored within the output folder structure. New class averages are produced during every iteration / generation and can be inspected during runtime without waiting for the overall process to conclude. This can help to quickly gauge the quality of a data set. Check path/to/output/mainXXX/generationYYY for the .hdf files that contain any newly produced class averages.
  • In both the main iteration folders and the base output folder you will find processed_images.txt files. These contain the indices of all processed particles and can be used to determine how many particles GPU ISAC accounted for during classification.
  • The final averages are stored in path/to/output/ordered_class_averages.hdf.
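The files listed above can be inspected from the terminal while a run is in progress. The sketch below makes two assumptions that you should verify against your own output: that processed_images.txt holds one particle index per line, and that class averages are the .hdf files inside the mainXXX/generationYYY folders. Both helper names are hypothetical.

```shell
#!/bin/sh
# count_processed: number of non-empty lines in a processed_images.txt
# file. Assumes one particle index per line (an assumption about the
# file layout, not confirmed by the GPU ISAC documentation).
count_processed() {
    grep -c . "$1"
}

# list_class_averages: .hdf files below mainXXX/generationYYY folders.
# $1: GPU ISAC output directory
list_class_averages() {
    find "$1" -path '*/main*/generation*/*.hdf' -type f | sort
}

# Usage:
#   count_processed path/to/output/processed_images.txt
#   list_class_averages path/to/output
```

Comparing the count with the number of particles in your input stack gives a rough idea of how much of the data set has been accounted for so far.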
gpu_isac.1596556083.txt.gz · Last modified: 2020/08/04 17:48 by fschoenfeld