sp_isac2

ISAC2 - 2D Clustering: Iterative Stable Alignment and Clustering (ISAC) of a 2D image stack.


Usage

Usage in command line

sp_isac2.py stack_file output_directory --radius=particle_radius --img_per_grp=img_per_grp --CTF --filament_width=filament_width --xr=xr --thld_err=thld_err --target_radius=target_radius --target_nx=target_nx --VPP --ir=ir --rs=rs --yr=yr --ts=ts --maxit=maxit --center_method=center_method --dst=dst --FL=FL --FH=FH --FF=FF --init_iter=init_iter --iter_reali=iter_reali --stab_ali=stab_ali --minimum_grp_size --rand_seed=rand_seed --skip_prealignment --restart --filament_mask_ignore 


Typical usage

sp_isac2 exists only in MPI version.


1. Conventional TEM dataset:

mpirun -np 96 sp_isac2.py bdb:stack isac2_outdir --radius=120 --CTF

Note: ISAC2 will change the size of input data such that they fit into box size 76×76 by default (see Description below).


2. Phase Plate TEM dataset:

mpirun -np 96 sp_isac2.py bdb:stack isac2_outdir --radius=120 --VPP

Note: CTF and VPP options cannot be used together.


Input

Main Parameters

stack_file
Input image stack: Images must to be square (nx=ny). The stack can be either in .bdb or in .hdf format. (default required string)
output_directory
Output directory: General ISAC output directory to store all results. If the directory already exists ISC will only run in continuation mode (see advanced parameter restart). (default required string)
--radius
Particle radius [Pixels]
Radius of the particle in pixels. ISAC cannot offer a default here since the value will depend on the particle im question. (default required int)
--img_per_grp
Images per class: Ideally the number of images per. In practice this value determines the number of classes K = N / img_per_grp, where N is the total number of images in the input stack. (default 200)
--CTF
CTF phase flipping
Use for cryo datasets. If set to True the data will be phase-flipped using CTF information included in the image headers. Cannot be used together with the VPP option. (default False)
--VPP==False
--VPP
Phase Plate data
Use this option if the dataset is taken with phase plate. Cannot be used togehter with the CTF option. (default False)
--CTF==False
--filament_width
Filament width [Pixels]: When this is set to a non-default value ISAC assumes helical data in which case particle images will be subjected to rectangular masking of the given filament_width value. (default -1)


Advanced Parameters

--xr
Translation search range [Pixels]
The translational search range. Change with care; higher values will incur significantly higher processing costs. (default 1)
--thld_err
Pixel error threshold [Pixels]: Used as a threshold value when checking cluster stability. The pixel error is defined as the root mean square of distances between corresponding pixels from set of found transformations and their average transformation; it depends linearly on square of radius (parameter ou). units - pixels. (default 0.7)
--target_radius
Target particle radius [Pixels]
Particle radius used by ISAC2 to process the data. All particle images will be re-scaled to match their particle radius with this radius. (default 29)
--target_nx
Target particle image size [Pixels]
Image size used by ISAC2 to process the data. particle images will first be resized according to target particle radius (see above) and then cropped or padded to achieve the target image size. When xr > 0, the final image size for ISAC2 processing is target_nx + xr - 1 (default 76)
--ir
Inner ring [Pixels]
Radius of the inner-most ring when resampling images to polar coordinates. (default 1)
--rs
Ring step [Pixels]
Radius step size when resampling images to polar coordinates. (default 1)
--yr
Y search range [Pixels]
The translational search range in the y direction. Set to the value of xr by default. (default -1)
--ts
Search step [Pixels]
Translational search step. (default 1.0)
--maxit
Reference-free alignment iterations: The number of iterations for reference-free alignments. (default 30)
--center_method
Centering method
Method to center global 2D average during the initial prealignment of the data (0: no centering; -1: average shift method; please see center_2D in utilities.py for methods 1-7). (default -1)
--dst
Discrete angle used for within-group alignment
Discrete angle used for within-group alignment. (default 90.0)
--FL
Lowest filter frequency [1/Pixel]
Lowest frequency used for the tangent filter. (default 0.2)
--FH
Highest filter frequency [1/Pixel]
Highest frequency used for the tangent filter. (default 0.45)
--FF
Tangent filter fall-off
The fall-off of the tangent filter. (default 0.2)
--init_iter
Maximum generations
Maximum number of generation iterations performed for a given subset. (default 7)
--iter_reali
SAC stability check interval
Defines every how many iterations the SAC stability checking is performed. (default 1)
--stab_ali
Number of alignments for stability check
The number of alignment runs when checking stability. (default 5)
--minimum_grp_size
Minimum size of reproducible classes
Minimum size of reproducible classes. (default 60)
--rand_seed
Seed: Random seed set before calculations. Useful for testing purposes. By default, ISAC2 sets a random seed number. (default none)
--skip_prealignment
Do pre-alignment: Indicate if pre-alignment should be used or not. Do not use pre-alignment if images are already centered. The 2dalignment directory will still be generated but the parameters will be zero. By default, do pre-alignment. (default False question reversed in GUI)
--restart
Restart run: 0: Restart ISAC2 after the last completed main iteration (i.e. the directory must contain finished file); k: Restart ISAC2 after k-th main iteration, it has to be completed (i.e. the directory must contain finished file), and higer iterations will be removed; Default: Do not restart. (default -1)
--filament_mask_ignore
Ignore filament masking (filament use only): ONLY RELEVANT IF parameter filament_width is set to a non-default value. When processing helical particle images rectangular masking is used (A) to normalize and (B) to mask the actual particle images. The latter can be disabled by setting this flag to True. (default True question reversed in GUI)


Output

For each generation of running the program, there are two phases. The first phase is an exploratory phase. In this phase, we set the criteria to be very loose and try to find as much candidate class averages as possible. The candidate class averages are stored in class_averages_candidate_generation_n.hdf.

The second phase is where the actual class averages are generated. The candidate groups are tested for their stability by repeated randomly initialized reference-free alignment and those deemed stable as set aside as output and member images of those that did not pass the test are returned to the overall pool of data and processed again starting from the first phase.


List of Output Files

File Name Discription
class_averages_generation_n.hdf class averages generated in this generation, there are two attributes associated with each class average which are important. One is members, which stores the particle IDs that are assigned to this class average; the other is n_objects, which stores the number of particles that are assigned to this class average.
class_averages.hdf class averages file that contains all class averages from all generations.
generation_n_accounted.txt IDs of accounted particles in this generation.
generation_n_unaccounted.txt IDs of unaccounted particles in this generation.


Description


Method

The program will perform the following steps (to save computation time, in case of inadvertent termination, i.e. power failure or other causes, the program can be restarted from any saved step location, see options) :

  1. The images in the input stacked will be phase-flipped.
  2. The data stack will be pre-aligned (output is in subdirectory 2dalignment, in particular it contains the overall 2D average aqfinal.hdf, it is advisable to confirm it is correctly centered).
    • In case 2dalignment directory exists steps 1 and 2 are skipped.
  3. The alignment shift parameters will be applied to the input data.
  4. IMPORTANT: Input aligned images will be resized such that the original user-provided radius will be now target_radius and the box size target_nx + xr - 1. The pixel size of the modified data is thus original_pixel_size * original_radius_size / target_radius.
    • The pseudo-code for adjusting the size of the radius and the size of the images is as follows:
    • shrink_ratio = target_radius / original_radius_size
    • new_pixel_size = original_pixel_size * shrink_ratio
    • if shrink_ratio is different than 1: resample images using shrink_ratio
    • if new_pixel_size > target_nx : cut image to be target_nx in size
    • if new_pixel_size < target_nx : pad image to be target_nx in size
    • The target_radius and target_nx options allow the user to finely adjust the image so that it contains enough background information.
  5. The program will iterate through generations of ISAC2 by alternating two steps. The outcome of these two steps is in subdirectory generation_*** (stars replaced by the current generation number).
    • Calculation of candidate class averages.
    • Calculation of validated class averages.
  6. The program will terminate when it cannot find any more reproducible class averages.
  7. If no restart option is given the program will pick-up from the last saved point.

Also see the reference below.


Time and Memory

ISAC2 is very time- and memory-consuming. For example, it may take 15 hours to process 50,000 particles using 256 processors. Therefore, before embarking on the big dataset, it is recommended to run a test dataset (about 20,000 particles) first to get a rough idea of timing.


Retrieval of images signed to selected group averages

  1. Open in e2display.py file class_averages.hdf located in the main directory.
  2. Delete averages whose member particles should not be included in the output.
  3. Save the selected subset under a new name,say select1.hdf
  4. Retrieve IDs of member particles and store them in a text file ok.txt:
    • $ sp_process.py –isacselect class_averages.hdf ok.txt
  5. Create a vritual stack containng selected particles:
    • $ e2bdb.py bdb:data –makevstack:bdb:select1 –list=ok.txt

The same steps can be performed on files containing candidate class averages.


RCT information retrieval

Let us assume we would want to generate a RCT reconstruction using as a basis group number 12 from ISAC2 generation number 3. We have to do the following steps:

  1. Retrieve original image numbers in the selected ISAC2 group. The output is list3_12.txt, which will contain image numbers in the main stack (bdb:test) and thus of the tilted counterparts in the tilted stack. First, change directory to the subdirectory of the main run that contains results of the generation 3. Note bdb:../data is the file in the main output directory containing original (reduced size) particles.
    • $ cd generation_0003
    • $ sp_process.py bdb:../data class_averages_generation_3.hdf list3_12.txt –isacgroup=12 –params=originalid
  2. Extract the identified images from the main stack (into subdirectory RCT, has to be created):
    • $ e2bdb.py bdb:test –makevstack=bdb:RCT/group3_12 –list=list3_12.txt
  3. Extract the class average from the stack (NOTE the awkward numbering of the output file!).
    • $ e2proc2d.py –split=12 –first=12 –last=12 class_averages_generation3.hdf group3_12.hdf
  4. Align particles using the corresponding class average from ISAC2 as a template (please adjust the parameters):
    • $ sp_ali2d.py bdb:RCT/group3_12 None –ou=28 –xr=3 –ts=1 –maxit=1 –template=group3_12.12.hdf
  5. Extract the needed alignment parameters. The order is phi,sp_,sy,mirror. sp_ and mirror are used to transfer to tilted images.
    • $ sp_header.py group3_12.12.hdf –params=xform.align2d –export=params_group3_12.txt


Developer Notes

2017/05/27 Toshio Moriya

  • The meaning of the following option might changed from ISAC to ISAC2.
    • init_iter:: SAC initialization iterations: Number of ab-initio-within-cluster alignment runs used for stability evaluation during SAC initialization. (default 3)
  • The removal of the following option have to be reflected to the Tutorial
    • stop_after_candidates:: Stop after candidates step: The run stops after the 'candidate_class_averages' section is created. (default False)


Reference

Yang, Z., Fang, J., Chittuluru, F., Asturias, F. and Penczek, P. A.: Iterative Stable Alignment and Clustering of 2D Transmission Electron Microscope Images, Structure 20, 237-247, February 8, 2012.


Author / Maintainer

Horatiu Voicu, Zhengfan Yang, Jia Fang, Francisco Asturias, and Pawel A. Penczek


Keywords

Category 1:: APPLICATIONS


Files

sparx/bin/sp_isac2.py, sparx/bin/sp_isac.py, sparx/bin/isac.py


See also

Maturity

Beta:: Under evaluation and testing. Please let us know if there are any bugs.


Bugs

None right now.