sxisac
ISAC - 2D Clustering: Iterative Stable Alignment and Clustering (ISAC) of a 2D image stack.
Usage
Usage in command line
sxisac.py stack_file output_directory --radius=particle_radius --img_per_grp=img_per_grp --CTF --xr=xr --thld_err=thld_err --stop_after_candidates --restart_section --target_radius=target_radius --target_nx=target_nx --ir=ir --rs=rs --yr=yr --ts=ts --maxit=maxit --center_method=center_method --dst=dst --FL=FL --FH=FH --FF=FF --init_iter=init_iter --main_iter=main_iter --iter_reali=iter_reali --match_first=match_first --max_round=max_round --match_second=match_second --stab_ali=stab_ali --indep_run=indep_run --thld_grp=thld_grp --n_generations=n_generations --rand_seed=rand_seed --new --debug --use_latest_master_directory --skip_prealignment
Typical usage
sxisac exists only in an MPI version.
mpirun -np 176 --host <host list> sxisac.py bdb:data fisac1 --radius=120 --CTF > 1fou &
mpirun -np 176 --host <host list> sxisac.py bdb:data fisac1 --radius=120 --CTF --restart_section=candidate_class_averages,4 --stop_after_candidates > 1fou &
Note that ISAC will change the size of input data so that they fit into a box of 76×76 pixels by default (see Description below).
The ISAC program needs an MPI environment to work properly. Importantly, the number of MPI processes must be a multiple of the number of independent runs (indep_run, see parameters below).
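As a quick sanity check before launching, the divisibility requirement above can be verified in a few lines of Python. This is only an illustrative sketch: the helper name check_mpi_process_count is hypothetical and is not part of sxisac.py.

```python
def check_mpi_process_count(n_procs, indep_run=4):
    """Return True if n_procs is a positive multiple of indep_run.

    Hypothetical helper for illustration only; indep_run defaults to 4,
    the documented default of the --indep_run option.
    """
    if indep_run <= 0:
        raise ValueError("indep_run must be a positive integer")
    return n_procs > 0 and n_procs % indep_run == 0


if __name__ == "__main__":
    # 176 processes with the default 4 independent runs (as in the typical usage).
    print(check_mpi_process_count(176, 4))
    # 30 processes cannot be split evenly into 4 independent runs.
    print(check_mpi_process_count(30, 4))
```

If the check fails, adjust either the value passed to -np or the value of --indep_run before submitting the job.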
The way MPI is launched differs significantly between clusters. On some clusters,
mpirun -np 32 sxisac.py ...
will be sufficient. On some clusters, one needs to specify the host name:
mpirun -np 32 --host node1,node2,node3,node4 sxisac.py ...
On other clusters, one needs to submit a script to run MPI. Please ask your system manager how to run MPI programs on your machine.
Also, different systems have different ways of storing the printout. On some clusters, the printout is saved automatically. If it is not, we recommend using the Linux command nohup to run the program, so that the printout is automatically saved to the text file nohup.out. For example:
nohup mpirun -np 32 sxisac.py bdb:test --img_per_grp=150 --generation=1
If there is no nohup on your system, you can redirect the printout to a text file:
mpirun -np 32 sxisac.py bdb:test --img_per_grp=150 --generation=1 > output.txt
To restart a run that stopped intentionally or unintentionally, use the '--restart_section' option.
Main Parameters
- stack_file
- Input image stack: The images must be square (nx=ny). The stack can be either in bdb or hdf format. (default required string)
- output_directory
- Output directory: The directory will be automatically created and the results will be written here. If the directory already exists, results will be written there, possibly overwriting previous runs. (default required string)
- --radius
- Particle radius [Pixels]: Radius of the particle (pixels) (default required int)
- --img_per_grp
- Images per class: Number of images per class in an ideal situation. In practice, it defines the maximum size of the classes. (default 100)
- --CTF
- Phase-flip: If set, the data will be phase-flipped using CTF information included in the image headers. (default False)
- --xr
- X search range [Pixels]: The translational search range in the x direction. Set by the program by default. (default 1)
- --thld_err
- Pixel error threshold [Pixels]: Used for checking stability. It is defined as the root mean square of the distances between corresponding pixels under the set of found transformations and under their average transformation. It depends linearly on the square of the radius (parameter ou). Units: pixels. (default 0.7)
- --stop_after_candidates
- Stop after candidates step: The run stops after the 'candidate_class_averages' section is created. (default False)
- --restart_section
- Restart section: Each iteration contains three sections: 'restart', 'candidate_class_averages', and 'reproducible_class_averages'. To restart from a particular point, for example from generation 4, section 'candidate_class_averages', set the option to '--restart_section=candidate_class_averages,4'.
The option requires no white space before or after the comma. By default, the execution restarts from where it stopped. A default restart also requires the name of the directory to be provided as an argument. Alternatively, the '--use_latest_master_directory' option can be used. (default ' ')
- --target_radius
- Target particle radius [Pixels]: Particle radius used by isac to process the data. The images will be resized to fit this radius (default 29)
- --target_nx
- Target particle image size [Pixels]: Image size used by isac to process the data. The images will be resized according to target particle radius and then cut/padded to achieve the target image size. When xr > 0, the final image size for isac processing is 'target_nx + xr - 1' (default 76)
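The '--restart_section' value format described above (section name, comma, generation number, no white space) can be validated with a short Python sketch. The function parse_restart_section is a hypothetical helper written for illustration; it is not the parser used inside sxisac.py.

```python
# Section names as listed in the description of --restart_section.
VALID_SECTIONS = (
    "restart",
    "candidate_class_averages",
    "reproducible_class_averages",
)


def parse_restart_section(value):
    """Split 'section,generation' into (section, generation).

    Hypothetical helper for illustration only. Raises ValueError when
    the value contains white space, an unknown section name, or a
    generation that is not a non-negative integer.
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("expected 'section,generation': %r" % value)
    section, generation = parts
    # A section with stray white space will not match any valid name.
    if section not in VALID_SECTIONS:
        raise ValueError("unknown section: %r" % section)
    # str.isdigit() is False for strings containing spaces or signs.
    if not generation.isdigit():
        raise ValueError("generation must be an integer: %r" % generation)
    return section, int(generation)


if __name__ == "__main__":
    print(parse_restart_section("candidate_class_averages,4"))
```

Checking the value before submitting a long MPI job avoids a failed start caused by a stray space after the comma.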
Advanced Parameters
- --ir
- Inner ring [Pixels]: Inner ring of the resampling to polar coordinates. (default 1)
- --rs
- Ring step [Pixels]: Step of the resampling to polar coordinates. (default 1)
- --yr
- Y search range [Pixels]: The translational search range in the y direction. Set as xr by default. (default -1)
- --ts
- Search step [Pixels]: Translational search step. (default 1.0)
- --maxit
- Reference-free alignment iterations: (default 30)
- --center_method
- Centering method: Method to center global 2D average during the initial prealignment of the data (0 = no centering; -1 = average shift method; please see center_2D in utilities.py for methods 1-7). (default -1)
- --dst
- Discrete angle used for within-group alignment: (default 90.0)
- --FL
- Lowest filter frequency [1/Pixel]: Lowest frequency used for the tangent filter. (default 0.2)
- --FH
- Highest filter frequency [1/Pixel]: Highest frequency used for the tangent filter. (default 0.3)
- --FF
- Tangent filter fall-off: (default 0.2)
- --init_iter
- SAC initialization iterations: Number of ab-initio-within-cluster alignment runs used for stability evaluation during SAC initialization. (default 3)
- --main_iter
- SAC main iterations: Number of ab-initio-within-cluster alignment runs used for stability evaluation during the main SAC. (default 3)
- --iter_reali
- SAC stability check interval: Number of iterations between successive SAC stability checks. (default 1)
- --match_first
- Initial phase 2-way matching iterations: (default 1)
- --max_round
- Maximum candidate generation rounds: Maximum number of rounds to generate the candidate class averages in the first phase. (default 20)
- --match_second
- Second phase 2- or 3-way matching iterations: (default 5)
- --stab_ali
- Number of alignments for stability check: (default 5)
- --indep_run
- Independent runs for reproducibility tests: By default, perform full ISAC to 4-way matching. Value indep_run=2 will restrict ISAC to 2-way matching and 3 to 3-way matching. Note that the number of MPI processes requested in mpirun must be a multiple of indep_run. (default 4)
- --thld_grp
- Minimum size of reproducible class: (default 10)
- --n_generations
- Maximum generations: The program stops when reaching this total number of generations. (default 10)
- --rand_seed
- Seed: Useful for testing purposes. By default, isac sets a random seed number. (default none)
- --new
- Use new code: (default False)
- --debug
- Debug info: (default False)
- --use_latest_master_directory
- Use latest master directory: When active, the program looks for the latest directory that starts with the word 'master'. (default False)
- --skip_prealignment
- Skip pre-alignment: Useful if images are already centered. The 2dalignment directory will still be generated but the parameters will be zero. (default False)
Output
Each generation of the program is divided into two phases. The first one is exploratory. In it, we set the criteria to be very loose and try to find as many candidate class averages as possible. This phase typically should have 10 to 20 rounds (set by --max_round, default = 20). The candidate class averages are stored in class_averages_candidate_generation_n.hdf.
The second phase is where the actual class averages are generated. It typically has 3 to 9 iterations (set by --match_second, default = 5) of matching. The iterations in the first half are 2-way matching, those in the second half are 3-way matching, and the final iteration is 4-way matching.
After the second phase, the following files will be generated:
class_averages_generation_n.hdf: Class averages generated in this generation. There are two attributes associated with each class average that are important. One is members, which stores the particle IDs that are assigned to this class average. The other is n_objects, which stores the number of particles that are assigned to this class average.
class_averages.hdf: Class averages file that contains all class averages from all generations.
generation_n_accounted.txt: IDs of accounted particles in this generation.
generation_n_unaccounted.txt: IDs of unaccounted particles in this generation.
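The relation between the accounted and unaccounted ID lists can be illustrated with a small Python sketch. It assumes, without confirmation from the program itself, that the text files hold one integer particle ID per line; read_ids and split_accounted are hypothetical helper names.

```python
def read_ids(path):
    """Read integer IDs from a text file.

    Assumption: one ID per line (first whitespace-separated field);
    blank lines are skipped.
    """
    with open(path) as handle:
        return [int(line.split()[0]) for line in handle if line.strip()]


def split_accounted(all_ids, accounted_ids):
    """Partition all_ids into (accounted, unaccounted) sorted lists."""
    accounted_set = set(accounted_ids)
    accounted = sorted(i for i in all_ids if i in accounted_set)
    unaccounted = sorted(i for i in all_ids if i not in accounted_set)
    return accounted, unaccounted


if __name__ == "__main__":
    # Toy example: six particles, three of which ended up in stable classes.
    accounted, unaccounted = split_accounted(range(6), [1, 3, 4])
    print(accounted)
    print(unaccounted)
```

In this picture, the unaccounted particles of one generation are the ones left over for the next generation to classify.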
Description
Method
The program will perform the following steps (to save computation time after an inadvertent termination, e.g. a power failure, the program can be restarted from any saved step location; see options):
If --CTF is set, the images in the input stack will be phase-flipped.
The data stack will be pre-aligned. The overall 2D average will be written to aqfinal.hdf, located in the 2dalignment subdirectory. We recommend checking that this average is correctly centered. If the 2dalignment directory already exists, steps 1 and 2 are skipped.
The alignment shift parameters will be applied to the input data.
IMPORTANT: The aligned input images will be resized so that the original user-provided radius becomes target_radius and the box size becomes target_nx + xr - 1. The pixel size of the modified data is thus original_pixel_size * original_radius_size / target_radius. Pseudo-code for adjusting the radius and the size of the images:
shrink_ratio = target_radius / original_radius_size
new_pixel_size = original_pixel_size / shrink_ratio
if shrink_ratio is different from 1: resample images using shrink_ratio
new_nx = original_nx * shrink_ratio
if new_nx > target_nx : cut image to be target_nx in size
if new_nx < target_nx : pad image to be target_nx in size
IMPORTANT: The --target_radius and --target_nx options allow the user to finely adjust the image so that it contains enough background information.
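The resizing arithmetic can be sketched in Python as follows. The sketch follows the relation stated above (new pixel size = original_pixel_size * original_radius_size / target_radius) and assumes that the cut/pad decision compares the resampled box size with target_nx. The function plan_resize, its return format, and the rounding of the resampled size to the nearest integer are assumptions made for illustration; the sketch does not touch image data.

```python
def plan_resize(nx, radius, pixel_size, target_radius=29, target_nx=76):
    """Compute the resizing parameters for one square image.

    nx         -- original box size in pixels
    radius     -- original particle radius in pixels (--radius)
    pixel_size -- original pixel size (any unit)
    Returns a dict with the shrink ratio, the new pixel size, the box
    size after resampling, and the action needed to reach target_nx.
    """
    shrink_ratio = float(target_radius) / float(radius)
    # Shrinking the image makes every pixel cover more of the specimen.
    new_pixel_size = pixel_size / shrink_ratio
    # Assumption: the resampled size is rounded to the nearest integer.
    new_nx = int(nx * shrink_ratio + 0.5)
    if new_nx > target_nx:
        action = "cut"
    elif new_nx < target_nx:
        action = "pad"
    else:
        action = "none"
    return {
        "shrink_ratio": shrink_ratio,
        "new_pixel_size": new_pixel_size,
        "new_nx": new_nx,
        "action": action,
    }


if __name__ == "__main__":
    # Hypothetical dataset: 352-pixel box, radius 120, pixel size 1.2.
    print(plan_resize(nx=352, radius=120, pixel_size=1.2))
```

Running the sketch on your own box size and radius shows in advance whether the images will be cut or padded, which helps when choosing --target_radius and --target_nx.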
The program will iterate through generations of ISAC by alternating two steps. The outcome of these two steps is in subdirectory generation_n, where n represents the current generation number. During each iteration, the program will:
The program will terminate when it cannot find any more reproducible class averages.
If no restart option is given, the program will start from the last saved point.
Time and Memory
Unfortunately, ISAC is very time- and memory-consuming. For example, on our cluster, it takes 15 hours to process 50,000 64×64 particles with 256 cores. Therefore, before embarking on a big dataset, we recommend running a test dataset (about 2,000 to 5,000 particles) first to get a rough idea of the timing. If the run time is unacceptable, the first parameter you could change is --max_round. A value of 10 or even 5 should have only mild effects on the results.
In case of premature termination (e.g. power failure), the program can be restarted from any saved step location with the --restart_section option.
Retrieval of images assigned to selected group averages
Open the file class_averages.hdf, located in the main directory, in e2display.py.
Delete averages whose member particles should not be included in the output.
Save the selected subset under a new name, for example select1.hdf
Retrieve IDs of member particles and store them in a text file ok.txt:
sxprocess.py --isacselect class_averages.hdf ok.txt
Create a virtual stack containing selected particles:
e2bdb.py bdb:data --makevstack=bdb:select1 --list=ok.txt
The same steps can be performed on files containing candidate class averages.
Let us assume we want to generate an RCT reconstruction using group number 12 from ISAC generation number 3 as a basis. We have to do the following steps:
Retrieve original image numbers in the selected ISAC group. The output is list3_12.txt, which will contain image numbers in the main stack (bdb:test) and thus of the tilted counterparts in the tilted stack. First, change directory to the subdirectory of the main run that contains the results of generation 3. Note that bdb:../data is the file in the main output directory containing the original (reduced size) particles.
cd generation_0003
sxprocess.py bdb:../data class_averages_generation_3.hdf list3_12.txt --isacgroup=12 --params=originalid
Extract the identified images from the main stack into the subdirectory RCT, which has to be created:
e2bdb.py bdb:test --makevstack=bdb:RCT/group3_12 --list=list3_12.txt
Extract the class average from the stack (NOTE the awkward numbering of the output file!).
e2proc2d.py --split=12 --first=12 --last=12 class_averages_generation_3.hdf group3_12.hdf
Align the particles using the corresponding class average from ISAC as a template:
sxali2d.py bdb:RCT/group3_12 None --ou=28 --xr=3 --ts=1 --maxit=1 --template=group3_12.12.hdf
Extract the needed alignment parameters. The order is phi,sx,sy,mirror. sx and mirror are used to transfer to tilted images.
sxheader.py group3_12.12.hdf --params=xform.align2d --export=params_group3_12.txt
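The exported parameter file can then be read with a few lines of Python. This sketch assumes that each line of params_group3_12.txt holds whitespace-separated numbers in the order given above (phi, sx, sy, mirror), possibly followed by further columns that are ignored; the function names are hypothetical.

```python
def parse_align2d_line(line):
    """Parse one line of exported 2D alignment parameters.

    Assumption: whitespace-separated columns in the order
    phi, sx, sy, mirror; any extra trailing columns are ignored.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 columns: %r" % line)
    phi, sx, sy = (float(value) for value in fields[:3])
    # mirror is a 0/1 flag but may be written as a float, e.g. '1.0'.
    mirror = int(float(fields[3]))
    return {"phi": phi, "sx": sx, "sy": sy, "mirror": mirror}


def read_align2d_params(path):
    """Read all non-empty lines of an exported parameter file."""
    with open(path) as handle:
        return [parse_align2d_line(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Made-up example line, not real program output.
    print(parse_align2d_line("12.5 -1.0 2.0 1"))
```

Keeping the values in a dictionary makes it explicit which of them are passed on to the tilted images.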
Developer Notes
Reference
Yang, Z., Fang, J., Chittuluru, F., Asturias, F. and Penczek, P. A. (2012) Iterative Stable Alignment and Clustering of 2D Transmission Electron Microscope Images. Structure 20:237-247. doi:10.1016/j.str.2011.12.007
Author / Maintainer
Horatiu Voicu, Zhengfan Yang, Jia Fang, Francisco Asturias, and Pawel A. Penczek
Keywords
Category 1:: APPLICATIONS
Files
sparx/bin/sxisac.py, sparx/bin/isac.py
See also
Maturity
Beta:: Under evaluation and testing. Please let us know if there are any bugs.
Bugs
There are no known bugs so far.