3D Clustering - SORT3D_DEPTH: Reproducible 3D clustering of a heterogeneous dataset; the 3D orientation parameters of the data remain unchanged during the clustering.
Usage in command line
sp_sort3d_depth.py --refinement_dir=DIR --instack=STACK_FILE --output_dir=DIR --niter_for_sorting=NUM_OF_ITERATIONS --nxinit=INITIAL_IMAGE_SIZE --mask3D=MASK3D_FILE --focus=FOCUS3D_FILE --radius=PARTICLE_RADIUS --nstep=minimum_grp_steps --sym=SYMMETRY --img_per_grp=NUM_OF_IMAGES --memory_per_node=MEMORY_SIZE --depth_order=DEPTH_ORDER --orientation_groups=NUM_OF_ORIEN_GROUPS --not_include_unaccounted --nsmear=NUMBER_OF_SMEAR --notapplybckgnoise --check_smearing --num_core_set=NUM_OF_MIN_IMAGES --overhead=SOME_NUMBER --compute_on_the_fly --use_umat --not_freeze_groups
sp_sort3d_depth.py exists only in an MPI version. It supports single-node workstations.
There are two ways of running this command, and each way has three modes: search mode, freeze group mode, and general mode. The default value of the img_per_grp option activates search mode. The advanced parameters are set/reset in the functions main and set_sorting_global_variables_mpi.
1. 3D sorting from meridien iteration: Clustering is initiated from a completed iteration of meridien refinement and imports data from there. This mode uses all meridien information (i.e., smear, normalizations and such).
mpirun -np 48 sp_sort3d_depth.py --refinement_dir='outdir_sxmeridien' --output_dir='outdir_sp_sort3d_depth_iteration' --radius=30 --sym='c1' --memory_per_node=60.0 --img_per_grp=2000
2. 3D sorting from stack: Clustering is initiated from user-provided orientation parameters stored in the stack header. This mode uses only orientation parameters, which is useful for sorting data refined with another package, for example RELION.
mpirun -np 48 sp_sort3d_depth.py --instack='bdb:data' --output_dir='outdir_sp_sort3d_depth_stack' --radius=30 --sym='c1'
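When the command is launched from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of the package: the helper build_sort3d_command is a hypothetical name, and it only uses option names that appear in the usage line above.

```python
import shlex


def build_sort3d_command(nproc, options, flags=()):
    """Assemble an mpirun command line for sp_sort3d_depth.py.

    options: mapping of option name (without leading dashes) to its value.
    flags:   valueless switches such as "use_umat".
    This helper is hypothetical; it is not provided by the package.
    """
    cmd = ["mpirun", "-np", str(nproc), "sp_sort3d_depth.py"]
    for name, value in options.items():
        cmd.append(f"--{name}={value}")
    for name in flags:
        cmd.append(f"--{name}")
    return cmd


# Same settings as the "3D sorting from stack" example above.
cmd = build_sort3d_command(
    48,
    {
        "instack": "bdb:data",
        "output_dir": "outdir_sp_sort3d_depth_stack",
        "radius": 30,
        "sym": "c1",
    },
)
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.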
Results outputted:
sp_sort3d_depth performs 3D clustering on data and keeps the 3D orientation parameters of the data unchanged. It finds stable group members by carrying out a two-way comparison of two independent K-means clustering runs. The K-means clustering imposes a minimum group size on each cluster, which keeps the clustering from failing.
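The idea of a two-way comparison can be illustrated with a small sketch. It is not the program's implementation: the function name two_way_comparison and the greedy matching of clusters by largest overlap are assumptions made for illustration only. Particles whose labels in the two runs belong to a matched pair of clusters are kept as stable members; all others are left out.

```python
from collections import Counter


def two_way_comparison(labels_a, labels_b):
    """Return the stable groups shared by two independent partitions.

    labels_a, labels_b: cluster label of each particle in run A and run B
    (same particle order in both runs).
    Returns one list of particle indices per matched pair of clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same particles")
    # Count how many particles each (cluster in A, cluster in B) pair shares.
    overlap = Counter(zip(labels_a, labels_b))
    # Greedy one-to-one matching, largest overlaps first (an assumption).
    used_a, used_b, matched = set(), set(), set()
    for (a, b), _count in overlap.most_common():
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            matched.add((a, b))
    # A particle is stable if its two labels form a matched pair.
    groups = {}
    for idx, pair in enumerate(zip(labels_a, labels_b)):
        if pair in matched:
            groups.setdefault(pair, []).append(idx)
    return [groups[pair] for pair in sorted(groups)]


run_a = [0, 0, 0, 1, 1, 1, 2, 2]
run_b = [1, 1, 0, 0, 0, 0, 2, 2]
stable = two_way_comparison(run_a, run_b)
print(stable)  # [[0, 1], [3, 4, 5], [6, 7]]
```

In this toy input, particle 2 is assigned inconsistently by the two runs and is therefore not a member of any stable group.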
Option Key | Description
--img_per_grp | The only sorting parameter that has to be determined by the user. The default value puts the program in cluster search mode. (default -1)
--nstep | It decreases the minimum group size during sorting and thus weeds out spurious groups when the initial number of groups is set too large.
--use_umat | The option enables maps reconstructed using fuzzy group membership of particle images. It stabilizes sorting and helps to identify true groups. However, the run time is proportional to the number of particle smears.
--not_freeze_groups | Freezing groups means not removing groups during within-box comparison, and it allows sorting to use the strategies from before v1.2.
--depth_order | The parameter resembles the earlier option for the number of independent runs, but it controls sorting in a different way. The default value of 2 is a good choice.
--orientation_groups | It divides the asymmetric unit into the specified number of orientation groups and assigns the orientation parameters of the data to them. It is meant to prevent sorting by angle, i.e., assigning particular angles to one group, for example top views to one group and side views to another.
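The binning step behind --orientation_groups can be pictured with the sketch below. It is an illustration under stated assumptions, not the program's code: the function names, the reference directions, and the use of (phi, theta) in degrees as the projection direction are all hypothetical, and symmetry is ignored (c1 is assumed).

```python
import math


def direction(phi_deg, theta_deg):
    """Unit vector of a projection direction from two angles in degrees."""
    phi, theta = math.radians(phi_deg), math.radians(theta_deg)
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def assign_orientation_groups(angles, reference_angles):
    """Map each (phi, theta) pair to the index of the nearest reference direction."""
    refs = [direction(p, t) for p, t in reference_angles]
    groups = []
    for p, t in angles:
        v = direction(p, t)
        # The largest dot product corresponds to the smallest angular distance.
        dots = [sum(a * b for a, b in zip(v, r)) for r in refs]
        groups.append(dots.index(max(dots)))
    return groups


# Hypothetical reference directions: one top view and two side views.
references = [(0.0, 0.0), (0.0, 90.0), (90.0, 90.0)]
particles = [(10.0, 5.0), (5.0, 85.0), (80.0, 95.0)]
groups = assign_orientation_groups(particles, references)
print(groups)  # [0, 1, 2]
```

Once every particle carries an orientation group index, a sorting procedure can check that each cluster draws particles from all orientation groups rather than from a single range of angles.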
1. Simulated ribosome.
17,280 particles with an image size of 64*64 belong to five equally sized groups (3,456 members each). The command for this run is given in case 2; it outputs five clusters for K = 5-11 (with a focus mask). For K = 11, use_umat must be turned on to obtain exactly five groups. The program determines clusters hierarchically according to their respective structural features.
Group ID | Particles | True members
group 1 | 3303 | 97%
group 2 | 3447 | 98%
group 3 | 3229 | 98%
group 4 | 3454 | 99%
group 5 | 3442 | 99%
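The group sizes in the table above do not add up to the full dataset. The short calculation below makes the difference explicit; reading the remainder as particles that were not assigned to any group is an assumption, since the table does not state it.

```python
total_particles = 17280
group_sizes = [3303, 3447, 3229, 3454, 3442]  # values from the table above

assigned = sum(group_sizes)
unassigned = total_particles - assigned  # assumed to be unaccounted particles
percent_assigned = round(100 * assigned / total_particles, 1)

print(assigned, unassigned, percent_assigned)  # 16875 405 97.7
```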
2. Ribosome EMPIAR-10028:
105,247 particles with an image size of 360*360, sorted with K=5. It took about 7 hours using 96 CPUs of our cluster, which is about twice the time it took to refine this set. The command for this run is given in case 1. We were able to sort out structures with a missing helix and a missing domain. (See the attached movie and figure).
K-means, MGSK-means, reproducibility, two-way comparison.
Not published yet.
The following descriptions are outdated and will be deleted in the near future.
Important Outputs: The results are saved in the directory specified as output_dir ('outdir_sp_sort3d_depth' in the example above). The final results are the partitioned particle IDs, saved in text files.
Some examples for timing: In general, reconstruction accounts for more than 80% of the time of each sorting run. Activate the do_timing variable in set_sorting_global_variables_mpi and the program prints the timing of major steps.
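A partition file of particle IDs can be loaded with a few lines of Python. This is a minimal sketch that assumes the file holds whitespace-separated integer IDs; the file name Cluster_000.txt and the helper read_particle_ids are hypothetical, and the demonstration writes its own temporary file rather than reading real program output.

```python
import tempfile
from pathlib import Path


def read_particle_ids(path):
    """Read integer particle IDs from a text file, ignoring blank lines."""
    ids = []
    for line in Path(path).read_text().splitlines():
        for token in line.split():
            ids.append(int(token))
    return ids


# Demonstration with a temporary file; the file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "Cluster_000.txt"
    demo.write_text("12\n57\n\n103\n")
    ids = read_particle_ids(demo)
print(ids)  # [12, 57, 103]
```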
Zhong Huang
Category 1:: APPLICATIONS
sparx/bin/sp_sort3d_depth.py
Beta:: Under development. It has been tested. The test cases/examples are available upon request. Please let us know if there are any bugs.
There are no known bugs so far.