3D Clustering - SORT3D_DEPTH: Reproducible 3D clustering of a heterogeneous dataset. The 3D parameters of the data remain unchanged during the clustering.
Usage in command line
sxsort3d_depth.py --refinement_dir=DIR --instack=STACK_FILE --output_dir=DIR --niter_for_sorting=NUM_OF_ITERATIONS --nxinit=INITIAL_IMAGE_SIZE --mask3D=MASK3D_FILE --focus=FOCUS3D_FILE --radius=PARTICLE_RADIUS --sym=SYMMETRY --img_per_grp=NUM_OF_IMAGES --img_per_grp_split_rate=SPLIT_RATE --minimum_grp_size=GROUP_SIZE --do_swap_au --swap_ratio=RATIO --memory_per_node=MEMORY_SIZE --depth_order=DEPTH_ORDER --stop_mgskmeans_percentage=PERCENTAGE --nsmear=NUM_OF_SMEARS --orientation_groups=NUM_OF_GROUPS --not_include_unaccounted --notapplybckgnoise --random_group_elimination_threshold
sxsort3d_depth.py exists only in an MPI version. It supports single-node workstations.
There are two ways of running this command.
1. 3D sorting from meridien iteration: Clustering is initiated from a completed iteration of a meridien refinement and imports its data from there. This mode uses all meridien information (smear, normalizations, and so on).
mpirun -np 48 sxsort3d_depth.py --refinement_dir='outdir_sxmeridien' --output_dir='outdir_sxsort3d_depth_iteration' --radius=52 --sym='c1' --memory_per_node=60.0 --img_per_grp=2000 --minimum_grp_size=1500 --stop_mgskmeans_percentage=10.0 --swap_ratio=5 --do_swap_au --shake=0.1
2. 3D sorting from stack: Clustering is initiated from user-provided orientation parameters stored in the stack header. This mode uses only the orientation parameters, which is useful for sorting data refined with other software, for example RELION.
mpirun -np 48 sxsort3d_depth.py --instack='bdb:data' --output_dir='outdir_sxsort3d_depth_stack' --radius=52 --sym='c1' --img_per_grp=2000 --minimum_grp_size=1500 --stop_mgskmeans_percentage=10.0 --swap_ratio=5 --do_swap_au
Results outputted:
sxsort3d_depth performs 3D clustering on the data and keeps the 3D orientation parameters of the data unchanged. It identifies stable group members by carrying out a two-way comparison of two independent K-means clustering runs. The K-means clustering imposes a minimum group size constraint on each cluster, so the clustering does not fail under any circumstances.
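The two-way comparison described above can be illustrated with a minimal sketch. It is not the sxsort3d_depth implementation: the function name `two_way_comparison`, its arguments, and the brute-force cluster matching are assumptions made for illustration only. Given the cluster labels from two independent runs, it matches clusters one-to-one by maximum total overlap and keeps as stable only the particles on which the matched clusters agree.

```python
from itertools import permutations


def two_way_comparison(labels_a, labels_b, k, minimum_grp_size):
    """Return (stable_groups, unaccounted_ids) from two partitions (hypothetical helper)."""
    # Collect particle IDs per cluster for each independent run.
    groups_a = [set() for _ in range(k)]
    groups_b = [set() for _ in range(k)]
    for pid, (a, b) in enumerate(zip(labels_a, labels_b)):
        groups_a[a].add(pid)
        groups_b[b].add(pid)

    # overlap[i][j] = number of particles shared by cluster i of run A
    # and cluster j of run B.
    overlap = [[len(ga & gb) for gb in groups_b] for ga in groups_a]

    # Brute-force the one-to-one matching with the largest total overlap.
    # This is only practical for small k; it is an illustrative choice.
    best = max(permutations(range(k)),
               key=lambda perm: sum(overlap[i][perm[i]] for i in range(k)))

    stable, accounted = [], set()
    for i, j in enumerate(best):
        common = groups_a[i] & groups_b[j]
        # Keep only matched groups that satisfy the minimum group size.
        if len(common) >= minimum_grp_size:
            stable.append(sorted(common))
            accounted |= common
    unaccounted = sorted(set(range(len(labels_a))) - accounted)
    return stable, unaccounted


# Toy example: 8 particles, 3 clusters, two runs with permuted labels.
run_a = [0, 0, 0, 1, 1, 1, 2, 2]
run_b = [1, 1, 0, 0, 0, 0, 2, 2]
stable, unaccounted = two_way_comparison(run_a, run_b, k=3, minimum_grp_size=2)
print(stable)       # [[0, 1], [3, 4, 5], [6, 7]]
print(unaccounted)  # [2]
```

Particle 2 is assigned to different matched clusters in the two runs, so it ends up unaccounted. Raising `minimum_grp_size` to 3 would also disqualify the two smaller groups.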
Option Key | Description
--depth_order | This parameter resembles the previous option for the number of independent runs, but it controls sorting in a different way. The default value of 2 is a good choice.
--minimum_grp_size | This parameter selects qualified clusters and controls the stability of the K-means clustering. A value between img_per_grp/2 and img_per_grp is suggested; it should be less than img_per_grp.
--stop_mgskmeans_percentage | Do not set this value too small. 5.0 - 10.0 is a good choice.
--orientation_groups | Divides the asymmetric unit into the specified number of orientation groups and casts the orientation parameters of the data into them. It is meant to prevent sorting by angle, i.e., assigning certain angles to one group, for example top views to one group and side views to another.
--swap_ratio | The ratio of randomly replaced particles in a group. It is meant to prevent premature convergence. When the program obtains both stable groups and unaccounted elements, it reassigns the unaccounted elements back to the stable groups and continues sorting. Before re-assigning the unaccounted elements, the program swaps some elements of the stable groups with unaccounted ones using the specified swap_ratio.
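Two of the options above can be illustrated with a short sketch. Neither function is part of sxsort3d_depth; the names `check_group_sizes` and `swap_with_unaccounted` are hypothetical. The sketch also assumes that swap_ratio is given in percent (the example commands pass --swap_ratio=5), which this page does not state explicitly.

```python
import random


def check_group_sizes(img_per_grp, minimum_grp_size):
    """True if minimum_grp_size lies in the suggested range [img_per_grp/2, img_per_grp)."""
    return img_per_grp / 2 <= minimum_grp_size < img_per_grp


def swap_with_unaccounted(stable_group, unaccounted, swap_ratio, rng=None):
    """Swap swap_ratio percent (assumed unit) of a stable group with unaccounted IDs."""
    rng = rng or random.Random()
    group, pool = list(stable_group), list(unaccounted)
    # The number of swaps is limited by the size of both lists.
    n_swap = min(int(len(group) * swap_ratio / 100.0), len(pool))
    # Pick distinct random positions in each list and exchange their contents.
    out_idx = rng.sample(range(len(group)), n_swap)
    in_idx = rng.sample(range(len(pool)), n_swap)
    for gi, pi in zip(out_idx, in_idx):
        group[gi], pool[pi] = pool[pi], group[gi]
    return group, pool


# Values taken from the example commands: img_per_grp=2000, minimum_grp_size=1500.
print(check_group_sizes(2000, 1500))  # True

group, pool = swap_with_unaccounted(range(100), range(100, 120), swap_ratio=5,
                                    rng=random.Random(0))
print(len(group), len(pool))  # 100 20
```

The group and the pool keep their sizes. Only the membership changes, which is how the swap can perturb the stable groups before sorting continues.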
1. Simulated ribosome.
14,400 particles with an image size of 64*64 belong to five equal groups (2,880 members each). The command for this run is given in case 2; it took 10 minutes on our cluster with 48 CPUs.
Group ID | Particles | % of true members
group 1 | 2448 | 98%
group 2 | 2493 | 98%
group 3 | 2806 | 98%
group 4 | 2883 | 98%
group 5 | 2891 | 98%
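As a quick check of the table, the group sizes can be summed and compared with the 14,400 input particles. The snippet below is plain arithmetic on the numbers listed above. How the program treated the remaining particles is not stated here, so the variable name `not_in_groups` is only a neutral label.

```python
# Group sizes from the simulated ribosome table.
group_sizes = [2448, 2493, 2806, 2883, 2891]
total_particles = 14400

assigned = sum(group_sizes)
not_in_groups = total_particles - assigned
percent_assigned = round(100.0 * assigned / total_particles, 1)

print(assigned)          # 13521
print(not_in_groups)     # 879
print(percent_assigned)  # 93.9
```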
2. Ribosome EMPIAR-10028:
105,247 particles with an image size of 360*360, with K=5. It took about 13 hours using 96 CPUs of our cluster, which is about twice the time it took to refine this set. The command for this run is given in case 1. We were able to sort out a missing helix and a missing domain (see the attached movie and figure).
K-means, MGSK-means, reproducibility, two-way comparison.
Not published yet.
The following descriptions are old and will be deleted in the near future.
Important Outputs: The results are saved in the directory specified as output_dir ('outdir_sxsort3d_depth' in the example above). The final results are the partitioned particle IDs, saved in text files. In addition, unfiltered maps of each cluster are reconstructed in the same way as meridien does. One can use the postprocess command to merge the two half-maps of each group.
Some examples for timing: In general, reconstruction accounts for more than 80% of the time of each sorting run.
Zhong Huang
Category 1:: APPLICATIONS
sparx/bin/sxsort3d_depth.py
sxmeridien, sxheader, sx3dvariability, sxsort3d, and sxrsort3d.
Beta:: Under development. It has been tested. The test cases/examples are available upon request. Please let us know if there are any bugs.
There are no known bugs so far.