This shows you the differences between two versions of the page.
pipeline:sort3d:sxsort3d_depth [2018/06/20 13:12] 127.0.0.1 external edit |
pipeline:sort3d:sxsort3d_depth [2019/04/02 10:09] lusnig |
||
---|---|---|---|
Line 1: | Line 1: | ||
~~NOTOC~~ | ~~NOTOC~~ | ||
- | ===== sxsort3d_depth | + | ===== sp_sort3d_depth |
- | 3D Clustering - SORT3D_DEPTH: | + | 3D Clustering - SORT3D_DEPTH: |
\\ | \\ | ||
Line 9: | Line 9: | ||
Usage in command line | Usage in command line | ||
- | | + | |
\\ | \\ | ||
===== Typical usage ===== | ===== Typical usage ===== | ||
- | sxsort3d_depth.py exists only in an MPI version. It supports single-node workstations. | + | sp_sort3d_depth.py exists only in an MPI version. It supports single-node workstations. |
- | There are two ways of running this command. | + | There are two ways of running this command |
- | \\ __1. 3D sorting from meridien iteration__: | + | \\ __1. 3D sorting from meridien iteration__: |
- | mpirun -np 48 sxsort3d_depth.py --refinement_dir=' | + | mpirun -np 48 sp_sort3d_depth.py --refinement_dir=' |
- | \\ __2. 3D sorting from stack__: | + | \\ __2. 3D sorting from stack__: |
- | mpirun -np 48 sxsort3d_depth.py --instack=' | + | mpirun -np 48 sp_sort3d_depth.py --instack=' |
\\ | \\ | ||
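The two ways of running the command differ only in which input option is given. The following is a minimal sketch of assembling either command line, not part of the program itself. It assumes the output directory is passed as %%--%%output_dir; the helper name build_sort3d_command and the directory names are hypothetical.

```python
import shlex


def build_sort3d_command(nproc, output_dir, refinement_dir=None, instack=None, extra=None):
    """Assemble an mpirun argument list for sp_sort3d_depth.py (illustrative helper)."""
    # Exactly one of the two input modes has to be chosen.
    if (refinement_dir is None) == (instack is None):
        raise ValueError("give exactly one of refinement_dir or instack")

    cmd = ["mpirun", "-np", str(nproc), "sp_sort3d_depth.py"]
    if refinement_dir is not None:
        # Mode 1: 3D sorting from a meridien iteration.
        cmd.append(f"--refinement_dir={refinement_dir}")
    else:
        # Mode 2: 3D sorting from a stack.
        cmd.append(f"--instack={instack}")

    # Assumption: the output directory is given as an option, as listed under Input.
    cmd.append(f"--output_dir={output_dir}")

    # Any further options from the parameter list, e.g. {"sym": "c1"}.
    for key, value in (extra or {}).items():
        cmd.append(f"--{key}={value}")
    return cmd


# Hypothetical directory names, for illustration only.
cmd = build_sort3d_command(48, "my_sort3d_out", refinement_dir="my_meridien_run",
                           extra={"sym": "c1"})
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to subprocess.run without shell quoting issues.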
===== Input ===== | ===== Input ===== | ||
=== Main Parameters === | === Main Parameters === | ||
- | ; %%--%%refinement_dir : Meridien | + | ; %%--%%refinement_dir : Meridien |
- | ; %%--%%instack : Input images stack: | + | ; %%--%%instack : Input images stack: |
- | ; %%--%%output_dir : Output directory: | + | ; %%--%%output_dir : Output directory: |
- | + | ; %%--%%niter_for_sorting : Iteration | |
- | ; %%--%%niter_for_sorting : 3D refinement iteration ID: Specify the iteration | + | ; %%--%%nxinit : Initial image size: Image size used for MGSKmeans in case of starting sorting from a data stack. By default, the program determines window size. Option is valid only for stack mode. (default -1) |
- | ; %%--%%nxinit : Initial image size: Image size used for MGSKmeans in case of starting sorting from a data stack. By default, the program determines window size. Specific to stack mode. (default -1) | + | ; %%--%%mask3D : 3D mask: A string denoting the file path of the global 3D mask for clustering. Imported from 3D refinement unless the user wishes to use a different one in meridien iteration mode. (default none) |
- | ; %%--%%mask3D : 3D mask: File path of the global 3D mask for clustering. (default none) | + | ; %%--%%focus : Focus 3D mask: A string denoting the file path of a binary 3D mask for focused clustering. (default none) |
- | ; %%--%%focus : Focus 3D mask: File path of a binary 3D mask for focused clustering. (default none) | + | ; %%--%%radius : Estimated particle radius [Pixels]: |
- | ; %%--%%radius : Estimated particle radius [Pixels]: | + | ; %%--%%sym : Point-group symmetry: |
- | ; %%--%%sym : Point-group symmetry: | + | ; %%--%%img_per_grp : Number of images per group: User expected group size in integer. By default |
- | ; %%--%%img_per_grp : Number of images per group: User expected group size. This value is critical for a successful 3D clustering. (default 1000) | + | |
- | ; %%--%%img_per_grp_split_rate : Group splitting rate: Rate for splitting the number of images per group (%%--%%img_per_grp). (default 1) | + | |
- | ; %%--%%minimum_grp_size : Minimum size of reproducible class: It serves as the minimum size of selected or accounted clusters as well as the minimum group size constraint | + | |
- | ; %%--%%do_swap_au : Swap flag: Randomly swap a certain number of accounted elements per cluster with the unaccounted elements. If the processing with the default values are extremely slow or stalled, please use this --do_swap_au option and set --swap_ratio to a large value (15.0[%] is a good start point). (default | + | |
- | ; %%--%%swap_ratio : Swap percentage [%]: Specify a swap percentage between 0.0[%] and 50.0[%]. Effective only with --do_swap_au. Without --do_swap_au, | + | |
; %%--%%memory_per_node : Memory per node [GB]: User provided information about memory per node in GB (NOT per CPU). It will be used to evaluate the number of CPUs per node from user-provided MPI setting. By default, it uses 2GB * (number of CPUs per node). (default -1.0) | ; %%--%%memory_per_node : Memory per node [GB]: User provided information about memory per node in GB (NOT per CPU). It will be used to evaluate the number of CPUs per node from user-provided MPI setting. By default, it uses 2GB * (number of CPUs per node). (default -1.0) | ||
+ | ; %%--%%overhead: | ||
+ | | ||
+ | | ||
+ | | ||
\\ | \\ | ||
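The default for %%--%%memory_per_node can be written out as a short calculation. This is a sketch of the documented rule only (2GB times the number of CPUs per node when no value is given); the function name is hypothetical and the program's internal logic may differ.

```python
def effective_memory_per_node(memory_per_node_gb, cpus_per_node):
    """Return the memory per node in GB that the documented default rule implies."""
    # A negative value (the default -1.0) means the user gave no value.
    if memory_per_node_gb < 0:
        # Documented default: 2GB * (number of CPUs per node).
        return 2.0 * cpus_per_node
    # Otherwise the user-provided value (per node, NOT per CPU) is used as is.
    return memory_per_node_gb


print(effective_memory_per_node(-1.0, 24))   # default rule for a 24-CPU node
print(effective_memory_per_node(128.0, 24))  # user-provided value
```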
=== Advanced Parameters === | === Advanced Parameters === | ||
- | ; %%--%%depth_order : Depth order: | + | ; %%--%%depth_order : Depth order: |
- | ; %%--%%stop_mgskmeans_percentage : Stop MGSKmeans percentage [%]: Particle change percentage for stopping minimum group size K-means. (default 10.0) | + | |
; %%--%%nsmear : Number of smears for sorting: Fill it with 1 if user does not want to use all smears. (default -1) | ; %%--%%nsmear : Number of smears for sorting: Fill it with 1 if user does not want to use all smears. (default -1) | ||
- | ; %%--%%orientation_groups : Number of orientation groups: Number of orientation groups in the asymmetric unit. (default | + | ; %%--%%orientation_groups : Number of orientation groups: Number of orientation groups in an asymmetric unit. (default |
; %%--%%not_include_unaccounted : Do unaccounted reconstruction: | ; %%--%%not_include_unaccounted : Do unaccounted reconstruction: | ||
; %%--%%notapplybckgnoise : Use background noise flag: Flag to turn off background noise. (default False question reversed in GUI) | ; %%--%%notapplybckgnoise : Use background noise flag: Flag to turn off background noise. (default False question reversed in GUI) | ||
- | ; %%--%%random_group_elimination_threshold | + | ; %%--%%check_smearing |
+ | ; %%--%%num_core_set : Number of core set images: Map will not be reconstructed if number of core set images is less than this number. By default, | ||
+ | ; %%--%%compute_on_the_fly : Compute on the fly: A boolean flag controls the number | ||
+ | ; %%--%%nstep : Number of steps to decrease minimum group size: An integer that controls group-size-constrained Kmeans clustering. By default, it is set to 5. (default 5) | ||
+ | ; %%--%%not_freeze_groups : Remove small groups during within box comparison: A boolean flag to control | ||
+ | ; %%--%%use_umat : Use fuzzy membership of particle images in making maps: A boolean flag to apply fuzzy group membership to control sorting stability. By default, it is false. (default False) | ||
+ | |||
\\ | \\ | ||
Line 63: | Line 68: | ||
\\ | \\ | ||
===== Description ===== | ===== Description ===== | ||
- | sxsort3d_depth | + | sp_sort3d_depth performs 3D clustering on data and keeps 3D orientation parameters of data unchanged. It finds out stable |
\\ | \\ | ||
=== Important Options === | === Important Options === | ||
|| **Option Key** || **Description** || | || **Option Key** || **Description** || | ||
+ | || %%--%%img_per_grp || The only sorting parameter that has to be determined by the user. The default value sets the program in cluster search mode. (default -1) || | ||
+ | || %%--%%nstep || It decreases the minimum group size during sorting and thus selects out fake groups when the initial number of groups is set too large.
+ | || %%--%%use_umat || The option enables maps to be reconstructed using the fuzzy group membership of particle images. It stabilizes sorting and helps to select out true groups. However, the time consumed is proportional to the number of particle smears. || | ||
+ | || %%--%%not_freeze_groups || Freezing groups means not removing groups during within-box comparison; it allows sorting to use the strategies from before v1.2. || | ||
|| %%--%%depth_order || The parameter resembles the previous option, number of independent runs, but it controls sorting in a different way. The default value of 2 is a good choice. || | || %%--%%depth_order || The parameter resembles the previous option, number of independent runs, but it controls sorting in a different way. The default value of 2 is a good choice. || | ||
- | || %%--%%minimum_grp_size || This parameter selects qualified clusters and controls Kmeans clustering stability. The suggested value would be between img_per_grp/ | ||
- | || %%--%%stop_mgskmeans_percentage || Even though this option is not new, here the suggestion would be not to set it too small. 5.0 - 10.0 is a good choice. || | ||
|| %%--%%orientation_groups || It divides the asymmetric unit into the specified number of orientation groups and casts the data orientation parameters into them. It is meant to prevent sorting by angle, i.e., assigning certain angles to one group, for example top views to one group and side views to another. || | || %%--%%orientation_groups || It divides the asymmetric unit into the specified number of orientation groups and casts the data orientation parameters into them. It is meant to prevent sorting by angle, i.e., assigning certain angles to one group, for example top views to one group and side views to another. || | ||
- | || %%--%%swap_ratio || A ratio of randomly replaced particles in a group, it is meant to prevent premature convergence. When the program obtains both stable groups and unaccounted elements, it reassigns unaccounted elements back to stable groups, and continues sorting. Before re-assignment of unaccounted elements, the program swaps some elements of stable groups with unaccounted ones using this specified swap_ratio. || | ||
\\ | \\ | ||
Line 78: | Line 84: | ||
1. Simulated ribosome. | 1. Simulated ribosome. | ||
\\ | \\ | ||
- | 14400 particles with 64*64 image size belong to five even groups (all have 2880 members). | + | 17, |
\\ | \\ | ||
- | == The sorting results == | + | == The sorting results |
|| **Group ID** || **Particles** || **% of True** || | || **Group ID** || **Particles** || **% of True** || | ||
- | || group 1 || 2448 || 98% are true members || | + | || group 1 || 3303 || 97% are true members || |
- | || group 2 || 2493 || 98% are true members || | + | || group 2 || 3447 || 98% are true members || |
- | || group 3 || 2806 || 98% are true members || | + | || group 3 || 3229 || 98% are true members || |
- | || group 4 || 2883 || 98% are true members || | + | || group 4 || 3454 || 99% are true members || |
- | || group 5 || 2891 || 98% are true members || | + | || group 5 || 3442 || 99% are true members || |
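As a quick arithmetic check on the table above, the group sizes of each revision can be summed. For the earlier run, where the input size of 14400 particles is stated, this also gives the fraction of particles assigned to a group. The variable names are illustrative; the numbers are copied from the two sides of the table.

```python
# Group sizes from the earlier revision of the table (14400 input particles).
old_groups = [2448, 2493, 2806, 2883, 2891]
old_total_input = 14400

# Group sizes from the later revision of the table.
new_groups = [3303, 3447, 3229, 3454, 3442]

old_assigned = sum(old_groups)   # 13521
new_assigned = sum(new_groups)   # 16875

# Fraction of input particles that ended up in one of the five groups (earlier run).
old_fraction = old_assigned / old_total_input
print(old_assigned, new_assigned, f"{old_fraction:.1%}")
```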
2. Ribosome EMPIAR-10028: | 2. Ribosome EMPIAR-10028: | ||
\\ | \\ | ||
- | 105,247 particles with image size 360*360 with K=5. It took about 13 hours using 96 CPUs of our cluster, which is about twice the time it took to refine this set. The command for this run is given in case 1. We were able to sort out missing helix and missing domain. (See the attached movie and figure). | + | 105,247 particles with image size 360*360 with K=5. It took about 7 hours using 96 CPUs of our cluster, which is about twice the time it took to refine this set. The command for this run is given in case 1. We were able to sort out missing helix and missing domain. (See the attached movie and figure). |
\\ | \\ | ||
Line 106: | Line 113: | ||
Important Outputs: | Important Outputs: | ||
- | The results are saved in the directory specified as output_dir | + | The results are saved in the directory specified as output_dir |
* Cluster*.txt: | * Cluster*.txt: | ||
* vol_cluster*.hdf: | * vol_cluster*.hdf: | ||
* anova on defocus, number of smears, norm and statistics of micrographs of the final clusters and clusters produced in each generation are documented in log.txt. | * anova on defocus, number of smears, norm and statistics of micrographs of the final clusters and clusters produced in each generation are documented in log.txt. | ||
- | * sorting_summary.txt: | + | * Core_set.txt: Particle indexes that are not selected when the sort3d program is done. |
- | * vol_cluster*_iter000.hdf, Cluster*.txt in each generation_00? | + | * volume_core.hdf: Map reconstructed from Core_set.txt. |
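The sizes of the final clusters can be tallied from the Cluster*.txt files in the output directory. The sketch below assumes each file holds one integer particle index per line; that file layout, like the function name, is an assumption, so check it against your own output before relying on it.

```python
from pathlib import Path


def count_cluster_members(output_dir):
    """Map each Cluster*.txt file name in output_dir to its number of particle indexes."""
    counts = {}
    for path in sorted(Path(output_dir).glob("Cluster*.txt")):
        with path.open() as handle:
            # Assumption: one integer particle index per non-empty line.
            indexes = [int(line) for line in handle if line.strip()]
        counts[path.name] = len(indexes)
    return counts
```

Calling count_cluster_members on the directory given as output_dir returns a dictionary such as file name to member count, which can be compared with the group sizes reported in log.txt.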
Some examples for timing: | Some examples for timing: | ||
- | In general, reconstruction accounts for more than 80% of the time of each sorting run. | + | In general, reconstruction accounts for more than 80% of the time of each sorting run. Activate the do_timing variable in set_sorting_global_variables_mpi and the program prints the timing of major steps. |
\\ | \\ | ||
Line 127: | Line 134: | ||
\\ | \\ | ||
==== Files ==== | ==== Files ==== | ||
- | sparx/bin/sxsort3d_depth.py | + | sparx/bin/sp_sort3d_depth.py |
\\ | \\ |