3D Clustering - SORT3D: Sorts 3D heterogeneity based on the reproducible members of K-means and equal-Kmeans classification. It runs after 3D refinement, once the alignment parameters have been determined.
Usage in command line
sp_sort3d.py stack outdir mask --focus=3Dmask --radius=outer_radius --delta=angular_step --CTF --sym=c1 --number_of_images_per_group=number_of_images_per_group --nxinit=nxinit --smallest_group=smallest_group --chunk0=CHUNK0_FILE_NAME --chunk1=CHUNK1_FILE_NAME --ir=inner_radius --maxit=max_iter --rs=ring_step --xr=xr --yr=yr --ts=ts --an=angular_neighborhood --center=centring_method --nassign=nassign --nrefine=nrefine --stoprnct=stop_percent --function=user_function --independent=independent_runs --low_pass_filter=low_pass_filter --unaccounted --seed=random_seed --sausage --PWadjustment=PWadjustment --protein_shape=protein_shape --upscale=upscale --wn=wn --interpolation=method
sp_sort3d.py exists only in an MPI version.
mpirun -np 192 sp_sort3d.py bdb:data sort3d_outdir1 mask.hdf --focus=ribosome_focus.hdf --chunkdir=/data/n10/pawel/ribosome_frank/ri3/main013 --radius=52 --CTF --number_of_images_per_group=2000 --low_pass_filter=.125 --stoprnct=5
The clustering algorithm in the program combines several computational techniques (equal-Kmeans clustering, K-means clustering, and reproducibility of clustering), so that it sorts out heterogeneity of cryo-EM images both effectively and efficiently. The command sp_sort3d.py implements protocol I (P1). In this protocol, the user defines the group size and thereby the number of groups K. The whole data set is then randomly assigned to K groups, and an equal-size K-means (size-restricted K-means) is carried out. N independent equal-Kmeans runs give N partitions of the data into K groups. A two-way comparison of these partitions then gives the number of reproducibly assigned particles.
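The bookkeeping in this protocol (K derived from the group size, a random equal-size start, and a two-way comparison of two partitions) can be illustrated with a minimal Python sketch. It is not the code used by sp_sort3d.py: the function names are hypothetical, the K-means step itself is omitted, and the group matching is done by brute force over all permutations, which is only practical for small K. For larger K an assignment solver such as scipy.optimize.linear_sum_assignment would probably be a better fit.

```python
import random
from itertools import permutations


def number_of_groups(total_images, images_per_group):
    """K implied by the user-chosen group size (hypothetical helper)."""
    return max(1, total_images // images_per_group)


def random_equal_partition(image_ids, k, seed=0):
    """Shuffle the image IDs and deal them into k groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    return [sorted(shuffled[g::k]) for g in range(k)]


def two_way_comparison(partition_a, partition_b):
    """Match the groups of two partitions so that the total overlap is maximal.

    Returns the overlap of each matched pair of groups (sets of image IDs)
    and the fraction of all images that fall into those overlaps.
    Assumption: both partitions have the same number of groups.
    """
    sets_a = [set(g) for g in partition_a]
    sets_b = [set(g) for g in partition_b]
    k = len(sets_a)
    best_perm, best_total = None, -1
    for perm in permutations(range(k)):
        total = sum(len(sets_a[i] & sets_b[perm[i]]) for i in range(k))
        if total > best_total:
            best_perm, best_total = perm, total
    stable = [sets_a[i] & sets_b[best_perm[i]] for i in range(k)]
    n_images = sum(len(s) for s in sets_a)
    return stable, best_total / n_images


# Two made-up partitions of 8 images into K = 2 groups.
run_1 = [[0, 1, 2, 3], [4, 5, 6, 7]]
run_2 = [[4, 5, 6, 0], [1, 2, 3, 7]]
stable_groups, ratio = two_way_comparison(run_1, run_2)
print(stable_groups)  # [{1, 2, 3}, {4, 5, 6}]
print(ratio)          # 0.75
```

In this toy example, images 0 and 7 change groups between the two runs, so 6 of the 8 images are counted as reproducibly assigned. With --number_of_images_per_group=2000 and, say, 10000 images, number_of_groups would give K = 5.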
Described by A.Einstein in his first paper on spectrum of radiation from a house heater kept at room temperature. Journal of Irreproducible Results, 12, 1905, 12-1127.
Zhong Huang
Category 1:: APPLICATIONS
sparx/bin/sp_sort3d.py
Alpha:: Under development. Two programs (P1, P2) have been tested on both simulated and experimental ribosome data. For experimental ribosome data, P2 has a reproducibility ratio of 70-90%. P2 can completely (100%) separate two conformations from simulated ribosome data that contain 5 conformations.
There are no known bugs so far.