This version (2018/02/23 18:12) is a draft.

3D Clustering - RSORT3D: Sort out 3D heterogeneity of 2D data whose 3D reconstruction parameters (xform.projection) have been determined already using 3D sorting.

Usage in command line

sxrsort3d.py  stack  outdir  mask  --previous_run1=sort3d_run1_directory  --previous_run2=sort3d_run2_directory  --focus=3D_focus_mask  --radius=outer_radius  --delta=angular_step  --CTF  --sym=symmetry  --number_of_images_per_group=images_per_group  --nxinit=nxinit  --smallest_group=smallest_group  --chunkdir=chunkdir  --ir=inner_radius  --maxit=max_iter  --rs=ring_step  --xr=xr  --yr=yr  --ts=ts  --an=angular_neighborhood  --center=centring_method  --nassign=nassign  --nrefine=nrefine  --stoprnct=stop_percent  --function=user_function  --independent=independent_runs  --low_pass_filter=low_pass_filter  --unaccounted  --seed=random_seed  --group_size_for_unaccounted=group_size_for_unaccounted  --sausage  --PWadjustment=PWadjustment  --protein_shape=protein_shape  --upscale=upscale  --wn=wn  --interpolation=method

sxrsort3d.py is available only as an MPI program.

mpirun -np 192 sxrsort3d.py bdb:data rsort3d mask.hdf --previous_run1=sort3d1 --previous_run2=sort3d2 --radius=88 --maxit=25 --independent=3 --CTF --number_of_images_per_group=11000 --low_pass_filter=.20 --chunkdir=./ --sym=c4 --PWadjustment=pwrec.txt
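When the command is launched from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of SPHIRE: the helper name build_rsort3d_command is hypothetical, and only options listed in the usage line above are passed. It reproduces the example command and prints it without running anything.

```python
import shlex


def build_rsort3d_command(stack, outdir, mask, nproc, **options):
    """Assemble an mpirun command line for sxrsort3d.py (hypothetical helper)."""
    cmd = ["mpirun", "-np", str(nproc), "sxrsort3d.py", stack, outdir, mask]
    for name, value in options.items():
        if value is True:
            # Boolean switches such as --CTF take no value.
            cmd.append("--" + name)
        elif value is not False and value is not None:
            cmd.append("--{}={}".format(name, value))
    return cmd


cmd = build_rsort3d_command(
    "bdb:data", "rsort3d", "mask.hdf", nproc=192,
    previous_run1="sort3d1", previous_run2="sort3d2",
    radius=88, maxit=25, independent=3, CTF=True,
    number_of_images_per_group=11000, low_pass_filter=".20",
    chunkdir="./", sym="c4", PWadjustment="pwrec.txt",
)

# Print the command for inspection; pass cmd to subprocess.run to execute it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list avoids shell quoting problems if it is later handed to subprocess.run.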

Main Parameters

Input images stack: (default required string)
Output directory: Contains a log.txt file that describes the sequence of computations performed by the program. (default required string)
3D mask: (default none)
Directory of 1st sort3d run: (default required string)
Directory of 2nd sort3d run: (default required string)
Focus 3D mask: Mask used for focused clustering (default none)
Outer radius for rotational correlation [Pixels]: Must be smaller than half the box size. Please set to the radius of the particle. (default -1)
Angular step for projections: (default '2')
Use CTF: Do a full CTF correction during the alignment. (default False)
Point-group symmetry: (default c1)
Images per group: Critical number of images per group, defined by user. (default 1000)
Initial image size for sorting: (default 64)
Smallest group size: Minimum number of members for an identified group. (default 500)
Directory containing chunk files: Typically, specify the meridien output directory. A chunk file contains a list of the particle IDs belonging to the associated half-set. This information is used for computing the margin of error. (default none)
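The constraint on the outer radius can be checked before a job is submitted. This is a small sketch under stated assumptions: the function name check_outer_radius is hypothetical, the box sizes in the example are made up, and the default value -1 is treated as "not set", which the text above does not state explicitly.

```python
def check_outer_radius(radius, box_size):
    """Return True when radius satisfies the documented constraint for --radius."""
    if radius == -1:
        # Assumption: -1 (the documented default) means "not set by the user",
        # so there is nothing to validate.
        return True
    # Documented rule: the radius must be smaller than half the box size.
    return 0 < radius < box_size / 2.0


print(check_outer_radius(88, 192))  # 88 < 96
print(check_outer_radius(88, 128))  # 88 is not < 64
```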

Advanced Parameters

Inner radius for rotational correlation [Pixels]: Must be at least 1. (default 1)
Maximum iterations: (default 50)
Step between rings in rotational correlation: Must be bigger than 0. (default 1)
X search range [Pixels]: The translational search in the x direction will take place in a +/-xr range. (default '1')
Y search range [Pixels]: The translational search range in the y direction. If omitted it will be set as xr. (default '-1')
Translational search step [Pixels]: The search will be performed at -xr, -xr+ts, ..., 0, ..., xr-ts, xr. The step can be fractional. (default '0.25')
Local angular search width [Degrees]: This defines the neighbourhood where the local angular search will be performed. (default '-1')
Centering method: 0 - if you do not want the volume to be centered, 1 - center the volume using the center of gravity. (default 0)
Number of reassignment iterations: Performed for each angular step. (default 1)
Number of alignment iterations: Performed for each angular step. (default 0)
Assignment convergence threshold [%]: Used to assess convergence of the run. It is the minimum percentage of assignment change required to stop the run. (default 3.0)
Reference preparation function: Function used to prepare the reference volume. (default do_volume_mrk05)
Number of independent runs: Number of independent equal-Kmeans runs. (default 3)
Low-pass filter frequency [1/Pixel]: Low-pass filter used for the 3D sorting on the original image size. (default -1.0)
Reconstruct unaccounted images: (default False)
Random seed: Seed used for the initial random assignment for EQ Kmeans. The program generates a random integer by default. (default -1)
Unaccounted particles group size: (default 500)
Use sausage filter: (default False)
Power spectrum reference: Text file containing a 1D reference power spectrum. (default none)
Protein Shape: It defines protein preferred orientation angles. "g" is for globular proteins and "f" is for filament proteins. (default g)
Power spectrum adjustment strength: This parameter adjusts how strongly the power spectrum of the volume should be modified to match the reference. A value of 1 brings the volume's power spectrum completely to the reference, while a value of 0 means no modification. (default 0.5)
Target image size [Pixels]: If different than 0, then the images will be rescaled to fit this size. (default 0)
3D interpolation method: Interpolation method used in 3D. Options are tr1 or 4nn. (default 4nn)
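The translational search positions described for the X search range, Y search range, and search step can be enumerated as follows. This is an illustrative sketch, not the program's code: the function names search_offsets and translation_grid are hypothetical, and the parameters are taken as plain numbers although the defaults above are given as strings.

```python
import math


def search_offsets(search_range, step):
    """Offsets -range, -range+step, ..., 0, ..., range-step, range."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Number of whole steps that fit on each side of zero.
    n = int(math.floor(search_range / step + 1e-9))
    return [i * step for i in range(-n, n + 1)]


def translation_grid(xr, ts, yr=-1):
    """All (dx, dy) shifts visited by an exhaustive search."""
    if yr == -1:
        # Documented behaviour: yr is set to xr when omitted.
        yr = xr
    return [(dx, dy) for dx in search_offsets(xr, ts) for dy in search_offsets(yr, ts)]


grid = translation_grid(xr=1, ts=0.25)
print(search_offsets(1, 0.25))
print(len(grid))
```

With the default values (xr = 1, ts = 0.25) there are 9 offsets per axis, so 81 shifts are tested per projection direction in this sketch, which shows how quickly the cost grows when xr is increased or ts is reduced.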

sxrsort3d.py identifies stable members by carrying out a two-way comparison of two independent sxsort3d.py runs.

For small tested datasets (real and simulated ribosome data of around 10K particles), it gives 70%-90% reproducibility. However, this rate also depends on the choice of the number of images per group and the number of particles in the smallest group.
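The idea of a two-way comparison and of a reproducibility rate can be illustrated with plain sets of particle IDs. The sketch below is a simplification under stated assumptions and is not the algorithm implemented in sxrsort3d.py: groups are paired greedily by largest overlap, the stable members of a pair are the particles found in both groups, pairs whose overlap is below smallest_group are dropped, and the rate is the fraction of all particles that end up stable. The function name two_way_comparison and the toy data are hypothetical.

```python
def two_way_comparison(run1, run2, smallest_group=500):
    """run1, run2: lists of groups, each group an iterable of particle IDs.

    Returns (stable_groups, reproducibility_rate).
    """
    groups1 = [set(g) for g in run1]
    groups2 = [set(g) for g in run2]

    # Overlap size for every pair of groups, largest overlap first.
    pairs = sorted(
        ((len(a & b), i, j)
         for i, a in enumerate(groups1)
         for j, b in enumerate(groups2)),
        key=lambda item: -item[0],
    )

    used1, used2, stable = set(), set(), []
    for overlap, i, j in pairs:
        if i in used1 or j in used2:
            continue  # each group is matched at most once
        used1.add(i)
        used2.add(j)
        if overlap >= smallest_group:
            stable.append(sorted(groups1[i] & groups2[j]))

    total = len(set().union(*groups1))
    rate = sum(len(g) for g in stable) / total if total else 0.0
    return stable, rate


# Toy data: 10 particles, two groups per run, two particles swap groups.
run1 = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
run2 = [[5, 6, 7, 8, 0], [1, 2, 3, 4, 9]]
stable, rate = two_way_comparison(run1, run2, smallest_group=3)
print(stable)
print(rate)
```

In this toy case 8 of the 10 particles are assigned consistently in both runs, so the rate is 0.8. Raising smallest_group above 4 would discard both matched groups, which mirrors the dependence of the reproducibility rate on the group-size parameters noted above.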

Time and Memory

On the Lonestar cluster at TACC, using 264 CPUs, one independent sxsort3d.py run on 95953 images of 128×128 pixels takes about 2 hours and 23 minutes, and one independent sxrsort3d.py run with number_of_images_per_group set to 30000 takes about 2 hours and 24 minutes.

K-means, equal K-means, reproducibility, two-way comparison.

Not published yet.

Zhong Huang



Alpha:: Under development.

There are no known bugs so far.

  • pipeline/sort3d/sxrsort3d.txt
  • Last modified: 2018/06/20 13:12