
pipeline:sort3d:sxrsort3d

Differences

This shows you the differences between two versions of the page.


pipeline:sort3d:sxrsort3d [2018/02/23 16:55]
moriya
pipeline:sort3d:sxrsort3d [2018/02/23 18:12]
moriya
Line 1: Line 1:
 ~~NOTOC~~
  
-===== sxsort3d =====
+===== sxrsort3d =====
-3D Clustering - SORT3D: Sort 3D heterogeneity based on the reproducible members of K-means and Equal K-means classification. It runs after 3D refinement where the alignment parameters are determined.
+3D Clustering - RSORT3D: Sort out the 3D heterogeneity of 2D data whose 3D reconstruction parameters (xform.projection) have already been determined, using 3D sorting.
  
 \\ \\
Line 9: Line 9:
 Usage in command line
  
-  sxsort3d.py  stack  outdir  mask  --focus=3Dmask  --radius=outer_radius  --delta=angular_step  --CTF  --sym=c1  --number_of_images_per_group=number_of_images_per_group  --nxinit=nxinit  --smallest_group=smallest_group  --chunk0=CHUNK0_FILE_NAME  --chunk1=CHUNK1_FILE_NAME  --ir=inner_radius  --maxit=max_iter  --rs=ring_step  --xr=xr  --yr=yr  --ts=ts  --an=angular_neighborhood  --center=centring_method  --nassign=nassign  --nrefine=nrefine  --stoprnct=stop_percent  --function=user_function  --independent=independent_runs  --low_pass_filter=low_pass_filter  --unaccounted  --seed=random_seed  --sausage  --PWadjustment=PWadjustment  --protein_shape=protein_shape  --upscale=upscale  --wn=wn  --interpolation=method
+  sxrsort3d.py  stack  outdir  mask  --previous_run1=sort3d_run1_directory  --previous_run2=sort3d_run2_directory  --focus=3D_focus_mask  --radius=outer_radius  --delta=angular_step  --CTF  --sym=symmetry  --number_of_images_per_group=images_per_group  --nxinit=nxinit  --smallest_group=smallest_group  --chunkdir=chunkdir  --ir=inner_radius  --maxit=max_iter  --rs=ring_step  --xr=xr  --yr=yr  --ts=ts  --an=angular_neighborhood  --center=centring_method  --nassign=nassign  --nrefine=nrefine  --stoprnct=stop_percent  --function=user_function  --independent=independent_runs  --low_pass_filter=low_pass_filter  --unaccounted  --seed=random_seed  --group_size_for_unaccounted=group_size_for_unaccounted  --sausage  --PWadjustment=PWadjustment  --protein_shape=protein_shape  --upscale=upscale  --wn=wn  --interpolation=method
  
 \\ \\
 ===== Typical usage =====
  
-sxsort3d exists only in an MPI version.
+sxrsort3d.py exists only in an MPI version.
  
-  mpirun -np 192 sxsort3d.py bdb:data sort3d_outdir1 mask.hdf --focus=ribosome_focus.hdf --chunkdir=/data/n10/pawel/ribosome_frank/ri3/main013 --radius=52 --CTF --number_of_images_per_group=2000 --low_pass_filter=.125 --stoprnct=5
+  mpirun -np 192 sxrsort3d.py bdb:data rsort3d mask.hdf --previous_run1=sort3d1 --previous_run2=sort3d2 --radius=88 --maxit=25 --independent=3 --CTF --number_of_images_per_group=11000 --low_pass_filter=.20 --chunkdir=./ --sym=c4 --PWadjustment=pwrec.txt
  
 \\ \\
Line 23: Line 23:
   ; stack : Input images stack: (default required string)
   ; outdir : Output directory: The directory contains a log.txt file that describes the sequence of computations in the program. (default required string)
-  ; mask : 3D mask: File path of the global 3D mask for clustering. (default none)
+  ; mask : 3D mask: (default none)
  
-  ; focus : Binary Focus 3D mask: Binary 3D mask used for focused clustering. (default none)
+  ; %%--%%previous_run1 : Directory of 1st sort3d run: (default required string)
-  ; radius : Particle radius [Pixels]: Used as outer radius for rotational correlation. Must be smaller than half the box size. (default -1)
+  ; %%--%%previous_run2 : Directory of 2nd sort3d run: (default required string)
-  ; delta : Angular step for projections [Degrees]: Angular step of reference projections. (default '2')
+  ; %%--%%focus : Focus 3D mask: Mask used for focused clustering. (default none)
-  ; CTF : Use CTF: Do a full CTF correction during the alignment. (default False)
+  ; %%--%%radius : Outer radius for rotational correlation [Pixels]: Must be smaller than half the box size. Please set it to the radius of the particle. (default -1)
-  ; sym : Point-group symmetry: Point-group symmetry of the target structure. (default c1)
+  ; %%--%%delta : Angular step for projections: (default '2')
-  ; number_of_images_per_group : Images per group: Critical value defined by the user. It suggests the number of groups to the program. (default 1000)
+  ; %%--%%CTF : Use CTF: Do a full CTF correction during the alignment. (default False)
-  ; nxinit : Initial image size for sorting: Initial image size for sorting. (default 64)
+  ; %%--%%sym : Point-group symmetry: (default c1)
-  ; smallest_group : Smallest group size: Minimum number of members for an identified group. (default 500)
+  ; %%--%%number_of_images_per_group : Images per group: Critical number of images per group, defined by the user. (default 1000)
-  ; chunk0 : Chunk file name for 1st halfset: Name of chunk file containing particle IDs of 1st halfset (chunk0) for computing margin of error. (default none)
+  ; %%--%%nxinit : Initial image size for sorting: (default 64)
-  ; chunk1 : Chunk file name for 2nd halfset: Name of chunk file containing particle IDs of 2nd halfset (chunk1) for computing margin of error. (default none)
+  ; %%--%%smallest_group : Smallest group size: Minimum number of members for an identified group. (default 500)
 +  ; %%--%%chunkdir : Directory containing chunk files: Typically, specify the meridien output directory. A chunk file contains a list of particle IDs belonging to the associated halfset. This information is used for computing the margin of error. (default none)
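The parameters above imply some simple bookkeeping. The user-defined group size sets the number of groups K, and a group needs at least `--smallest_group` members to count as identified. The sketch below illustrates this under stated assumptions. It takes K as the integer quotient of the particle count by `--number_of_images_per_group`, never less than 1. The helper names `number_of_groups` and `keep_large_groups` are hypothetical and are not functions of sxrsort3d.py.

```python
def number_of_groups(n_particles: int, images_per_group: int) -> int:
    """Hypothetical helper: number of groups K implied by the group size.

    Assumption: K is the integer quotient, but never less than 1.
    """
    if images_per_group <= 0:
        raise ValueError("images_per_group must be positive")
    return max(1, n_particles // images_per_group)


def keep_large_groups(groups: list[list[int]], smallest_group: int) -> list[list[int]]:
    """Hypothetical helper: keep only groups with at least smallest_group members."""
    return [group for group in groups if len(group) >= smallest_group]


if __name__ == "__main__":
    print(number_of_groups(95953, 11000))  # 8
    groups = [list(range(600)), list(range(600, 650))]
    print([len(g) for g in keep_large_groups(groups, 500)])  # [600]
```

With 95953 particles and `--number_of_images_per_group=11000`, this assumption gives K = 8. The program itself may round differently.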
  
 \\ \\
 === Advanced Parameters ===
-  ; ir : Inner radius for rotational correlation [Pixels]: Must be bigger than 1. (default 1)
+  ; %%--%%ir : Inner radius for rotational correlation [Pixels]: Must be bigger than 1. (default 1)
-  ; maxit : Maximum iterations: Maximum number of iterations. (default 25)
+  ; %%--%%maxit : Maximum iterations: (default 50)
-  ; rs : Step between rings in rotational correlation: Must be bigger than 0. (default 1)
+  ; %%--%%rs : Step between rings in rotational correlation: Must be bigger than 0. (default 1)
-  ; xr : X search range [Pixels]: The translational search range in the x direction will take place in -xr to +xr range. (default '1')
+  ; %%--%%xr : X search range [Pixels]: The translational search range in the x direction will take place in the +/-xr range. (default '1')
-  ; yr : Y search range [Pixels]: The translational search range in the y direction will take place in -yr to +yr range. If omitted, it will be set as xr. (default '-1')
+  ; %%--%%yr : Y search range [Pixels]: The translational search range in the y direction. If omitted, it will be set as xr. (default '-1')
-  ; ts : Translational search step [Pixels]: The search will be performed at -xr, -xr+ts, ..., 0, ..., xr-ts, xr; ts can be fractional. (default '0.25')
+  ; %%--%%ts : Translational search step [Pixels]: The search will be performed at -xr, -xr+ts, ..., 0, ..., xr-ts, xr; ts can be fractional. (default '0.25')
-  ; an : Local angular search width [Degrees]: This defines the neighbourhood where the local angular search will be performed. (default '-1')
+  ; %%--%%an : Local angular search width [Degrees]: This defines the neighbourhood where the local angular search will be performed. (default '-1')
-  ; center : Centering method: 0 - if you do not want the volume to be centered, 1 - center the volume using the center of gravity. (default 0)
+  ; %%--%%center : Centering method: 0 - if you do not want the volume to be centered, 1 - center the volume using the center of gravity. (default 0)
-  ; nassign : Number of reassignment iterations: Performed for each angular step. (default 1)
+  ; %%--%%nassign : Number of reassignment iterations: Performed for each angular step. (default 1)
-  ; nrefine : Number of alignment iterations: Performed for each angular step. (default 0)
+  ; %%--%%nrefine : Number of alignment iterations: Performed for each angular step. (default 0)
-  ; stoprnct : Assignment convergence threshold [%]: Used to assess convergence of the run. It is the minimum percentage of assignment change required to stop the run. (default 3.0)
+  ; %%--%%stoprnct : Assignment convergence threshold [%]: Used to assess convergence of the run. It is the minimum percentage of assignment change required to stop the run. (default 3.0)
-  ; function : Reference preparation function: Specify name of function used to prepare the reference volume. (default do_volume_mrk05)
+  ; %%--%%function : Reference preparation function: Function used to prepare the reference volume. (default do_volume_mrk05)
-  ; independent : Number of independent runs: Number of independent equal-Kmeans runs. (default 3)
+  ; %%--%%independent : Number of independent runs: Number of independent equal-Kmeans runs. (default 3)
-  ; low_pass_filter : Low-pass filter frequency [1/Pixels]: Low-pass filter used for the 3D sorting on the original image size. Specify with absolute frequency. (default -1.0)
+  ; %%--%%low_pass_filter : Low-pass filter frequency [1/Pixel]: Low-pass filter used for the 3D sorting on the original image size. (default -1.0)
-  ; unaccounted : Reconstruct unaccounted images: Reconstruct unaccounted images. (default False)
+  ; %%--%%unaccounted : Reconstruct unaccounted images: (default False)
-  ; seed : Random seed: Seed used for the initial random assignment for EQ Kmeans. The program generates a random integer by default. (default -1)
+  ; %%--%%seed : Random seed: Seed used for the initial random assignment for EQ Kmeans. The program generates a random integer by default. (default -1)
-  ; sausage : Use sausage filter: A way of filtering the volume. (default False)
+  ; %%--%%group_size_for_unaccounted : Unaccounted particles group size: (default 500)
-  ; PWadjustment : Power spectrum reference: Text file containing a 1D reference power spectrum used for EM density map power spectrum correction. Typically, the 1D power spectrum is computed from a PDB file. (default none)
+  ; %%--%%sausage : Use sausage filter: (default False)
-  ; protein_shape : Protein Shape: It defines protein preferred orientation angles. "g" is for globular proteins and "f" is for filament proteins. (default 'g')
+  ; %%--%%PWadjustment : Power spectrum reference: Text file containing a 1D reference power spectrum. (default none)
-  ; upscale : Power spectrum adjustment strength: This parameter adjusts how strongly the power spectrum of the volume should be modified to match the reference. A value of 1 brings the volume's power spectrum completely to the reference, while a value of 0 means no modification. (default 0.5)
+  ; %%--%%protein_shape : Protein Shape: It defines protein preferred orientation angles. "g" is for globular proteins and "f" is for filament proteins. (default g)
-  ; wn : Target image size [Pixels]: Specify optimal window size for data processing. If different from 0, then the images will be rescaled to fit this size. (default 0)
+  ; %%--%%upscale : Power spectrum adjustment strength: This parameter adjusts how strongly the power spectrum of the volume should be modified to match the reference. A value of 1 brings the volume's power spectrum completely to the reference, while a value of 0 means no modification. (default 0.5)
-  ; interpolation : 3D interpolation method: Interpolation method in 3D. Options are tr1 or 4nn. (default '4nn')
+  ; %%--%%wn : Target image size [Pixels]: If different from 0, then the images will be rescaled to fit this size. (default 0)
 +  ; %%--%%interpolation : 3D interpolation method: Interpolation method in 3D. Options are tr1 or 4nn. (default 4nn)
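The `--xr`, `--yr` and `--ts` descriptions together define a grid of trial shifts. The minimal sketch below enumerates that grid. It assumes the grid is symmetric about 0 and includes both end points. It also assumes that a negative `--yr` (the default -1) means "omitted", so yr falls back to xr. The names `axis_offsets` and `shift_grid` are illustrative and are not part of the program.

```python
def axis_offsets(r: float, ts: float) -> list[float]:
    """Offsets -r, -r+ts, ..., r along one axis (r assumed to be a multiple of ts)."""
    if ts <= 0:
        raise ValueError("ts must be positive")
    n = int(round(2 * r / ts))
    # Rounding hides tiny floating-point drift in the generated offsets.
    return [round(-r + i * ts, 6) for i in range(n + 1)]


def shift_grid(xr: float, ts: float, yr: float = -1.0) -> list[tuple[float, float]]:
    """All (sx, sy) trial shifts; a negative yr is treated as 'omitted', so yr = xr."""
    if yr < 0:
        yr = xr
    return [(sx, sy) for sx in axis_offsets(xr, ts) for sy in axis_offsets(yr, ts)]


if __name__ == "__main__":
    print(axis_offsets(1.0, 0.25))  # [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]
    print(len(shift_grid(1.0, 0.25)))  # 81
```

With the defaults (`xr=1`, `ts=0.25`), each axis has 9 offsets, giving 81 trial shifts. Halving `ts` to 0.125 raises this to 289, which shows how quickly a finer translational step increases the search cost.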
  
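The `--upscale` description fixes only the two end points: 0 means no change, and 1 means matching the reference power spectrum. The sketch below shows one plausible way to interpolate between them for each Fourier shell. Two parts of it are assumptions for illustration, not the formula used by the program: the linear blend of the power spectra, and scaling amplitudes by the square root of the power ratio. `amplitude_factors` is a hypothetical name.

```python
import math


def amplitude_factors(vol_power: list[float], ref_power: list[float], upscale: float) -> list[float]:
    """Per-shell amplitude multipliers for a given adjustment strength.

    Assumption: target power = (1 - upscale) * volume power + upscale * reference
    power, and amplitudes are scaled by sqrt(target / volume power).
    """
    if not 0.0 <= upscale <= 1.0:
        # The sketch only covers the documented range between the two end points.
        raise ValueError("upscale is expected in [0, 1]")
    factors = []
    for pv, pr in zip(vol_power, ref_power):
        if pv <= 0.0:
            factors.append(1.0)  # leave shells without power untouched
            continue
        target = (1.0 - upscale) * pv + upscale * pr
        factors.append(math.sqrt(target / pv))
    return factors


if __name__ == "__main__":
    vol = [4.0, 1.0]
    ref = [16.0, 1.0]
    print(amplitude_factors(vol, ref, 0.0))  # [1.0, 1.0]
    print(amplitude_factors(vol, ref, 1.0))  # [2.0, 1.0]
```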
 \\ \\
Line 66: Line 68:
 \\ \\
 ===== Description =====
-The clustering algorithm in the program combines several computational techniques (equal-Kmeans clustering, K-means clustering, and reproducibility of clustering) so that it sorts out the heterogeneity of cryo-EM images both effectively and efficiently. The command sxsort3d.py is protocol I (P1). In this protocol, the user defines the group size and thereby the number of groups K. The total data set is then randomly assigned into K groups, and an equal-size K-means (size-restricted K-means) is carried out. N independent equal-Kmeans runs give N partitions of the K-group assignment. A two-way comparison of these partitions then gives the reproducible number of particles.
+sxrsort3d.py identifies stable members by carrying out a two-way comparison of two independent sxsort3d.py runs.
 + 
 +For small tested datasets (real and simulated ribosome data of around 10K particles), it gives 70%-90% reproducibility. However, this rate also depends on the choice of the number of images per group and the number of particles in the smallest group.
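The idea of a two-way comparison can be illustrated on toy data. First, match each group of one run to a group of the other run so that the total overlap is maximal. Then keep the particles that both runs place in matched groups as stable members, and report the fraction kept as the reproducibility. The sketch below is not the sxrsort3d.py implementation. It assumes both runs have the same number of groups and finds the matching by exhaustive search, which is only practical for small K. A real implementation would more likely use an assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations


def two_way_comparison(run1: list[set[int]], run2: list[set[int]]) -> tuple[list[set[int]], float]:
    """Return the stable groups shared by two partitions and the fraction of particles kept.

    Assumptions: both runs partition the same particle IDs into the same
    number of groups; the best group matching is found by brute force.
    """
    if len(run1) != len(run2):
        raise ValueError("both runs must have the same number of groups")
    best_perm, best_total = None, -1
    for perm in permutations(range(len(run2))):
        # Total number of particles that matched groups have in common.
        total = sum(len(run1[i] & run2[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_perm, best_total = perm, total
    stable = [run1[i] & run2[j] for i, j in enumerate(best_perm)]
    n_particles = len(set().union(*run1))
    reproducibility = best_total / n_particles if n_particles else 0.0
    return stable, reproducibility


if __name__ == "__main__":
    run1 = [{1, 2, 3, 4}, {5, 6, 7, 8}]
    run2 = [{5, 6, 7, 1}, {2, 3, 4, 8}]
    stable, ratio = two_way_comparison(run1, run2)
    print(stable, ratio)  # [{2, 3, 4}, {5, 6, 7}] 0.75
```

For these toy runs, the best matching pairs the first group of `run1` with the second group of `run2`. Particles 1 and 8 are assigned inconsistently and are dropped, so 6 of the 8 particles are stable.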
 + 
 +\\ 
 +=== Time and Memory === 
 +On the lonestar cluster of TACC, using 264 CPUs, it takes about 2 hours and 23 minutes to process 95953 128x128 images for one independent sxsort3d.py run, and about 2 hours and 24 minutes to complete one independent sxrsort3d.py run with number_of_images_per_group set to 30000.
  
 \\ \\
 ==== Method ====
 +K-means, equal K-means, reproducibility, two-way comparison.
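Of the techniques listed, equal K-means is the least standard. Below is a toy sketch of a size-constrained K-means. It starts from a random assignment into K groups of nearly equal size. It then alternates centroid updates with a reassignment in which no group may exceed ceil(n / K) members. It stops once the percentage of changed assignments falls below a threshold, in the spirit of `--stoprnct`. The sketch works on small feature vectors rather than on 2D images compared with projections of 3D volumes. The greedy nearest-first filling is an assumption for illustration, not the scheme implemented in sxsort3d.py.

```python
import math
import random


def equal_size_kmeans(points, k, maxit=50, stop_percent=3.0, seed=None):
    """Toy size-constrained (equal) K-means on small feature vectors.

    Assumptions, not taken from sxsort3d.py: points are tuples of floats,
    distance is squared Euclidean, each group holds at most ceil(n / k)
    members, and assignment is a greedy nearest-first fill.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= number of points")
    rng = random.Random(seed)
    cap = math.ceil(n / k)
    dim = len(points[0])

    # Random initial assignment into k groups of (nearly) equal size.
    order = list(range(n))
    rng.shuffle(order)
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank % k

    centroids = [None] * k
    for _ in range(maxit):
        # Update step: centroid of each non-empty group (empty groups keep the old one).
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))

        # Assignment step: visit (distance, point, group) pairs nearest first
        # and fill each group only up to its capacity.
        pairs = sorted(
            (sum((p[d] - centroids[c][d]) ** 2 for d in range(dim)), i, c)
            for i, p in enumerate(points)
            for c in range(k)
        )
        new_labels = [-1] * n
        size = [0] * k
        for _, i, c in pairs:
            if new_labels[i] == -1 and size[c] < cap:
                new_labels[i] = c
                size[c] += 1

        # Stop when the percentage of changed assignments falls below the threshold.
        changed = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        if 100.0 * changed / n < stop_percent:
            break
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(equal_size_kmeans(pts, 2, seed=1))
```

On two well-separated clumps of three points each, the sketch ends with one clump per group, each holding exactly three members. The capacity limit is what distinguishes it from plain K-means, which places no limit on group sizes.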
  
 \\ \\
 ==== Reference ====
-Described by A. Einstein in his first paper on spectrum of radiation from a house heater kept at room temperature. Journal of Irreproducible Results, 12, 1905, 12-1127.
+Not published yet.
  
 \\ \\
Line 81: Line 90:
 ==== Author / Maintainer ====
 Zhong Huang
- 
  
 \\ \\
Line 89: Line 97:
 \\ \\
 ==== Files ====
-sparx/bin/sxsort3d.py
+sparx/bin/sxrsort3d.py
  
 \\ \\
 ==== See also ====
-[[pipeline:utilities:sxheader|sxheader]], [[pipeline:sort3d:sx3dvariability|sx3dvariability]], [[pipeline:sort3d:sxsort3d_depth|sxsort3d_depth]] and [[pipeline:sort3d:sxrsort3d|sxrsort3d]]
+[[pipeline:meridien:sxmeridien|sxmeridien]], [[pipeline:utilities:sxheader|sxheader]], [[pipeline:sort3d:sx3dvariability|sx3dvariability]], [[pipeline:sort3d:sxsort3d|sxsort3d]] and [[pipeline:sort3d:sxsort3d_depth|sxsort3d_depth]].
  
 \\ \\
 ==== Maturity ====
-Alpha:: Under development. Two programs (P1, P2) have been tested on both simulated and experimental ribosome data. For experimental ribosome data, P2 has a reproducibility ratio of 70-90%. P2 can completely (100%) separate two conformations from the simulated ribosome data that contains 5 conformations.
+Alpha:: Under development.
  
 \\ \\
pipeline/sort3d/sxrsort3d.txt · Last modified: 2018/06/20 13:12 (external edit)