pipeline:sort3d:sxrsort3d

Differences

This shows you the differences between two versions of the page.

pipeline:sort3d:sxrsort3d [2018/02/23 16:55]
moriya
pipeline:sort3d:sxrsort3d [2018/02/23 18:11]
moriya
Line 1: Line 1:
 ~~NOTOC~~ ~~NOTOC~~
  
-===== sxsort3d ===== +===== sxrsort3d ===== 
-3D Clustering - SORT3D: Sort 3D heterogeneity based on the reproducible members of K-means and Equal K-means classification. It runs after 3D refinement where the alignment parameters are determined.+3D Clustering - RSORT3D: Sort out 3D heterogeneity of 2D data whose 3D reconstruction parameters (xform.projection) have been determined already using 3D sorting.
  
 \\ \\
Line 9: Line 9:
 Usage in command line Usage in command line
  
-  sxsort3d.py  stack  outdir  mask  --focus=3Dmask  --radius=outer_radius  --delta=angular_step  --CTF  --sym=c1  --number_of_images_per_group=number_of_images_per_group  --nxinit=nxinit  --smallest_group=smallest_group  --chunk0=CHUNK0_FILE_NAME  --chunk1=CHUNK1_FILE_NAME  --ir=inner_radius  --maxit=max_iter  --rs=ring_step  --xr=xr  --yr=yr  --ts=ts  --an=angular_neighborhood  --center=centring_method  --nassign=nassign  --nrefine=nrefine  --stoprnct=stop_percent  --function=user_function  --independent=indenpendent_runs  --low_pass_filter=low_pass_filter  --unaccounted  --seed=random_seed  --sausage  --PWadjustment=PWadjustment  --protein_shape=protein_shape  --upscale=upscale  --wn=wn  --interpolation=method+  sxrsort3d.py  stack  outdir  mask  --previous_run1=sort3d_run1_directory  --previous_run2=sort3d_run2_directory  --focus=3D_focus_mask  --radius=outer_radius  --delta=angular_step  --CTF  --sym=symmetry  --number_of_images_per_group=images_per_group  --nxinit=nxinit  --smallest_group=smallest_group  --chunkdir=chunkdir  --ir=inner_radius  --maxit=max_iter  --rs=ring_step  --xr=xr  --yr=yr  --ts=ts  --an=angular_neighborhood  --center=centring_method  --nassign=nassign  --nrefine=nrefine  --stoprnct=stop_percent  --function=user_function  --independent=independent_runs  --low_pass_filter=low_pass_filter  --unaccounted  --seed=random_seed  --group_size_for_unaccounted=group_size_for_unaccounted  --sausage  --PWadjustment=PWadjustment  --protein_shape=protein_shape  --upscale=upscale  --wn=wn  --interpolation=method
  
 \\ \\
 ===== Typical usage ===== ===== Typical usage =====
  
-sxsort3d exists only in MPI version.+sxrsort3d.py exists only in MPI version.
  
-  mpirun -np 192 sxsort3d.py bdb:data sort3d_outdir1 mask.hdf --focus=ribosome_focus.hdf --chunkdir=/data/n10/pawel/ribosome_frank/ri3/main013 --radius=52 --CTF --number_of_images_per_group=2000 --low_pass_filter=.125 --stoprnct=5+  mpirun -np 192 sxrsort3d.py bdb:data rsort3d mask.hdf --previous_run1=sort3d1 --previous_run2=sort3d2 --radius=88 --maxit=25 --independent=3 --CTF --number_of_images_per_group=11000 --low_pass_filter=.20 --chunkdir=./ --sym=c4 --PWadjustment=pwrec.txt
  
 \\ \\
Line 23: Line 23:
   ; stack : Input images stack: (default required string)   ; stack : Input images stack: (default required string)
   ; outdir : Output directory: There is a log.txt that describes the sequences of computations in the program. (default required string)   ; outdir : Output directory: There is a log.txt that describes the sequences of computations in the program. (default required string)
-  ; mask : 3D mask: File path of the global 3D mask for clustering. (default none)+  ; mask : 3D mask: (default none)
  
-  ; focus : Binary Focus 3D mask: Binary 3D mask used for focused clustering. (default none) +  ; %%--%%previous_run1 : Directory of 1st sort3d run: (default required string) 
-  ; radius : Particle radius [Pixels]: Used as outer radius for rotational correlation.  Must be smaller than half the box size. (default -1) +  ; %%--%%previous_run2 : Directory of 2nd sort3d run: (default required string) 
-  ; delta : Angular step for projections [Degrees]: Angular step of reference projections. (default '2') +  ; %%--%%focus : Focus 3D mask: Mask used for focused clustering. (default none) 
-  ; CTF : Use CTF: Do a full CTF correction during the alignment. (default False) +  ; %%--%%radius : Outer radius for rotational correlation [Pixels]: Must be smaller than half the box size. Please set to the radius of the particle. (default -1) 
-  ; sym : Point-group symmetry: Point-group symmetry of the target structure. (default c1) +  ; %%--%%delta : Angular step for projections: (default '2') 
-  ; number_of_images_per_group : Images per group: Critical value defined by user. It suggests program the number of groups. (default 1000) +  ; %%--%%CTF : Use CTF: Do a full CTF correction during the alignment. (default False)  
-  ; nxinit : Initial image size for sorting: Initial image size for sorting. (default 64) +  ; %%--%%sym : Point-group symmetry: (default c1)  
-  ; smallest_group : Smallest group size: Minimum members for identified group. (default 500)  +  ; %%--%%number_of_images_per_group : Images per group: Critical number of images per group, defined by user. (default 1000)  
-  ; chunk0 : Chunk file name for 1st halfset: Name of chunk file containing particle IDs of 1st halfset (chunk0) for computing margin of error. (default none) +  ; %%--%%nxinit : Initial image size for sorting: (default 64) 
-  ; chunk1 : Chunk file name for 2nd halfset: Name of chunk file containing particle IDs of 2nd halfset (chunk1) for computing margin of error. (default none) +  ; %%--%%smallest_group : Smallest group size: Minimum members for identified group. (default 500)  
 +  ; %%--%%chunkdir : Directory containing chunk files: Typically, specify the meridien output directory. A chunk file contains a list of particle IDs belonging to the associated halfset. This information is used for computing the margin of error. (default none)
  
 \\ \\
 === Advanced Parameters === === Advanced Parameters ===
-  ; ir : Inner radius for rotational correlation [Pixels]: Must be bigger than 1. (default 1) +  ; %%--%%ir : Inner radius for rotational correlation [Pixels]: Must be bigger than 1. (default 1) 
-  ; maxit : Maximum iterations: Maximum number of iterations. (default 25) +  ; %%--%%maxit : Maximum iterations: (default 50) 
-  ; rs : Step between rings in rotational correlation: Must be bigger than 0. (default 1) +  ; %%--%%rs : Step between rings in rotational correlation: Must be bigger than 0. (default 1) 
-  ; xr : X search range [Pixels]: The translational search range in the x direction will take place in -xr to +xr range. (default '1') +  ; %%--%%xr : X search range [Pixels]: The translational search range in the x direction will take place in +/-xr range. (default '1') 
-  ; yr : Y search range [Pixels]: The translational search range in the y direction will take place in -yr to +yr range. If omitted, it will be set as xr. (default '-1') +  ; %%--%%yr : Y search range [Pixels]: The translational search range in the y direction. If omitted, it will be set as xr. (default '-1') 
-  ; ts : Translational search step [Pixels]: The search will be performed in -xr, -xr+ts, 0, xr-ts, xr; ts can be fractional. (default '0.25') +  ; %%--%%ts : Translational search step [Pixels]: The search will be performed in -xr, -xr+ts, 0, xr-ts, xr; ts can be fractional. (default '0.25') 
-  ; an : Local angular search width [Degrees]: This defines the neighbourhood where the local angular search will be performed. (default '-1') +  ; %%--%%an : Local angular search width [Degrees]: This defines the neighbourhood where the local angular search will be performed. (default '-1') 
-  ; center : Centering method: 0 - if you do not want the volume to be centered, 1 - center the volume using the center of gravity. (default 0) +  ; %%--%%center : Centering method: 0 - if you do not want the volume to be centered, 1 - center the volume using the center of gravity. (default 0) 
-  ; nassign : Number of reassignment iterations: Performed for each angular step. (default 1) +  ; %%--%%nassign : Number of reassignment iterations: Performed for each angular step. (default 1) 
-  ; nrefine : Number of alignment iterations: Performed for each angular step. (default 0) +  ; %%--%%nrefine : Number of alignment iterations: Performed for each angular step. (default 0) 
-  ; stoprnct : Assignment convergence threshold [%]: Used to assess convergence of the run. It is the minimum percentage of assignment change required to stop the run.  (default 3.0) +  ; %%--%%stoprnct : Assignment convergence threshold [%]: Used to assess convergence of the run. It is the minimum percentage of assignment change required to stop the run.  (default 3.0)  
-  ; function : Reference preparation function: Specify name of function used to prepare the reference volume. (default do_volume_mrk05) +  ; %%--%%function : Reference preparation function: Function used to prepare the reference volume. (default do_volume_mrk05)  
-  ; independent : Number of independent runs: Number of independent equal-Kmeans runs. (default 3) +  ; %%--%%independent : Number of independent runs: Number of independent equal-Kmeans runs. (default 3)  
-  ; low_pass_filter : Low-pass filter frequency [1/Pixels]: Low-pass filter used for the 3D sorting on the original image size. Specify with absolute frequency. (default -1.0) +  ; %%--%%low_pass_filter : Low-pass filter frequency [1/Pixel]: Low-pass filter used for the 3D sorting on the original image size. (default -1.0) 
-  ; unaccounted : Reconstruct unaccounted images: Reconstruct unaccounted images. (default False) +  ; %%--%%unaccounted : Reconstruct unaccounted images: (default False)  
-  ; seed : Random seed: Seed used for the initial random assignment for EQ Kmeans. The program generates a random integer by default. (default -1) +  ; %%--%%seed : Random seed: Seed used for the initial random assignment for EQ Kmeans. The program generates a random integer by default. (default -1)  
-  ; sausage : Use sausage filter: A way of filtering volume. (default False) +  ; %%--%%group_size_for_unaccounted : Unaccounted particles group size: (default 500)  
-  ; PWadjustment : Power spectrum reference: Text file containing a 1D reference power spectrum used for EM density map power spectrum correction. Typically, compute 1D power spectrum from PDB file. (default none) +  ; %%--%%sausage : Use sausage filter: (default False) 
-  ; protein_shape : Protein Shape: It defines protein preferred orientation angles. "g" is for globular proteins and "f" is for filament proteins. (default 'g') +  ; %%--%%PWadjustment : Power spectrum reference: Text file containing a 1D reference power spectrum. (default none)  
-  ; upscale : Power spectrum adjustment strength: This parameter adjusts how strongly the power spectrum of the volume should be modified to match the reference. A value of 1 brings the volume's power spectrum completely to the reference, while a value of 0 means no modification. (default 0.5) +  ; %%--%%protein_shape : Protein Shape: It defines protein preferred orientation angles. "g" is for globular proteins and "f" is for filament proteins. (default g) 
-  ; wn : Target image size [Pixels]: Specify optimal window size for data processing. If different than 0, then the images will be rescaled to fit this size. (default 0) +  ; %%--%%upscale : Power spectrum adjustment strength: This parameter adjusts how strongly the power spectrum of the volume should be modified to match the reference. A value of 1 brings the volume's power spectrum completely to the reference, while a value of 0 means no modification.  (default 0.5)  
-  ; interpolation : 3D interpolation method: Method interpolation in 3D. Options are tr1 or 4nn. (default '4nn')+  ; %%--%%wn : Target image size [Pixels]: If different than 0, then the images will be rescaled to fit this size. (default 0)  
 +  ; %%--%%interpolation : 3D interpolation method: Method interpolation in 3D. Options are tr1 or 4nn. (default 4nn)
  
 \\ \\
Line 66: Line 68:
 \\ \\
 ===== Description ===== ===== Description =====
-The clustering algorithm in the program combines several computational techniques (equal-Kmeans clustering, K-means clustering, and reproducibility of clustering), so that it sorts out heterogeneity of cryo-EM images both effectively and efficiently. The command sxsort3d.py is protocol I (P1). In this protocol, the user defines the group size and thus the number of groups K. The total data is then randomly assigned into K groups and an equal-size K-means (size-restricted K-means) is carried out. Independent equal-Kmeans runs would give N partitions of the K groups assignment. A two-way comparison of these partitions then gives the reproducible number of particles. +sxrsort3d.py finds stable members by carrying out a two-way comparison of two independent sxsort3d.py runs. 
 + 
 +For small tested datasets (real and simulated ribosome data of around 10K particles), it gives 70%-90% reproducibility. However, this rate also depends on the chosen number of images per group and the number of particles in the smallest group.
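
The two-way comparison mentioned above can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the code used by sxrsort3d.py: the function name two_way_comparison, the greedy matching of groups by overlap size, and the way the reproducibility rate is computed are all hypothetical choices made for this example. Each run is represented as a list of groups, and each group as a list of particle IDs.

```python
def two_way_comparison(partition_a, partition_b, smallest_group=1):
    """Illustrative two-way comparison of two partitions (hypothetical helper).

    Groups from the two runs are paired greedily by largest overlap; the
    particles shared by a matched pair are treated as its stable members.
    Matched pairs with fewer than `smallest_group` shared particles are dropped.
    """
    sets_a = [set(group) for group in partition_a]
    sets_b = [set(group) for group in partition_b]

    # All (overlap size, index in run A, index in run B), largest overlap first.
    overlaps = sorted(
        ((len(a & b), i, j)
         for i, a in enumerate(sets_a)
         for j, b in enumerate(sets_b)),
        key=lambda item: (-item[0], item[1], item[2]),
    )

    used_a, used_b = set(), set()
    stable_groups = []
    for size, i, j in overlaps:
        if i in used_a or j in used_b:
            continue  # each group may be matched only once
        used_a.add(i)
        used_b.add(j)
        if size >= smallest_group:
            stable_groups.append(sorted(sets_a[i] & sets_b[j]))

    # Reproducibility: fraction of all particles in run A that are stable.
    total = len(set().union(*sets_a))
    stable = sum(len(group) for group in stable_groups)
    rate = float(stable) / total if total else 0.0
    return stable_groups, rate


# Toy example: 8 particles sorted into 2 groups by two independent runs.
run1 = [[0, 1, 2, 3], [4, 5, 6, 7]]
run2 = [[4, 5, 6, 0], [1, 2, 3, 7]]
groups, rate = two_way_comparison(run1, run2)
print(groups)  # [[1, 2, 3], [4, 5, 6]]
print(rate)    # 0.75
```

In this toy case particles 0 and 7 change groups between the runs, so they are not counted as stable members. Raising smallest_group discards small matched groups and lowers the rate, which mirrors the dependence on group size noted above.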
 + 
 +\\ 
 +=== Time and Memory === 
 +On the Lonestar cluster of TACC, using 264 CPUs, one independent sxsort3d.py run on 95953 128x128 images takes about 2 hours 23 minutes, and one independent sxrsort3d.py run with number_of_images_per_group set to 30000 takes about 2 hours 24 minutes.
  
 \\ \\
 ==== Method ==== ==== Method ====
 +K-means, equal K-means, reproducibility, two-way comparison.
  
 \\ \\
 ==== Reference ==== ==== Reference ====
-Described by A.Einstein in his first paper on spectrum of radiation from a house heater kept at room temperature. Journal of Irreproducible Results, 12, 1905, 12-1127.+Not published yet.
  
 \\ \\
Line 81: Line 90:
 ==== Author / Maintainer ==== ==== Author / Maintainer ====
 Zhong Huang Zhong Huang
- 
  
 \\ \\
Line 89: Line 97:
 \\ \\
 ==== Files ==== ==== Files ====
-sparx/bin/sxsort3d.py+sparx/bin/sxrsort3d.py
  
 \\ \\
 ==== See also ==== ==== See also ====
-[[pipeline:utilities:sxheader|sxheader]], [[pipeline:sort3d:sx3dvariability|sx3dvariability]], [[pipeline:sort3d:sxsort3d_depth|sxsort3d_depth]] and [[pipeline:sort3d:sxrsort3d|sxrsort3d]] +[[pipeline:meridien:sxmeridien|sxmeridien]], [[pipeline:utilities:sxheader|sxheader]], [[pipeline:sort3d:sx3dvariability|sx3dvariability]], [[pipeline:sort3d:sxsort3d|sxsort3d]] and [[pipeline:sort3d:sxsort3d_depth|sxsort3d_depth]]
  
 \\ \\
 ==== Maturity ==== ==== Maturity ====
-Alpha:: Under development. Two programs (P1, P2) have been tested on both simulated and experimental ribosome data. For experimental ribosome data, P2 has a reproducible ratio of 70-90%. P2 can fully (100%) separate two conformations from the simulated ribosome data that contains 5 conformations. +Alpha:: Under development.
  
 \\ \\
pipeline/sort3d/sxrsort3d.txt · Last modified: 2018/06/20 13:12 (external edit)