This version is outdated by a newer approved version. This version (2019/12/13 15:40) was approved by twagner. The previously approved version (2019/12/13 15:13) is available.

This is an old revision of the document!


How to use SPHIRE's Cinderella for subtomogram selection

You can use Cinderella to classify subtomograms into good and bad categories. This is useful if you want to sort particles that were previously picked with, for example, template matching.

Training

To train Cinderella, you first have to create training data. To do that, extract the central slices from your subtomograms (step 1) and select good and bad particles with EMAN2's e2display (step 2). In step 3, you train the actual model.

1. Extract central slices

sp_cinderella_extract.py -i my_subtomograms.hdf -o sub_central.mrcs
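As a rough illustration of what "central slice" means here, the sketch below takes the middle z-slice of each subtomogram. It is not the code of sp_cinderella_extract.py. It assumes each subtomogram is already loaded as a nested list or NumPy-style array indexed as volume[z][y][x]. Which of the two middle slices the real tool picks for an even box size is also an assumption.

```python
from typing import List, Sequence

Slice = List[List[float]]
Volume = Sequence[Slice]  # assumed indexing: volume[z][y][x]


def central_slice(volume: Volume) -> Slice:
    """Return the middle z-slice of one subtomogram."""
    if len(volume) == 0:
        raise ValueError("empty volume")
    # For an even number of slices this picks the second of the two middle slices.
    return volume[len(volume) // 2]


def central_slices(stack: Sequence[Volume]) -> List[Slice]:
    """Return the central slice of every subtomogram in a stack."""
    return [central_slice(vol) for vol in stack]


# Toy example: two 3x2x2 "subtomograms" whose voxels store their own z index.
toy_stack = [
    [[[float(z)] * 2 for _ in range(2)] for z in range(3)]
    for _ in range(2)
]
print(central_slices(toy_stack))  # [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
```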

2. Select good / bad training examples with e2display

  1. Start e2display from EMAN2 and select the central slice file (in our example, sub_central.mrcs).
  2. Press ►[Show Stack] to display the file.
  3. Now click any particle with the middle mouse button (mouse wheel). In the new dialog, press the ►[Sets] button and select the “Sets” tab.
  4. There should already be a class “bad_particles”. Create another class and call it “good_particles”. Highlight the set to which you want to add particles.
  5. If you now click on particles in the overview, they will be added to the currently selected set.
  6. After you have finished the selection, press ►[Save] for each class. Save the classes into separate folders (e.g., good/ and bad/). Both folders can contain multiple files (e.g., examples from other tomograms).

How many examples do I need for training?

We typically start with 40 good and 40 bad examples.
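To check how many examples you have per class before training, you could count the images in each folder. This minimal sketch assumes the saved sets are little-endian MRC stacks with the extension .mrcs. It reads the number of images from the NZ field of the MRC header, which is the third 32-bit integer. The helper names are hypothetical, and files in other formats (e.g., HDF) are not handled.

```python
import struct
from pathlib import Path


def mrc_image_count(path: Path) -> int:
    """Number of 2D images in an MRC stack, read from the NZ header field.

    Assumes a little-endian MRC file (the common case on x86 machines).
    """
    with open(path, "rb") as fh:
        header = fh.read(12)  # NX, NY, NZ as three 32-bit integers
    if len(header) < 12:
        raise ValueError(f"{path} is too short to be an MRC file")
    _nx, _ny, nz = struct.unpack("<3i", header)
    return nz


def count_examples(folder: str, pattern: str = "*.mrcs") -> int:
    """Sum the number of images over all matching stacks in a folder."""
    return sum(mrc_image_count(p) for p in sorted(Path(folder).glob(pattern)))
```

For example, count_examples("good/") and count_examples("bad/") should each return around 40 if you follow the starting point suggested above.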

3. Start training

After you have created your training data, you can start the training :-)

You need to specify all settings in one config file. To do that, create an empty file with

touch config.json

Copy the following configuration into it and adapt it to your needs. The only entries you might want to change are input_size, good_path, bad_path and pretrained_weights.

config.json
{
	"model": {
		"input_size": [32,32]
	},
 
	"train": {
		"batch_size": 32,
		"good_path": "good/",
		"bad_path": "bad/",
		"pretrained_weights": "",
		"saved_weights_name": "my_model.h5",
		"learning_rate": 1e-4,
		"nb_epoch": 100,
		"nb_early_stop": 15
	}
}
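A typo in config.json often only shows up once training starts, so it can be worth checking the file first. The sketch below is a hypothetical helper and not part of Cinderella. It only checks three things: that the keys from the example configuration are present, that input_size holds two positive integers, and that the referenced folders and weight file exist.

```python
import json
from pathlib import Path

# Keys taken from the example configuration above.
REQUIRED_TRAIN_KEYS = (
    "batch_size", "good_path", "bad_path", "pretrained_weights",
    "saved_weights_name", "learning_rate", "nb_epoch", "nb_early_stop",
)


def check_config(path: str) -> list:
    """Return a list of problems found in a Cinderella-style config file."""
    with open(path) as fh:
        cfg = json.load(fh)
    problems = []
    size = cfg.get("model", {}).get("input_size")
    if not (isinstance(size, list) and len(size) == 2
            and all(isinstance(v, int) and v > 0 for v in size)):
        problems.append("model.input_size should be two positive integers")
    train = cfg.get("train", {})
    for key in REQUIRED_TRAIN_KEYS:
        if key not in train:
            problems.append(f"train.{key} is missing")
    # The training folders must exist.
    for key in ("good_path", "bad_path"):
        if key in train and not Path(train[key]).is_dir():
            problems.append(f"train.{key}: folder {train[key]!r} not found")
    # pretrained_weights may be empty; if set, the file must exist.
    weights = train.get("pretrained_weights", "")
    if weights and not Path(weights).is_file():
        problems.append(f"train.pretrained_weights: file {weights!r} not found")
    return problems
```

For example, print(check_config("config.json")) prints an empty list when no problem was found.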

The fields have the following meaning:

  • input_size: The image size to which each central slice is resized.
  • batch_size: Number of images in one mini-batch. If you run into memory problems, try reducing this value.
  • good_path: Path to the folder with good central slices.
  • bad_path: Path to the folder with bad central slices.
  • pretrained_weights: Path to weights that are used to initialize the network. It can be left empty. Because Cinderella uses the same network architecture as crYOLO, we typically use the general crYOLO network as pretrained weights.
  • saved_weights_name: Filename of the final model.
  • learning_rate: Learning rate; it should not be changed.
  • nb_epoch: Maximum number of epochs to train. Training may stop earlier (see nb_early_stop).
  • nb_early_stop: If the validation loss does not improve for nb_early_stop epochs in a row, training stops automatically.
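The interplay of nb_epoch and nb_early_stop can be illustrated with a patience counter. This sketch makes two assumptions: the validation loss is evaluated once per epoch, and any decrease counts as an improvement. This is similar to the patience logic of Keras' EarlyStopping callback. The sketch simulates the stopping rule on a list of losses and does not reproduce Cinderella's training code.

```python
from typing import Sequence


def early_stopping_epoch(val_losses: Sequence[float], nb_epoch: int,
                         nb_early_stop: int) -> int:
    """Return how many epochs run before training stops.

    Training stops after nb_epoch epochs, or earlier once the validation
    loss has not improved for nb_early_stop epochs in a row.
    """
    best = float("inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for loss in val_losses[:nb_epoch]:  # never run more than nb_epoch epochs
        epochs_run += 1
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= nb_early_stop:
                break
    return epochs_run


# Epochs 3, 4 and 5 do not beat the best loss of 0.8, so training stops after epoch 5.
print(early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.95, 0.7], nb_epoch=100, nb_early_stop=3))  # 5
```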

The next step is to run the training:

sp_cinderella_train.py -c config.json --gpu 1

This trains a classification network on the GPU with ID=1. When training finishes, you get the file my_model.h5 (the name set by saved_weights_name). You can then use it to classify subtomograms into good / bad categories.

Prediction

To run the prediction on my_subtomograms.hdf on the GPU with ID=1, run:

sp_cinderella_predict.py -i my_subtomograms.hdf -w my_model.h5 -o output_folder/ --gpu 1

In output_folder/ you will find two new mrcs files with the classified subtomograms. To check the results with e2display, you first have to extract their central slices again (see step 1, Extract central slices).
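Conceptually, the two output files correspond to splitting the particles by the network's confidence. The sketch below assumes one score per subtomogram and a cutoff of 0.5. Neither the scores nor the threshold are described on this page, so treat both as illustrative assumptions rather than Cinderella's actual decision rule.

```python
from typing import List, Sequence, Tuple


def split_by_score(scores: Sequence[float], threshold: float = 0.5
                   ) -> Tuple[List[int], List[int]]:
    """Split particle indices into (good, bad) using a confidence threshold."""
    good = [i for i, s in enumerate(scores) if s >= threshold]
    bad = [i for i, s in enumerate(scores) if s < threshold]
    return good, bad


# Hypothetical scores for four subtomograms.
good, bad = split_by_score([0.91, 0.12, 0.55, 0.49])
print(good, bad)  # [0, 2] [1, 3]
```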

Last modified: 2019/12/13 15:17 by twagner