You can use Cinderella to classify subtomograms into good/bad categories. This is useful if you want to sort particles which were previously picked with, e.g., template matching.
To train Cinderella, we have to create training data. To do that, we extract the central slices from your subtomograms (step 1) and select good and bad particles using EMAN2's e2display (step 2). In step 3 we train the actual model using Cinderella.
To extract the central slices from, e.g., my_subtomograms.hdf and save them into sub_central.mrcs, run:
sp_cinderella_extract.py -i my_subtomograms.hdf -o sub_central.mrcs
Next, select good and bad examples using EMAN2's e2display (open sub_central.mrcs). Save the good examples into a folder good/ and the bad examples into a folder bad/. Both folders can contain multiple files (e.g. examples from another tomogram). We typically start with 40 good and 40 bad examples.
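As an illustration, the directory layout after selection might look like this (the filenames are hypothetical; use whatever names you save from e2display):

mkdir good bad
# after saving your selections, e.g.:
# good/good_tomo01.hdf  good/good_tomo02.hdf
# bad/bad_tomo01.hdf    bad/bad_tomo02.hdf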
After you have created your training data, you can start the training.
You need to specify all settings in a single config file. To do so, create an empty file using:
touch config.json
Copy the following configuration into the new file and adapt it to your needs. The only entries you might want to change are the input_size, good_path, bad_path and pretrained_weights.
{ "model": { "input_size": [32,32] }, "train": { "batch_size": 32, "good_path": "good/", "bad_path": "bad/", "pretrained_weights": "", "saved_weights_name": "my_model.h5", "learning_rate": 1e-4, "nb_epoch": 100, "nb_early_stop": 15 } }
The fields have the following meaning:

input_size: size (in pixels) to which each image is scaled before it is passed to the network.
batch_size: number of images per training batch.
good_path / bad_path: folders containing your good and bad example files.
pretrained_weights: path to an existing model used to initialize the network; leave it empty to train from scratch.
saved_weights_name: filename under which the trained model is saved.
learning_rate: learning rate used during training.
nb_epoch: maximum number of training epochs.
nb_early_stop: training stops early if the validation loss does not improve for this many epochs.
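JSON is strict about quotes and commas, so an optional sanity check of the file before training can save a failed run. This uses Python's built-in json.tool module:

python -m json.tool config.json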
The next step is to run the training:
sp_cinderella_train.py -c config.json --gpu 1
This will train a classification network on the GPU with ID=1. Once the training finishes, you get a my_model.h5 file. It can then be used to classify subtomograms into good/bad categories.
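If you later collect additional training examples and want to continue from this model rather than train from scratch, the pretrained_weights entry is the place to do it. Below is a sketch of such a follow-up config, assuming (as the field name suggests) that pretrained_weights takes the path of a previously saved model; the output name my_model_refined.h5 is hypothetical, and all other entries are unchanged from the config above:

{
  "model": {
    "input_size": [32, 32]
  },
  "train": {
    "batch_size": 32,
    "good_path": "good/",
    "bad_path": "bad/",
    "pretrained_weights": "my_model.h5",
    "saved_weights_name": "my_model_refined.h5",
    "learning_rate": 1e-4,
    "nb_epoch": 100,
    "nb_early_stop": 15
  }
}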
To run the prediction on my_subtomograms.hdf on the GPU with ID=1, run:
sp_cinderella_predict.py -i my_subtomograms.hdf -w my_model.h5 -o output_folder/
In the output folder you will find two new mrcs files with the classified subtomograms. To check the results with e2display, you have to extract the central slices again (see Extract central slices).
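For example, assuming the two classified files are named as below (hypothetical names; check output_folder/ for the actual filenames), the check could look like this:

sp_cinderella_extract.py -i output_folder/my_subtomograms_good.mrcs -o good_central.mrcs
sp_cinderella_extract.py -i output_folder/my_subtomograms_bad.mrcs -o bad_central.mrcs
e2display.py good_central.mrcs bad_central.mrcs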