How to use SPHIRE's Cinderella for micrograph selection

This tutorial describes how to use Cinderella to sort micrographs. Unfortunately, we cannot provide a pretrained model yet. You therefore first have to train a model (see section Training) and then apply it to your micrographs (see section Classify).

Download & Install

You can find the download and installation instructions here: Download and Installation

Training

The first step is to train Cinderella with manually selected good and bad micrographs. Create two folders, one containing manually selected good micrographs (e.g. GOOD_MICS/) and one containing bad micrographs (e.g. BAD_MICS/). Both folders can contain subfolders.

How many micrographs do I need?

We typically start with 30 good and 30 bad micrographs.
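Before training, it can be useful to check how many micrographs each folder actually contains, including subfolders. The following is only a minimal sketch: count_micrographs is a hypothetical helper (not part of Cinderella), and the list of file extensions is an assumption that you should adapt to your data.

```python
from pathlib import Path

# Assumption: micrographs are stored as MRC or TIFF files.
MIC_EXTENSIONS = {".mrc", ".tif", ".tiff"}


def count_micrographs(folder):
    """Count micrograph files in `folder`, including all subfolders."""
    return sum(
        1
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in MIC_EXTENSIONS
    )
```

For example, count_micrographs("GOOD_MICS/") and count_micrographs("BAD_MICS/") should both return roughly 30 or more before you start.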

Then specify the paths in a config file like this:

config.json

{
	"model": {
		"input_size": [512,512]
	},
 
	"train": {
		"batch_size": 6,
		"good_path": "GOOD_MICS/",
		"bad_path": "BAD_MICS/",
		"pretrained_weights": "",
		"saved_weights_name": "my_model.h5",
		"learning_rate": 1e-4,
		"nb_epoch": 100,
		"nb_early_stop": 15
	}
}
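If you prefer to generate the config file from a script, for example when training on several datasets, the same structure can be written with Python's json module. This is a minimal sketch: build_config and write_config are hypothetical helpers, not part of Cinderella, and the values are simply copied from the example above.

```python
import json


def build_config(good_path, bad_path, weights_name="my_model.h5"):
    """Return a config dictionary with the same fields as the example above."""
    return {
        "model": {"input_size": [512, 512]},
        "train": {
            "batch_size": 6,
            "good_path": good_path,
            "bad_path": bad_path,
            "pretrained_weights": "",
            "saved_weights_name": weights_name,
            "learning_rate": 1e-4,
            "nb_epoch": 100,
            "nb_early_stop": 15,
        },
    }


def write_config(path, config):
    """Write the config dictionary to `path` as JSON."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)
```

Calling write_config("config.json", build_config("GOOD_MICS/", "BAD_MICS/")) produces a file equivalent to the one shown above.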

The fields in the section model have the following meaning:

input_size: The image size to which each micrograph is resized.
mask_radius: (Optional) Circular mask radius which is applied after resizing to the input size. If not given, it uses 0.4 * input_size as default.
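To get a feeling for the default mask, the sketch below computes the radius for a given input size and tests whether a pixel lies inside the circular mask. It is an illustration only, not Cinderella's implementation: both function names are hypothetical, and it assumes that the radius is derived from the smaller image side and that the mask is centered on the image.

```python
import math


def default_mask_radius(input_size):
    """Default radius, assuming 0.4 times the smaller side of the input size."""
    return 0.4 * min(input_size)


def inside_mask(row, col, input_size, radius=None):
    """Return True if pixel (row, col) lies inside the centered circular mask."""
    height, width = input_size
    if radius is None:
        radius = default_mask_radius(input_size)
    center_row = (height - 1) / 2
    center_col = (width - 1) / 2
    return math.hypot(row - center_row, col - center_col) <= radius
```

Under these assumptions, an input size of [512,512] gives a radius of about 205 pixels, so the image center is kept while the corners are masked out.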

The fields in the section train have the following meaning:

batch_size: How many micrographs are in one mini-batch. If you have memory problems, you can try to reduce this value.
good_path: Path to folder with good micrographs.
bad_path: Path to folder with bad micrographs.
pretrained_weights: Path to weights that are used to initialize the network. It can be empty. Because Cinderella uses the same network architecture as crYOLO, we typically use the general network of crYOLO as pretrained weights.
saved_weights_name: Filename under which the final model is saved.
learning_rate: Learning rate. It should normally not be changed.
nb_epoch: Maximum number of epochs to train. However, training will stop earlier if the early-stopping criterion is met (see nb_early_stop).
nb_early_stop: If the validation loss does not improve nb_early_stop times in a row, the training stops automatically.
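The interplay of nb_epoch and nb_early_stop can be sketched as follows. This is not Cinderella's actual training loop. It is a simplified illustration that assumes the validation loss is evaluated once per epoch, and epochs_until_stop is a hypothetical function name.

```python
def epochs_until_stop(val_losses, nb_epoch=100, nb_early_stop=15):
    """Return the number of epochs run before training stops.

    `val_losses` holds one validation loss per epoch (assumption).
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:nb_epoch], start=1):
        if loss < best_loss:
            # The loss improved: remember it and reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= nb_early_stop:
                return epoch  # early stop
    # No early stop: training runs until the losses or nb_epoch are exhausted.
    return min(len(val_losses), nb_epoch)
```

With the default values from the config above, training therefore ends after at most 100 epochs, or as soon as 15 consecutive epochs bring no improvement.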

The next step is to run the training:

sp_cinderella_train.py -c config.json --gpu 1

This will train a classification network on the GPU with ID=1. After the training finishes, you get a my_model.h5 file, which can then be used to classify micrographs into good and bad categories.

Classify

Suppose you want to separate good and bad micrographs in the folder micrographs and save lists with the filenames of the good and bad micrographs into the folder output_folder. Furthermore, you want to use the model my_model.h5 and the GPU with ID=1. Micrographs with a confidence greater than 0.5 should be classified as good.

This is the command to run:

sp_cinderella_predict.py -i micrographs/ -w my_model.h5 -o output_folder/ -t 0.5 --gpu 1

You will find the files bad.txt and good.txt in your output_folder.
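The effect of the confidence threshold can be illustrated with a few lines of Python. This is only a sketch of the decision rule described above, not Cinderella's code: split_by_threshold is a hypothetical function, and the confidence values in the usage example are made up for illustration.

```python
def split_by_threshold(confidences, threshold=0.5):
    """Split micrograph names into (good, bad) lists.

    `confidences` maps a micrograph filename to its confidence of being good.
    A micrograph counts as good only if its confidence is greater than
    the threshold.
    """
    good, bad = [], []
    for name, confidence in sorted(confidences.items()):
        if confidence > threshold:
            good.append(name)
        else:
            bad.append(name)
    return good, bad
```

For instance, split_by_threshold({"a.mrc": 0.9, "b.mrc": 0.1}) puts a.mrc into the good list and b.mrc into the bad list. Raising the threshold makes the selection of good micrographs stricter.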
