auto2d_tutorial

Differences

This shows you the differences between two versions of the page.

auto2d_tutorial [2019/07/10 13:46]
twagner [Classify]
auto2d_tutorial [2020/08/28 07:36] (current)
twagner [Training]
Line 1: Line 1:
-====== How to use SPHIRE's Cinderella ======
+====== How to use SPHIRE's Cinderella for 2D class selection ======
+This tutorial describes how to use Cinderella to classify 2D class averages. You can either use a pretrained model (see section //Classify//) or train your own model (see section //Training//).
 ==== Download & Install ====
 You can find the download and installation instructions here: [[auto_2d_class_selection|Download and Installation]]
Line 6: Line 8:
  
 This section assumes that you have downloaded the latest classification model.
- 
-This is the corresponding configuration file: 
- 
-<code json config.json>
-{
-    "model": {
-        "input_size": [75,75]
-    },
-
-    "train": {
-        "batch_size": 32,
-        "good_classes": "GOOD_CLASSES/",
-        "bad_classes": "BAD_CLASSES/",
-        "pretrained_weights": "",
-        "saved_weights_name": "my_model.h5",
-        "learning_rate": 1e-4,
-        "nb_epoch": 100,
-        "nb_early_stop": 5,
-        "train_valid_thresh": 0.8,
-        "max_valid_img_per_file": -1
-    }
-}
-</code>
- 
-The fields have the following meaning:  
-  * **input_size**: Size to which the classes are internally downsampled.
-  * **batch_size**: Number of images that are used in one batch during training.
-  * **good_classes**: Path to the folder with good classes saved as stacks in .mrc or .hdf format.
-  * **bad_classes**: Path to the folder with bad classes saved as stacks in .mrc or .hdf format.
-  * **pretrained_weights**: Path to a model that should be used to initialize the training.
-  * **saved_weights_name**: Every time the network improves in terms of validation loss, the model is saved to the file specified here.
-  * **learning_rate**: Defines the step size during training. The default should be kept.
-  * **nb_epoch**: Maximum number of epochs the network will train. It might not reach this number, as Cinderella stops training if it recognizes that the validation loss is no longer improving.
-  * **nb_early_stop**: If the validation loss did not improve this number of times in a row, training stops.
-  * **train_valid_thresh**: Fraction of images from each stack file that are used for training. The remaining images are used for validation.
-  * **max_valid_img_per_file**: Maximum number of validation images per stack file that should be used. -1 means that this limit is not used.
- 
-Copy this into a new file called ''config.json''. During classification, the options in the "train" section are ignored. 
  
 To run the classification, suppose you want to separate good and bad classes in ''classes_after_isac.hdf'' (or any other .mrcs / .hdf file with classes) and save the new .hdf (.mrcs) files into the folder ''output_folder''. Furthermore, you want to use the model ''model.h5'' and the GPU with ID=1. Classes with a confidence greater than 0.7 should be classified as good classes.
Line 50: Line 14:
  
 <code>
-sp_cinderella_predict.py -i path/to/classes_after_isac.hdf -w model.h5 -o output_folder/ -c config.json -t 0.7 --gpu 1
+sp_cinderella_predict.py -i path/to/classes_after_isac.hdf -w model.h5 -o output_folder/ -t 0.7 --gpu 1
 </code>
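If you want to start the classification from a script, the call can be assembled programmatically. The following is a minimal sketch, not part of Cinderella: the helper name ''build_predict_command'' is hypothetical, and it uses only the flags that appear in the current command (''-i'', ''-w'', ''-o'', ''-t'', ''--gpu''). It prints the command instead of executing it.

```python
import shlex


def build_predict_command(classes, weights, out_dir, threshold=0.7, gpu=1):
    """Assemble the sp_cinderella_predict.py call as an argument list.

    Hypothetical helper: it only mirrors the flags shown in the tutorial
    and does not call into Cinderella itself.
    """
    return [
        "sp_cinderella_predict.py",
        "-i", str(classes),      # input stack with 2D classes (.hdf / .mrcs)
        "-w", str(weights),      # trained model
        "-o", str(out_dir),      # folder for the separated classes
        "-t", str(threshold),    # confidence threshold for "good" classes
        "--gpu", str(gpu),       # GPU ID
    ]


cmd = build_predict_command(
    "path/to/classes_after_isac.hdf", "model.h5", "output_folder/"
)
command_line = shlex.join(cmd)
print(command_line)
# To run it for real (requires an installed Cinderella):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.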
  
Line 57: Line 21:
 ==== Training ====
 If you would like to train Cinderella with your own classes, you can easily do it.
-First you have to separate your good and bad classes into separate files. Create two folders, on containing good classes (e.g ''GOOD_CLASSES/'') and one contain bad classes (e.g ''BAD_CLASSES/''). Both folders can contain multiple .hdf / .mrcs files.
+First you have to separate your good and bad classes into separate files. Create two folders, one containing good classes (e.g. ''GOOD_CLASSES/'') and one containing bad classes (e.g. ''BAD_CLASSES/''). Both folders can contain multiple .hdf / .mrcs files.
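Before training, it can help to confirm that both folders really contain class stacks. The following is a minimal sketch under the assumption that the stacks sit directly inside each folder; the helper names and the demo file names are hypothetical, and the check only looks at file extensions, not at file contents.

```python
import tempfile
from pathlib import Path

CLASS_SUFFIXES = {".hdf", ".mrcs"}  # stack formats named in the tutorial


def list_class_stacks(folder):
    """Return the .hdf / .mrcs files directly inside folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in CLASS_SUFFIXES
    )


def check_training_folders(good_dir, bad_dir):
    """Raise ValueError if a folder holds no class stacks, else return both listings."""
    stacks = {"good": list_class_stacks(good_dir), "bad": list_class_stacks(bad_dir)}
    for label, files in stacks.items():
        if not files:
            raise ValueError(f"no .hdf / .mrcs files found for the {label} classes")
    return stacks


# Demonstration on a throwaway directory tree (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    good, bad = Path(tmp, "GOOD_CLASSES"), Path(tmp, "BAD_CLASSES")
    good.mkdir()
    bad.mkdir()
    (good / "dataset_a.hdf").touch()
    (bad / "dataset_a.mrcs").touch()
    (bad / "notes.txt").touch()  # ignored: not a class stack
    found = check_training_folders(good, bad)
    summary = {label: [p.name for p in files] for label, files in found.items()}

print(summary)
```

For a real data set, call ''check_training_folders("GOOD_CLASSES/", "BAD_CLASSES/")'' with your own folder paths.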
  
 Then specify the paths in a config file like this:
Line 64: Line 28:
 {
     "model": {
-        "input_size": [75,75]
+        "input_size": [64,64]
     },
 
     "train": {
         "batch_size": 32,
-        "good_classes": "GOOD_CLASSES/",
-        "bad_classes": "BAD_CLASSES/",
+        "good_path": "GOOD_CLASSES/",
+        "bad_path": "BAD_CLASSES/",
         "pretrained_weights": "",
         "saved_weights_name": "my_model.h5",
         "learning_rate": 1e-4,
         "nb_epoch": 100,
-        "nb_early_stop": 5
+        "nb_early_stop": 15
     }
 }
 </code>
-The fields have the following meaning:
+The fields in the section **model** have the following meaning:
   * **input_size**: The image size to which each class is resized.
+  * **mask_radius**: (Optional) Radius of the circular mask that is applied after resizing to the input size. If not given, 0.4*input_size is used as the default.
+
+The fields in the section **train** have the following meaning:
   * **batch_size**: Number of classes in one mini-batch. If you run into memory problems, try reducing this value.
-  * **good_classes**: Path to folder with good classes.
-  * **bad_classes**: Path to folder with bad classes.
-  * **pretrained_weights**: Path to weights that are used to initialize the network. It can be empty. As Cinderella is using the same network architecture as crYOLO, we are typically using the general network of crYOLO as pretrained weights.
+  * **good_path**: Path to the folder with good classes.
+  * **bad_path**: Path to the folder with bad classes.
+  * **pretrained_weights**: Path to weights that are used to initialize the network. It can be empty. As Cinderella uses the same network architecture as crYOLO, we typically use the [[downloads:cryolo_1#general_phosaurusnet_models|general network of crYOLO]] as pretrained weights.
   * **saved_weights_name**: Filename of the final model.
   * **learning_rate**: Learning rate; it should not be changed.
   * **nb_epoch**: Maximum number of epochs to train. Training may stop earlier (see nb_early_stop).
-  * **nb_early_stop**: If the validation loss did not improve "nb_early_stop"in a row, the training will stop automatically.
+  * **nb_early_stop**: If the validation loss did not improve "nb_early_stop" times in a row, the training will stop automatically.
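The field list of the current revision can be turned into a small script that generates the configuration and checks that no key is missing. This is a minimal sketch, not Cinderella code: the helpers ''default_mask_radius'' and ''missing_train_keys'' are hypothetical, and the mask radius calculation assumes that "0.4*input_size" refers to the first entry of ''input_size'' without rounding, which the page does not state.

```python
import json

# Keys of the "train" section as named in the current revision of the page.
TRAIN_KEYS = [
    "batch_size", "good_path", "bad_path", "pretrained_weights",
    "saved_weights_name", "learning_rate", "nb_epoch", "nb_early_stop",
]


def default_mask_radius(input_size):
    """Default radius when mask_radius is omitted.

    Assumption: 0.4 * the first entry of input_size, unrounded.
    """
    return 0.4 * input_size[0]


def missing_train_keys(config):
    """Return the expected "train" keys that are absent from config."""
    return [key for key in TRAIN_KEYS if key not in config.get("train", {})]


config = {
    "model": {"input_size": [64, 64]},
    "train": {
        "batch_size": 32,
        "good_path": "GOOD_CLASSES/",
        "bad_path": "BAD_CLASSES/",
        "pretrained_weights": "",
        "saved_weights_name": "my_model.h5",
        "learning_rate": 1e-4,
        "nb_epoch": 100,
        "nb_early_stop": 15,
    },
}

missing = missing_train_keys(config)
if missing:
    raise KeyError(f"config is missing train keys: {missing}")

config_text = json.dumps(config, indent=4)
print(config_text)  # save this text as config.json
print("default mask radius:", default_mask_radius(config["model"]["input_size"]))
```

A config that still uses the old key names ''good_classes'' and ''bad_classes'' would be reported as missing ''good_path'' and ''bad_path''.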
  
 The next step is to run the training:
auto2d_tutorial.1562759193.txt.gz · Last modified: 2019/07/10 13:46 by twagner