Differences

pipeline:window:cryolo [2019/02/13 17:46]
twagner [Data preparation]
pipeline:window:cryolo [2019/04/21 22:31] (current)
twagner [Picking particles - Without training using a general model]
Line 7: Line 7:
   * crYOLO makes picking **smart** -- The network learns the context of particles (e.g. not to pick particles on carbon or within ice contamination )   * crYOLO makes picking **smart** -- The network learns the context of particles (e.g. not to pick particles on carbon or within ice contamination )
   * crYOLO makes training **easy** -- You might use a general network model and skip training completely. However, if the general model doesn'​t give you satisfactory results or if you would like to improve them, you might want to train a specialized model specific for your data set by selecting __particles__ (no selection of negative examples necessary) on a small number of micrographs.   * crYOLO makes training **easy** -- You might use a general network model and skip training completely. However, if the general model doesn'​t give you satisfactory results or if you would like to improve them, you might want to train a specialized model specific for your data set by selecting __particles__ (no selection of negative examples necessary) on a small number of micrographs.
 +  * crYOLO makes training **tolerant** -- Don't worry if you miss quite a lot of particles during creation of your training set. [[:cryolo_picking_unlabeled|crYOLO will still do the job.]]
  
 In this tutorial we explain our recommended configurations for single particle and filament projects. You can find more information about supported networks and about the config file in the following articles: In this tutorial we explain our recommended configurations for single particle and filament projects. You can find more information about supported networks and about the config file in the following articles:
   * [[:​cryolo_nets|crYOLO networks]]   * [[:​cryolo_nets|crYOLO networks]]
   * [[:​cryolo_config|crYOLO configuration file]]   * [[:​cryolo_config|crYOLO configuration file]]
 +
 +You can find more technical details in our paper:
 +
 +[[https://www.biorxiv.org/content/10.1101/356584v2|SPHIRE-crYOLO: A fast and accurate fully automated particle picker for cryo-EM]]
 +
 +<​html>​
 +<a href="​https://​f1000.com/​prime/​733517098?​bd=1"​ target="​_blank"><​img src="​https://​s3.amazonaws.com/​cdn.f1000.com/​images/​badges/​badgef1000.gif"​ alt="​Access the recommendation on F1000Prime"​ id="​bg"​ /></​a>​
 +</​html>​
 ===== Installation ===== ===== Installation =====
  
 You can find the download and installation instructions here: [[howto:​download_latest_cryolo|Download and Installation]] You can find the download and installation instructions here: [[howto:​download_latest_cryolo|Download and Installation]]
  
-===== Picking - Using a model trained for your data =====+===== Picking ​particles ​- Using a model trained for your data =====
  
  
Line 34: Line 44:
 In the following I will assume that your image data is in the folder ''​full_data''​. In the following I will assume that your image data is in the folder ''​full_data''​.
  
-The next step is to create training data. To do so, we have to pick single particles manually in several micrographs. Ideally, the micrographs are picked to completion. One may ask how many micrographs have to be picked? It depends! Typically 10 micrographs are a good start. However, that number may increase / decrease due to several factors:+The next step is to create training data. To do so, we have to pick single particles manually in several micrographs. Ideally, the micrographs are picked to completion. ​[[:​cryolo_picking_unlabeled|However,​ it is not necessary to pick all particles. crYOLO will still converge if you miss some (or even many).]] 
 +One may ask how many micrographs have to be picked? It depends! Typically 10 micrographs are a good start. However, that number may increase / decrease due to several factors:
   * A very heterogenous background could make it necessary to pick more micrographs.   * A very heterogenous background could make it necessary to pick more micrographs.
   * If your micrograph is only sparsely decorated, you may need to pick more micrographs.   * If your micrograph is only sparsely decorated, you may need to pick more micrographs.
Line 59: Line 70:
 Create a new directory called ''​train_annotation''​ and save it there. Close boxmanager. Create a new directory called ''​train_annotation''​ and save it there. Close boxmanager.
  
-Now create a third folder with the name ''train_image''. Now for each box file, copy the corresponding image from ''full_data'' into ''train_image''. crYOLO will detect image / box file pairs by search taking the box file an searching for an image filename which contains the box filename.+Now create a third folder with the name ''train_image''. Now for each box file, copy the corresponding image from ''full_data'' into ''train_image''((While it is nice to keep things organized, you don't have to copy your training images into a separate folder. In the configuration file (see below) you can also simply specify the full_data directory as "//train_image_folder//". crYOLO will find the correct images using the box files.)). crYOLO will detect image / box file pairs by taking the box file and searching for an image filename which contains the box filename.
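The copy step described above can be scripted. The following is a minimal sketch, not part of crYOLO: the function name ''copy_training_images'' and the list of image suffixes are assumptions made for illustration, and the matching rule simply mirrors the description above (an image matches if its filename contains the box filename without extension).

```python
import shutil
from pathlib import Path

# Assumption: image formats to consider; adjust to your own data.
IMAGE_SUFFIXES = {".mrc", ".tif", ".tiff", ".jpg", ".png"}


def copy_training_images(box_dir, image_dir, target_dir):
    """Copy every image whose filename contains the stem of a box file.

    Returns the names of the copied images (hypothetical helper).
    """
    box_dir, image_dir, target_dir = Path(box_dir), Path(image_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for box_file in sorted(box_dir.glob("*.box")):
        for image in sorted(image_dir.iterdir()):
            is_image = image.suffix.lower() in IMAGE_SUFFIXES
            # Same rule as in the text: image filename contains the box filename.
            if is_image and box_file.stem in image.name:
                shutil.copy2(image, target_dir / image.name)
                copied.append(image.name)
    return copied
```

With the folder names used in this tutorial, the call would be ''copy_training_images("train_annotation", "full_data", "train_image")''.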
  
 ==== Configuration ==== ==== Configuration ====
Line 84: Line 95:
         "​train_times": ​         10,         "​train_times": ​         10,
         "​pretrained_weights": ​  "​model.h5",​         "​pretrained_weights": ​  "​model.h5",​
-        "​batch_size": ​          6,+        "​batch_size": ​          4,
         "​learning_rate": ​       1e-4,         "​learning_rate": ​       1e-4,
         "​nb_epoch": ​            50,         "​nb_epoch": ​            50,
Line 108: Line 119:
 //​[[:​cryolo_config|Click here to get more information about the configuration file]]// //​[[:​cryolo_config|Click here to get more information about the configuration file]]//
  
-Please set the value in the //"anchors"// field to your desired box size. It should be size of the minimum enclosing square in pixels. Furthermore check if the fields //"train_image_folder"// and //"train_annot_folder"// have the correct values. Typically, 20% of the training data are randomly chosen as validation data. If you want to use specific images as validation data, you can move the images and the corresponding box files to the folders specified in //"valid_image_folder"// and //"valid_annot_folder"//. Make sure that they are removed from the original training folder! With the line below, crYOLO automatically filters your images to an absolute frequence 0.1 and write them into a folder "filtered".+Please set the value in the //"anchors"// field to your desired box size. It should be the size of the minimum enclosing square in pixels. Furthermore, check whether the fields //"train_image_folder"// and //"train_annot_folder"// have the correct values. Typically, 20% of the training data are randomly chosen as validation data. If you want to use specific images as validation data, you can move the images and the corresponding box files to the folders specified in //"valid_image_folder"// and //"valid_annot_folder"//. Make sure that they are removed from the original training folder! With the line below, crYOLO automatically filters your images to an absolute frequency of 0.1 and writes them into a folder "filtered".
 <​code>​ <​code>​
 "​filter": ​              ​[0.1,"​filtered"​]. "​filter": ​              ​[0.1,"​filtered"​].
Line 142: Line 153:
 The final model will be called ''​model.h5''​ The final model will be called ''​model.h5''​
  
-The training stops when the "loss" metric on the validation data does not improve  times in a row. This is typically enough. However, you might want to give the training more time to find the best model. You might increase the "not changed in a row" parameter to, for example, 10 by adding the flag //-e 10//:+The training stops when the "loss" metric on the validation data does not improve 10 times in a row. This is typically enough. In case you want to give the training more time to find the best model, you might increase the "not changed in a row" parameter to, for example, 15 by adding the flag //-e 15//:
 <​code>​ <​code>​
-cryolo_train.py -c config.json -w 0 -g 0 -e 10+cryolo_train.py -c config.json -w 0 -g 0 -e 15
 </​code>​ </​code>​
 to the training command. to the training command.
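
The stopping rule controlled by //-e// can be illustrated with a few lines of Python. This is only a sketch of the general "no improvement N times in a row" idea, not crYOLO's actual implementation; the function name ''stopping_epoch'' and the example loss values are made up for illustration.

```python
def stopping_epoch(val_losses, patience=10):
    """Return the (1-based) epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on the
    best value seen so far `patience` times in a row.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # the loss list ended before the criterion was met
```

A larger ''patience'' (as with //-e 15//) simply lets the training run through longer plateaus before giving up.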
 ==== Picking ==== ==== Picking ====
-You can now use the model weights saved in ''​model.h5''​ to pick all your images in the directory ''​full_data''​. To do this, run: +You can now use the model weights saved in ''​model.h5'' ​(//if you come to this section from another point of the tutorial, this filename might be different like ''​gmodel_phosnet_X_Y.h5''//​) ​to pick all your images in the directory ''​full_data''​. To do this, run: 
 <​code>​ <​code>​
 cryolo_predict.py -c config.json -w model.h5 -i full_data/ -g 0 -o boxfiles/ cryolo_predict.py -c config.json -w model.h5 -i full_data/ -g 0 -o boxfiles/
 </​code>​ </​code>​
  
-You will find the picked particles in the directory ''​boxfiles''​+You will find the picked particles in the directory ''​boxfiles''​.
  
 If you want to pick less conservatively or more conservatively you might want to change the selection threshold from the default of 0.3 to a less conservative value like 0.2 or more conservative value like 0.4 using the //-t// parameter: If you want to pick less conservatively or more conservatively you might want to change the selection threshold from the default of 0.3 to a less conservative value like 0.2 or more conservative value like 0.4 using the //-t// parameter:
Line 159: Line 170:
 cryolo_predict.py -c config.json -w model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.2 cryolo_predict.py -c config.json -w model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.2
 </​code>​ </​code>​
 +However, it is much easier to select the best threshold after picking, using the ''CBOX'' files written by crYOLO as described in the next section.
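
To make the effect of the threshold concrete, here is a minimal sketch of confidence filtering. It does not read crYOLO's ''CBOX'' files; the boxes are represented as hypothetical ''(x, y, confidence)'' tuples, and keeping boxes with a confidence greater than or equal to the threshold is an assumption.

```python
def filter_by_confidence(boxes, threshold=0.3):
    """Keep boxes whose confidence reaches the threshold.

    `boxes` is a list of (x, y, confidence) tuples (illustrative format).
    """
    return [box for box in boxes if box[2] >= threshold]


# Hypothetical picks: a lower threshold keeps more boxes, a higher one fewer.
picks = [(10, 10, 0.25), (20, 20, 0.35), (30, 30, 0.90)]
less_conservative = filter_by_confidence(picks, threshold=0.2)
more_conservative = filter_by_confidence(picks, threshold=0.4)
```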
  
 ==== Visualize the results ==== ==== Visualize the results ====
Line 166: Line 178:
 cryolo_boxmanager.py cryolo_boxmanager.py
 </​code>​ </​code>​
-Now press //File -> Open image// folder and the select the ''full_data'' directory. The first image should pop up. Then you import the box files with //File -> Import box files// and select in the ''boxfiles'' folder the ''EMAN'' directory.+Now press //File -> Open image// folder and then select the ''full_data'' directory. The first image should pop up. Then you import the box files with //File -> Import box files// and select in the ''boxfiles'' folder the ''EMAN'' directory. 
  
 +Since version 1.3.0 crYOLO writes cbox files in a separate ''​CBOX''​ folder. You can import them into the box manager, change the threshold easily using the live preview and write the new box selection into new box files.
  
 +[{{ :​pipeline:​window:​ezgif-1-3b966b0324d1.gif?​400 |This example shows how to filter particle boxes using the cryolo boxmanager. It is an animated gif. Click on it to see it playing.}}]
  
-===== Picking - Without training using a general model =====+Right now, **this filtering does not yet work for filaments**.
  
-The general model can be found here: [[howto:​download_latest_cryolo|Download and Installation]]. ​+ 
 +===== Picking particles - Without training using a general model ===== 
 +Here you can find how to apply the general models we trained for you. If you would like to train your own general model, please see our extra wiki page: [[:​cryolo_train_general_model|How to train your own general model]] 
 + 
 +Our general models ​can be found and downloaded ​here: [[howto:​download_latest_cryolo|Download and Installation]]. ​
 ==== Configuration==== ==== Configuration====
 The next step is to create a configuration file. Type: The next step is to create a configuration file. Type:
Line 181: Line 199:
 Open the file with your preferred editor. Open the file with your preferred editor.
  
-For the general **[[:cryolo_nets#network_3_phosaurusnet|Phosaurus network]]** enter the following inside:+There are two general **[[:cryolo_nets#network_3_phosaurusnet|Phosaurus networks]]** available: one for cryo-EM images and one for negative stain data. 
 +=== CryoEM images === 
 +For the general **[[:​cryolo_nets#​network_3_phosaurusnet|Phosaurus network]]** trained for **cryo images** enter the following inside:
 <code json config.json>​ <code json config.json>​
     {     {
Line 191: Line 211:
         "​num_patches": ​         1,         "​num_patches": ​         1,
         "​filter": ​              ​[0.1,"​tmp_filtered"​]         "​filter": ​              ​[0.1,"​tmp_filtered"​]
 +      }
 +    }
 +</​code>​
 +Please set the value in the //"anchors"// field to your desired box size. It should be the size of the minimum square enclosing the particle, in pixels. 
 +
 +=== Negative stain images ===
 +For the general model for **negative stain data** please use:
 +<code json config.json>​
 +    {
 +    "​model"​ : {
 +        "​architecture": ​        "​PhosaurusNet",​
 +        "​input_size": ​          1024,
 +        "​anchors": ​             [205,205],
 +        "​max_box_per_image": ​   700,
 +        "​num_patches": ​         1
       }       }
     }     }
Line 200: Line 235:
 Just follow the description given [[pipeline:​window:​cryolo#​Picking|above]] Just follow the description given [[pipeline:​window:​cryolo#​Picking|above]]
  
-As for a direct trained model, you might want to play around with the -t parameter to make picking less or more conservative.+As for a directly trained model, you might want to play around with the confidence threshold, either by using the ''CBOX'' files after prediction or by directly setting a different confidence threshold with the -t parameter during prediction.
  
 +
 +===== Picking particles - Using the general model refined for your data =====
 +
 +
 +Since crYOLO 1.3 you can train a model for your data by //​fine-tuning//​ the general model.
 +
 +What does //​fine-tuning//​ mean?
 +
 +The general model was trained on a lot of particles with a variety of shapes and therefore learned a very good set of generic features. The last layers, however, learn a pretty abstract representation of the particles and it might be that they do not perfectly fit your particle at hand. Fine-tuning only trains the last two convolutional layers and keeps the others fixed. This adjusts the more abstract representation for your specific problem. 
 +
 +Why should I //​fine-tune//​ my model instead of training from scratch?
 +  - In theory, fine-tuning should reduce the risk of overfitting ((Overfitting means that the model works well on the training micrographs, but not on new, unseen micrographs. The model just memorized what it saw instead of learning generic features.)). 
 +  - The training is much faster, as not all layers have to be trained.
 +  - The training will need less GPU memory ((We are testing crYOLO with its default configuration on graphic cards with >= 8 GB memory. Using the fine tune mode, it should also work with GPUs with 4 GB memory)) and therefore is usable with NVIDIA cards with less memory. ​
 + 
 +However, the fine tune mode is still somewhat experimental and we will update this section if we see more advantages or disadvantages.
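
The idea of "train only the last two convolutional layers" can be sketched independently of any deep learning framework. The snippet below is purely illustrative: the ''Layer'' class, the layer names and the function ''freeze_for_fine_tuning'' are hypothetical and are not taken from crYOLO's code.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Toy stand-in for a network layer (hypothetical, for illustration)."""
    name: str
    kind: str            # e.g. "conv" or "pool"
    trainable: bool = True


def freeze_for_fine_tuning(layers, trainable_conv_layers=2):
    """Mark only the last N convolutional layers as trainable."""
    conv_indices = [i for i, layer in enumerate(layers) if layer.kind == "conv"]
    if trainable_conv_layers > 0:
        keep = set(conv_indices[-trainable_conv_layers:])
    else:
        keep = set()
    for i, layer in enumerate(layers):
        # Everything outside the kept set stays fixed during training.
        layer.trainable = i in keep
    return layers


network = [
    Layer("conv1", "conv"), Layer("pool1", "pool"),
    Layer("conv2", "conv"), Layer("conv3", "conv"), Layer("conv4", "conv"),
]
freeze_for_fine_tuning(network)
trainable_names = [layer.name for layer in network if layer.trainable]
```

Because fewer weights receive gradient updates, less has to be computed and stored per training step, which is in line with the speed and memory advantages listed above.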
 +
 +==== Configuration ====
 +
 +You can use almost the same configuration as used when  [[pipeline:​window:​cryolo#​configuration|training from scratch]]. You just have to tell crYOLO to use the latest general model((You can download it [[http://​sphire.mpg.de/​wiki/​doku.php?​id=downloads:​cryolo_1&​redirect=1#​general_phosaurusnet_models|here]])) by pointing to it with the //"​pretrained_weights"//​ options:
 +
 +<​code>​
 +"​train":​ {
 +    [...]
 +    "​pretrained_weights": ​             "​LATEST_GENERAL_MODEL.h5",​
 +    [...]
 +    "​saved_weights_name": ​  "​my_refined_model.h5",​
 +    [...]
 +}
 +</​code>​
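
If you prefer to make this change with a script instead of an editor, the following sketch uses Python's standard ''json'' module. The function name ''point_to_general_model'' is made up for this example; the two keys are the ones shown above, and the filenames are placeholders that you have to replace with your own.

```python
import json


def point_to_general_model(config_path, general_model, refined_name):
    """Set "pretrained_weights" and "saved_weights_name" in the train section."""
    with open(config_path) as handle:
        config = json.load(handle)
    train = config.setdefault("train", {})
    train["pretrained_weights"] = general_model   # weights to start from
    train["saved_weights_name"] = refined_name    # where the refined model is saved
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config
```

For example: ''point_to_general_model("config.json", "LATEST_GENERAL_MODEL.h5", "my_refined_model.h5")''. All other entries of the configuration file are left untouched.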
 +
 +==== Training ====
 +In comparison to training from scratch, you can skip the warm up training. Moreover, you have to add the //--fine_tune// flag:
 +
 +<​code>​
 +cryolo_train.py -c config.json -w 0 -g 0 --fine_tune
 +</​code>​
 +==== Picking ====
 +Picking is identical to picking with a model trained from scratch, so we will skip it here. Just follow the description given [[pipeline:window:cryolo#Picking|above]].
 +
 +==== Training on CPU ====
 +
 +
 +The fine tune mode is especially useful if you want to [[downloads:​cryolo_1#​run_it_on_the_cpu|train crYOLO on the CPU]]. On my local machine it reduced the time for training cryolo on 14 micrographs from 12-15 hours to 4-5 hours.
 ===== Picking filaments - Using a model trained for your data ===== ===== Picking filaments - Using a model trained for your data =====
 Since version 1.1.0 crYOLO supports picking filaments. Since version 1.1.0 crYOLO supports picking filaments.
Line 217: Line 296:
  
 After this is done, you have to prepare training data for your model. After this is done, you have to prepare training data for your model.
- Right now, you have to use the sxhelixboxer.py to generate the training data:+ Right now, you have to use the e2helixboxer.py to generate the training data:
 <​code>​ <​code>​
-sxhelixboxer.py --gui my_images/​*.mrc+e2helixboxer.py --gui my_images/​*.mrc
 </​code>​ </​code>​
  
-After tracing your training data in sxhelixboxer, export them using //File -> Save//. Make sure that you export particle coordinates as this the only format supported right now (see screenshot). In the following example, it is expected that you exported into a folder called "train_annotation".+After tracing your training data in e2helixboxer, export them using //File -> Save//. Make sure that you export particle coordinates, as this is the only format supported right now (see screenshot). In the following example, it is expected that you exported into a folder called "train_annotation".
  
 ==== Configuration ==== ==== Configuration ====
-You can configure it the same way as for a "​normal"​ project. ​We recommend to use [[:​cryolo_nets#​network_2_yolo_with_patches|YOLO in "patch mode" with 3x3 patches]]:+You can configure it the same way as for a "​normal"​ project. 
 <code json config.json>​ <code json config.json>​
 { {
     "​model"​ : {     "​model"​ : {
-        "​architecture": ​        "​YOLO", +        "​architecture": ​        "​PhosaurusNet", 
-        "​input_size": ​          768+        "​input_size": ​          1024
-        "​anchors": ​             [200,200],+        "​anchors": ​             [160,160],
         "​max_box_per_image": ​   600,         "​max_box_per_image": ​   600,
-        "​num_patches": ​         ​3+        "​num_patches": ​         ​1
-        "​filter": ​              ​[0.1,"​tmp_filtered"]+        "​filter": ​              ​[0.1,"​filtered"]
     },     },
  
Line 242: Line 322:
         "​train_times": ​         10,         "​train_times": ​         10,
         "​pretrained_weights": ​  "​model.h5",​         "​pretrained_weights": ​  "​model.h5",​
-        "​batch_size": ​          6,+        "​batch_size": ​          4,
         "​learning_rate": ​       1e-4,         "​learning_rate": ​       1e-4,
         "​nb_epoch": ​            50,         "​nb_epoch": ​            50,
Line 264: Line 344:
 } }
 </​code>​ </​code>​
 +
 //​[[:​cryolo_config|Click here to get more information about the configuration file]]// //​[[:​cryolo_config|Click here to get more information about the configuration file]]//
  
Line 335: Line 416:
  
 ===== Advanced parameters ===== ===== Advanced parameters =====
-During **training** (//​cryolo_train//​),​ there is the following advanced ​parameter+During **training** (//​cryolo_train//​),​ there are the following advanced ​parameters
-  * //​--warm_restarts//:​ With this option the learning rate is decreasing after each epoch and then reset after a couple of epochs.+  * //%%--%%warm_restarts//:​ With this option the learning rate is decreasing after each epoch and then reset after a couple of epochs
 +  * //​%%--%%num_cpu NUMBER_OF_CPUS//:​ Number of CPU cores used during training 
 +  * //​%%--%%gpu_fraction FRACTION//: Number between 0 - 1 quantifying the fraction of GPU memory that is reserved by crYOLO 
 +  * //%%--%%skip_augmentation//: Set this flag if crYOLO should skip the data augmentation (not recommended).
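
To illustrate what "decreasing after each epoch and then reset after a couple of epochs" can look like, here is a sketch of a cosine schedule with warm restarts. It is an assumption that crYOLO uses this particular shape; the function name, ''min_lr'' and ''cycle_length'' are hypothetical, and ''base_lr'' just reuses the //"learning_rate"// value from the configuration above.

```python
import math


def warm_restart_lr(epoch, base_lr=1e-4, min_lr=1e-6, cycle_length=5):
    """Learning rate for a given epoch under cosine decay with restarts.

    Within each cycle the rate falls from base_lr towards min_lr; at the
    start of the next cycle it jumps back to base_lr.
    """
    position = epoch % cycle_length  # epoch index inside the current cycle
    cosine = 0.5 * (1 + math.cos(math.pi * position / cycle_length))
    return min_lr + (base_lr - min_lr) * cosine


# Learning rates for the first two cycles (epochs 0 to 9).
schedule = [warm_restart_lr(epoch) for epoch in range(10)]
```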
  
-During **picking** (//cryolo_predict//), there are two advanced parameters: +During **picking** (//cryolo_predict//), there are the following advanced parameters: 
-  * //-t confidence_threshold//: With the -t parameter, you can let the crYOLO pick more conservative (e.g by adding -t 0.4 to the picking command) or less conservative (e.g by adding -t 0.2 to the picking command). The valid parameter range is 0 to 1. +  * //-t CONFIDENCE_THRESHOLD//: With the -t parameter, you can let crYOLO pick more conservatively (e.g. by adding -t 0.4 to the picking command) or less conservatively (e.g. by adding -t 0.2 to the picking command). The valid parameter range is 0 to 1. 
-  * //-d distance_in_pixel//: With the -d parameter you can filter your picked particles. Boxes with a distance (pixel) less than this value will be removed. +  * //-d DISTANCE_IN_PIXEL//: With the -d parameter you can filter your picked particles. Boxes with a distance (pixel) less than this value will be removed. 
-  * //​-pbs ​prediction_batch_size//: With the -pbs parameter you can set the number of images picked as batch. Default is 3.+  * //​-pbs ​PREDICTION_BATCH_SIZE//: With the -pbs parameter you can set the number of images picked as batch. Default is 3. 
 +  * //%%--%%otf//: Instead of saving the filtered images into a separate directory, crYOLO will filter them on-the-fly and not write them to disk. 
 +  * //​%%--%%num_cpu NUMBER_OF_CPUS//:​ Number of CPU cores used during prediction 
 +  * //​%%--%%gpu_fraction FRACTION//: Number between 0 -1 quantifying the fraction of GPU memory that is reserved by crYOLO
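
The distance filter behind //-d// can be pictured as follows. This is a sketch under assumptions, not crYOLO's code: boxes are hypothetical ''(x, y, confidence)'' tuples, and when two boxes are closer than the given distance, the one with the lower confidence is dropped (the text above does not specify which box of a pair is removed).

```python
import math


def remove_close_boxes(boxes, min_distance):
    """Greedy filter: keep a box only if it is at least `min_distance`
    pixels away from every box already kept (highest confidence first)."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[2], reverse=True):
        far_enough = all(
            math.hypot(box[0] - other[0], box[1] - other[1]) >= min_distance
            for other in kept
        )
        if far_enough:
            kept.append(box)
    return kept


# Hypothetical picks: the first two boxes are 5 pixels apart.
picks = [(0, 0, 0.9), (3, 4, 0.5), (100, 100, 0.4)]
filtered = remove_close_boxes(picks, min_distance=10)
```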
  
 ===== Help ===== ===== Help =====