pipeline:window:cryolo (revision 2019/07/10 09:53 by twagner; current revision 2021/02/19 10:00)
===== Overview =====

<note warning>
**NEW DOCUMENTATION**

The documentation has moved to [[https://cryolo.readthedocs.io|https://cryolo.readthedocs.io]]
</note>

CrYOLO is a fast and accurate particle picking procedure. It is based on convolutional neural networks and utilizes the popular object detection system //You Only Look Once// (YOLO).
  * crYOLO makes picking **fast** -- on a modern GPU it will pick your particles at up to 6 micrographs per second.
  * crYOLO makes picking **smart** -- the network learns the context of particles (e.g. not to pick particles on carbon or within ice contamination).
  * crYOLO makes training **easy** -- you might use a general network model and skip training completely. However, if the general model does not give satisfying results on your data, you can train a specialized model.
  * crYOLO makes training **tolerant** -- don't worry if you miss quite a lot of particles during the creation of your training set.
In this tutorial we explain our recommended configurations for single particle and filament projects. You can find more information about how to use crYOLO, the supported networks and the config file in the corresponding wiki articles.

You can find more technical details in our paper:

[[https://doi.org/10.1038/s42003-019-0437-z|Wagner, T. et al. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun Biol 2, 218 (2019).]]

We are also proud that crYOLO was recommended by F1000.
===== Installation =====
You can find the download and installation instructions here: [[howto:download_latest_cryolo|Download and Installation]]

===== Picking particles - Using a model trained for your data =====
==== Data preparation ====
CrYOLO supports MRC, TIF and JPG files. It can work with 32 bit, 16 bit and 8 bit data.
It will work on original MRC files, but picking will probably improve when the data are denoised. Therefore you should low-pass filter the micrographs to a reasonable level. Since version 1.2 crYOLO can do this for you: add a ''filter'' entry to the model section of your config file to filter your images down to an absolute frequency of 0.1. The filtered images are saved in a separate folder, so each micrograph only has to be filtered once.
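To relate the absolute frequency cutoff to a resolution, note that an absolute frequency is given in reciprocal pixels. A small sketch of the conversion (the pixel sizes used here are illustrative assumptions, not values from this tutorial):

```python
def cutoff_resolution(pixel_size_angstrom: float, abs_frequency: float) -> float:
    """Resolution (in Angstrom) corresponding to an absolute spatial
    frequency given in reciprocal pixels."""
    return pixel_size_angstrom / abs_frequency

# Example: with a (hypothetical) pixel size of 1.0 A/px, an absolute
# frequency of 0.1 corresponds to a 10 A low-pass filter.
print(cutoff_resolution(1.0, 0.1))
```

So the same absolute frequency corresponds to a different physical resolution depending on your pixel size.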
crYOLO will automatically check whether an image from ''full_data'' is already available in the filtered-images folder and, if so, will skip filtering it again.

<note tip>
**Alternative: Using neural-network denoising with JANNI**

Since crYOLO 1.4 you can also use neural-network denoising with JANNI. To use JANNI's general model, point the ''filter'' entry in the model section of your config file to the downloaded JANNI model instead of a low-pass cutoff.

I recommend using denoising with JANNI only together with a GPU, as it is rather slow (~1-2 seconds per micrograph on the GPU and ~10 seconds per micrograph on the CPU).
</note>
If you followed the installation instructions, you now have to activate the crYOLO virtual environment:
<code>
source activate cryolo
</code>
In the following I will assume that your image data is in the folder ''full_data''.
The next step is to create training data. To do so, we have to pick single particles manually in several micrographs. Ideally, the micrographs are picked to completion.

One may ask how many micrographs have to be picked? It depends! Typically 10 micrographs are a good start. However, that number may increase or decrease due to several factors:
  * A very heterogeneous background could make it necessary to pick more micrographs.
  * If your micrographs are only sparsely decorated, you may need to pick more micrographs.
We recommend that you start with 10 micrographs, train crYOLO, and then check how well it picks the remaining data; add more training micrographs if needed.
To create your training data, you can use the crYOLO box manager. Start it with the following command:
<code>
cryolo_boxmanager.py
</code>
Now press //File -> Open image folder// and select the ''full_data'' directory. The first image should appear, and you can start picking particles:

  * LEFT MOUSE BUTTON: Place a box
  * HOLD LEFT MOUSE BUTTON: Move a box
  * CONTROL + LEFT MOUSE BUTTON: Remove a box
You can change the box size in the main window by changing the number in the text field labeled //Box size:// and pressing //Set// to apply it to all picked particles. For picking, you should use the minimum-sized square which encloses your particle.
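As a quick sketch of choosing that minimum square, the box size in pixels follows from the particle diameter and the pixel size (both values below are illustrative assumptions):

```python
import math

def min_box_size(diameter_angstrom: float, pixel_size_angstrom: float) -> int:
    """Smallest square box (in pixels) that fully encloses a particle
    of the given diameter."""
    return math.ceil(diameter_angstrom / pixel_size_angstrom)

# Hypothetical example: a 200 A particle at 1.0 A/px needs at least a 200 px box.
print(min_box_size(200.0, 1.0))
```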
When you have finished picking your micrographs, save the box files with //File -> Save//.
Create a new directory (e.g. ''train_annot'') and move the box files there.

Now create a third folder (e.g. ''train_image'') and copy (or link) the micrographs you picked into it.
==== Configuration ====
You now have to create a config file for your picking project. To do this, type:
<code>
touch config.json
</code>
Open the file with your preferred editor. The configuration consists of a ''model'' section, which describes the network (architecture, input size, anchors and filtering), and a ''train'' section, which tells crYOLO where your training images and annotations are and how to train.
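A minimal sketch of such a ''config.json''; all values and folder names here are illustrative assumptions based on a typical crYOLO 1.x setup, not the exact values from this tutorial:

```json
{
    "model": {
        "architecture": "PhosaurusNet",
        "input_size": 1024,
        "anchors": [160, 160],
        "max_box_per_image": 700,
        "filter": [0.1, "filtered_tmp/"]
    },
    "train": {
        "train_image_folder": "train_image/",
        "train_annot_folder": "train_annot/",
        "train_times": 10,
        "batch_size": 4,
        "learning_rate": 0.0001,
        "nb_epoch": 200,
        "object_scale": 5.0,
        "no_object_scale": 1.0,
        "coord_scale": 1.0,
        "class_scale": 1.0,
        "pretrained_weights": "model.h5",
        "saved_weights_name": "model.h5",
        "debug": true
    }
}
```

Adjust the anchors to your box size and the folder names to wherever you stored your training data.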
Please set the value in the //"anchors"// field to your box size.
As described in the data preparation section, crYOLO filters your images on the fly and skips images that have already been filtered. Alternatively, since crYOLO 1.4 you can use neural-network denoising with JANNI by pointing the ''filter'' entry to a JANNI model; this is recommended only together with a GPU, as it is rather slow.
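For JANNI denoising, the ''filter'' entry takes a model file plus two numeric parameters and an output folder. A sketch of such an entry; the file name and the two numbers (which, to my understanding, control JANNI's patch overlap and batch size) are assumptions:

```json
"filter": ["gmodel_janni_20190703.h5", 24, 24, "filtered_tmp/"]
```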
Please also note the wiki entry about the configuration file if you want to learn more about the available options.
==== Training ====

Now you are ready to train the model. In case you have multiple GPUs, you should first select a free GPU. The following command will show the status of all GPUs:
<code>
nvidia-smi
</code>
For this tutorial, we assume that you have either a single GPU or want to use GPU 0. Therefore we add ''-g 0'' after each command below. However, if you have multiple GPUs (e.g. GPU 0 and GPU 1) you could use both by adding ''-g 0 1'' instead.
Navigate to the folder that contains ''config.json'' and your training folders.

**1. Warm up your network**

<code>
cryolo_train.py -c config.json -w 3 -g 0
</code>

**2. Train your network**

<code>
cryolo_train.py -c config.json -w 0 -g 0
</code>

The final model will be written to ''model.h5''.

The training stops automatically when the loss on the validation data does not improve anymore. To change how many epochs without improvement crYOLO waits before stopping, add the ''-e'' option to the training command, e.g.:
<code>
cryolo_train.py -c config.json -w 0 -g 0 -e 15
</code>
==== Picking ====
You can now use the model weights saved in ''model.h5'' to pick all the images in ''full_data''. To do this, run:
<code>
cryolo_predict.py -c config.json -w model.h5 -i full_data/ -g 0 -o boxfiles/
</code>

You will find the picked particles in the directory ''boxfiles''.

If you want to pick less or more conservatively, change the selection threshold from the default of 0.3 to a less conservative value like 0.2 or a more conservative value like 0.4 using the //-t// parameter:
<code>
cryolo_predict.py -c config.json -w model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.2
</code>
However, it is much easier to select the best threshold after picking, using the CBOX files that crYOLO writes (see the next section).
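Conceptually, the //-t// parameter simply keeps only the boxes whose confidence reaches the threshold. A minimal sketch of that selection (the box data here is made up for illustration):

```python
def select_boxes(boxes, threshold=0.3):
    """Keep only picks whose confidence is at least `threshold`.
    `boxes` is a list of (x, y, confidence) tuples."""
    return [b for b in boxes if b[2] >= threshold]

picks = [(10, 20, 0.95), (40, 12, 0.25), (75, 60, 0.31)]
print(len(select_boxes(picks, 0.3)))  # prints 2; lower thresholds keep more particles
```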
==== Visualize the results ====

To visualize your results you can use the box manager:
<code>
cryolo_boxmanager.py
</code>
Now press //File -> Open image folder// and then select the ''full_data'' directory. Then load the box files written by crYOLO from your output directory.

Since version 1.3.0 crYOLO also writes CBOX files into a separate subfolder of the output directory. These contain, among other information, the confidence of each box, which allows you to change the confidence threshold interactively in the box manager and directly see the effect on the picking.

<note warning>
Right now, **this filtering does not yet work for filaments**.
</note>
===== Picking particles - Without training using a general model =====
Here you can find how to apply the general models we trained for you. If you would like to train your own general model, please see our extra wiki page about training general models.

Our general models can be found on the [[howto:download_latest_cryolo|Download and Installation]] page.
==== Configuration ====
The next step is to create a configuration file. Type:
<code>
touch config.json
</code>

Open the file with your preferred editor.

There are two general models for cryo-EM images -- one trained on low-pass filtered images and one trained on neural-network denoised images -- and one general model for negative stain data.
=== CryoEM images ===
For the general model trained with **low-pass filtered cryo images**, the ''config.json'' only needs a ''model'' section with the low-pass ''filter'' entry described earlier.

For the general model trained with **neural-network denoised cryo images** (with JANNI's general model), use the same ''model'' section but point the ''filter'' entry to the JANNI model instead.
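A sketch of a prediction-only configuration for the low-pass filtered general model; the architecture name, sizes and folder name are assumptions in the spirit of the surrounding tutorial, not the exact published values:

```json
{
    "model": {
        "architecture": "PhosaurusNet",
        "input_size": 1024,
        "anchors": [205, 205],
        "max_box_per_image": 700,
        "filter": [0.1, "filtered_tmp/"]
    }
}
```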
In all cases please set the value in the //"anchors"// field to your preferred box size.
=== Negative stain images ===
For the general model for **negative stain data**, use the corresponding ''config.json''; it follows the same structure as for cryo-EM data.

Please set the value in the //"anchors"// field to your preferred box size here as well.
==== Picking ====
Just follow the description for picking given above.

As for a directly trained model, you might want to play around with the confidence threshold, either by using the ''-t'' parameter during prediction or by adjusting the threshold on the CBOX files after picking.
===== Picking particles - Using the general model refined for your data =====

Since crYOLO 1.3 you can train a model for your data by //fine-tuning// the general model.

What does //fine-tuning// mean?

The general model was trained on many particles with a variety of shapes and therefore learned a very good set of generic features. The last layers, however, learn a rather abstract representation of the particles, and it might be that this representation does not perfectly fit your particle at hand. Fine-tuning only trains the last two convolutional layers and keeps the others fixed. This adjusts the more abstract representation to your specific problem.

Why should I use //fine-tuning//?
  - In theory, fine-tuning should reduce the risk of overfitting ((Overfitting means that the model works well on the training micrographs but not on new, unseen micrographs.)).
  - The training is much faster, as not all layers have to be trained.
  - The training needs less GPU memory ((We test crYOLO with its default configuration on graphics cards with >= 8 GB memory. Using the fine-tune mode, it should also work on GPUs with 4 GB of memory.)) and is therefore usable on NVIDIA cards with less memory.

However, the fine-tune mode is still somewhat experimental, so please check your results carefully.
==== Configuration ====

You can use almost the same configuration as when training from scratch. You only have to point the pretrained-weights entry in the ''train'' section of your ''config.json'' to the general model you want to refine.
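A sketch of the relevant ''train'' section fragment; the key name ''pretrained_weights'' and both file names are assumptions based on typical crYOLO 1.x configurations:

```json
"train": {
    "pretrained_weights": "gmodel_phosnet_201909.h5",
    "saved_weights_name": "model_refined.h5"
}
```

The remaining ''train'' keys stay as in the from-scratch configuration.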
==== Training ====
In comparison to training from scratch, you only have to add the //--fine_tune// flag:

<code>
cryolo_train.py -c config.json -w 0 -g 0 --fine_tune
</code>
==== Picking ====
Picking is identical to picking with a model trained from scratch, so we skip it here. Just follow the description given above.
==== Training on CPU ====

The fine-tune mode is especially useful if you want to train crYOLO on the CPU, since far fewer weights have to be updated than in a full training run.
===== Picking filaments - Using a model trained for your data =====
Since version 1.1.0 crYOLO supports picking filaments.

Filament mode on Actin:

Filament mode on MAVS (EMPIAR-10031):
==== Data preparation ====
The data preparation is the same as for single particle projects. After this is done, you have to prepare training data for your model.
Right now, you have to use e2helixboxer.py (from EMAN2) to generate the training data:
<code>
e2helixboxer.py --gui my_images/
</code>

After tracing your filaments in e2helixboxer, export the coordinates and organize the resulting box files and micrographs into training folders, as described for single particle projects above.
==== Configuration ====
You can configure the project the same way as for a "normal" single particle project; the ''config.json'' has the same structure.

Just adapt the anchors according to your box size.
==== Training ====

In principle, there is not much difference between training crYOLO for filament picking and for particle picking. For a project with roughly 20 filaments per image we successfully trained on 40 images (=> 800 filaments). However, in our experience the warm-up phase and the training need a little more time:

**1. Warm up your network**

<code>
cryolo_train.py -c config.json -w 10 -g 0
</code>

**2. Train your network**

<code>
cryolo_train.py -c config.json -w 0 -g 0 -e 10
</code>

The final model will be written to ''model.h5''.
==== Picking ====

The biggest difference when picking filaments with crYOLO shows up during prediction. There are just three additional parameters:

  * //--filament//: Activates the filament mode
  * //-fw//: Filament width (pixels)
  * //-bd//: Inter-box distance (pixels)

Let's assume you want to pick a filament with a width of 100 pixels (-fw 100), your box size is 200x200 and you want neighboring boxes to be placed 20 pixels apart, i.e. with 90% overlap (-bd 20). The command is then (''-mn 6'' additionally discards filaments with fewer than 6 boxes):
<code>
cryolo_predict.py -c config.json -w model.h5 -i full_data --filament -fw 100 -bd 20 -o boxes/ -g 0 -mn 6
</code>
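The relation between box size, inter-box distance and overlap used in this example can be sketched as follows (the 200 px box size matches the example above, the rest is plain arithmetic):

```python
def inter_box_distance(box_size: int, overlap: float) -> int:
    """Distance between neighboring box centers along the filament
    for a desired fractional overlap of consecutive boxes."""
    return round(box_size * (1.0 - overlap))

# A 200 px box with 90% overlap gives an inter-box distance of 20 px.
print(inter_box_distance(200, 0.9))
```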
The directory ''boxes'' will then contain the picking results for each micrograph.
==== Visualize the results ====
You can use the box manager as described above to visualize the picked filaments.
===== Evaluate your results =====
<note warning>
Unfortunately, this tool has limitations; please check the current documentation at [[https://cryolo.readthedocs.io|cryolo.readthedocs.io]] for its up-to-date status.
</note>
The evaluation tool allows you, based on your validation data, to get statistics about your training.
If you followed the tutorial, the validation data were selected randomly during training and recorded in a runfile. The tool is started like this:
<code>
cryolo_evaluation.py -c config.json -w model.h5 -r runfiles/
</code>
The result is a table containing several statistics:
  * AUC: Area under the precision-recall curve. An overall summary statistic: perfect classifier = 1, worst classifier = 0.
  * Topt: Optimal confidence threshold with respect to the F1 score. It might not be ideal for your picking, as the F1 score weighs recall and precision equally; in SPA, however, recall is often more important than precision.
  * R (Topt): Recall using the optimal confidence threshold.
  * R (0.3): Recall using a confidence threshold of 0.3.
  * R (0.2): Recall using a confidence threshold of 0.2.
  * P (Topt): Precision using the optimal confidence threshold.
  * P (0.3): Precision using a confidence threshold of 0.3.
  * P (0.2): Precision using a confidence threshold of 0.2.
  * F1 (Topt): Harmonic mean of precision and recall using the optimal confidence threshold.
  * F1 (0.3): Harmonic mean of precision and recall using a confidence threshold of 0.3.
  * F1 (0.2): Harmonic mean of precision and recall using a confidence threshold of 0.2.
  * IOU (Topt): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes; the higher, the better. Evaluated with the optimal confidence threshold.
  * IOU (0.3): The same, evaluated with a confidence threshold of 0.3.
  * IOU (0.2): The same, evaluated with a confidence threshold of 0.2.
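These statistics can be reproduced from box counts and coordinates. A small sketch of precision, recall, F1 and box IOU (all input numbers below are made up for illustration):

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Precision, recall and their harmonic mean (F1 score)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union

# 80 correct picks, 20 false picks, 20 missed particles:
p, r, f1 = precision_recall_f1(80, 20, 20)
print(round(p, 2), round(r, 2), round(f1, 2))  # prints: 0.8 0.8 0.8
```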
If the training data consist of multiple folders, the evaluation is done for each folder separately. Furthermore, an overall evaluation across all folders is reported.
===== Advanced parameters =====
During **training** (//cryolo_train.py//) several advanced parameters are available; run ''cryolo_train.py -h'' to see the full list.

During **picking** (//cryolo_predict.py//) the following advanced parameters are available, among others (run ''cryolo_predict.py -h'' for the full list):
  * //-t CONFIDENCE_THRESHOLD//: Confidence threshold for accepting a box (default 0.3).
  * //-d DISTANCE_IN_PIXEL//: Particles with a distance (in pixels) to another particle smaller than this value are filtered out.
  * //-pbs PREDICTION_BATCH_SIZE//: Number of images in one batch during prediction.
  * //-sr SEARCH_RANGE_FACTOR//: (filament mode) Search range for connecting boxes into filaments, given as a factor of the box size.
===== Help =====
Find help at our mailing list.