CrYOLO is a fast and accurate particle picking procedure. It's based on convolutional neural networks and utilizes the popular You Only Look Once (YOLO) object detection system.
In this tutorial we explain our recommended configurations for single particle and filament projects. You can find more information on how to use crYOLO, the supported networks and the config file in the following articles:
We are also proud that crYOLO was recommended by F1000:
“CrYOLO works amazingly well in identifying the true particles and distinguishing them from other high-contrast features. Thus, crYOLO provides a fast, automated tool, which gives similar reliable results as careful manual selection and outperforms template based selection procedures.”
You can find the download and installation instructions here: Download and Installation
Depending on what you want to do, you can follow one of these self-contained tutorials:
The first and the second tutorial are the most common use cases and are well tested. The third tutorial is still experimental but might give you better results in less time and with less training data.
In the following I will assume that your image data is in the folder full_data.
The next step is to create training data. To do so, we have to pick single particles manually in several micrographs. Ideally, the micrographs are picked to completion. However, it is not necessary to pick all particles. crYOLO will still converge if you miss some (or even many).
It depends! Typically 10 micrographs are a good start. However, that number may increase / decrease due to several factors:
We recommend that you start with 10 micrographs, then autopick your data, check the results and finally decide whether to add more micrographs to your training set. If you refine a general model, even 5 micrographs might be enough.
To create your training data, crYOLO is shipped with a tool called “boxmanager”. However, you can also use tools like e2boxer to create your training data.
Start the box manager with the following command:
cryolo_boxmanager.py
Now press File → Open image folder and then select the full_data directory. The first image should pop up. You can navigate through the images in the directory tree. Here is how to pick particles:
You might want to run a low pass filter before you start picking the particles. Just press the [Apply] button to get a low pass filtered version of your currently selected micrograph. The cut-off is an absolute frequency (for example 0.1); allowed values are 0 - 0.5, and lower values mean stronger filtering.
You can change the box size in the main window by changing the number in the text field labeled Box size:. Press [Set] to apply it to all picked particles. For picking, you should use the smallest square which encloses your particle.
When you have finished picking from your micrographs, you can export your box files with Files → Write box files.
Create a new directory called train_annotation and save the box files there. Close boxmanager.
Now create a third folder with the name train_image. For each box file, copy the corresponding image from full_data into train_image. crYOLO will detect image / box file pairs by taking the box file and searching for an image filename which contains the box filename.
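If you have many box files, the copy step can be scripted. The following is only a minimal sketch: it assumes that the box files end in .box and applies the matching rule described above (an image belongs to a box file if the image filename contains the box filename without its extension). The function name copy_matching_images is hypothetical and not part of crYOLO.

```python
import shutil
from pathlib import Path


def copy_matching_images(box_dir, image_dir, target_dir):
    """Copy every image whose filename contains the stem of a box file.

    Hypothetical helper, not part of crYOLO. Assumes box files end in .box.
    Returns the sorted names of the copied images.
    """
    box_dir, image_dir, target_dir = Path(box_dir), Path(image_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # "mic_01.box" -> "mic_01"
    box_stems = [p.stem for p in box_dir.glob("*.box")]
    copied = []
    for image in sorted(image_dir.iterdir()):
        if image.is_file() and any(stem in image.name for stem in box_stems):
            shutil.copy2(image, target_dir / image.name)
            copied.append(image.name)
    return copied


# Example call with the folder names used in this tutorial:
# copy_matching_images("train_annotation", "full_data", "train_image")
```

The function only copies files, so full_data stays untouched and you can rerun it after picking additional micrographs.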
You can use crYOLO either by command line or by using the GUI. The GUI should be easier for most users. You can start it with:
cryolo_gui.py
The crYOLO GUI is essentially a visualization of the command line interface. On the left side, you find all possible “Actions”:
Each action has several parameters which are organized in tabs. Once you have chosen your settings you can press [Start] (just as an example, don't press it now); the command will be applied and crYOLO shows you the output:
It will tell you if something went wrong. Moreover, it will tell you all parameters used. Pressing [Back] brings you back to your settings, where you can either edit the settings (in case something went wrong) or go to the next action.
You now have to create a configuration file for your picking project. It contains all important constants and paths and helps you to reproduce your results later on.
You can either use the command line to create the configuration file or the GUI. For most users, the GUI should be easier. Select the config action and fill in the general fields:
At this point you could already press the [Start] button to generate the config file but you might want to take these options into account:
Since crYOLO 1.4 you can also use neural network denoising with JANNI. The easiest way is to use JANNI's general model (Download here), but you can also train JANNI for your data. crYOLO uses an interface to JANNI directly to filter your data; you just have to change the filter argument in the Denoising tab from LOWPASS to JANNI and specify the path to your JANNI model. I recommend using denoising with JANNI only together with a GPU, as it is rather slow (~1-2 seconds per micrograph on the GPU and 10 seconds per micrograph on the CPU).
You can also modify all options and parameters directly in the config.json file. It can be opened by any text editor. Please note the wiki entry about the crYOLO configuration file if you want to know more details.
Now you are ready to train the model. In case you have multiple GPUs, you should first select a free GPU. The following command will show the status of all GPUs:
nvidia-smi
For this tutorial, we assume that you have either a single GPU or want to use GPU 0. In the GUI you have to fill in the mandatory fields:
The default number of warmup epochs is fine as long as you don't want to refine an existing model. During the warmup epochs, crYOLO will not try to estimate the size of your particle, which helps it to converge.
Before you start the training, you might want to change the GPU. By default, crYOLO will use GPU 0. The following command will show the status of all GPUs:
nvidia-smi
In the “Optional arguments” tab you can change the GPU that should be used by crYOLO. If you have multiple GPUs (e.g. GPU 0 and GPU 1), you could also use both by setting the GPU argument to '0 1'.
Once you start the training, it will stop when the “loss” metric on the validation data does not improve 10 times in a row. This is typically enough. In case you want to give the training more time to find the best model, you can increase the “not changed in a row” parameter to, for example, 15 by setting the early argument in the “Optional arguments” tab to 15.
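To make the stopping rule more concrete, here is a small illustration of how such a “not improved N times in a row” criterion behaves. This is a sketch of the general idea only and not crYOLO's actual implementation; the function name epochs_until_stop and the example loss values are made up.

```python
def epochs_until_stop(val_losses, patience=10):
    """Return the number of epochs run before early stopping triggers.

    Illustration only, not crYOLO's implementation: training stops once the
    validation loss has failed to improve `patience` times in a row.
    """
    best = float("inf")
    not_improved = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            not_improved = 0
        else:
            not_improved += 1
            if not_improved >= patience:
                return epoch
    # The criterion never triggered within the given losses.
    return len(val_losses)


# Made-up loss curve: improves for two epochs, then stagnates.
losses = [1.0, 0.9] + [0.95] * 20
stop_default = epochs_until_stop(losses, patience=10)
stop_relaxed = epochs_until_stop(losses, patience=15)
```

With a larger value the training simply waits longer after the last improvement before it gives up.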
Now press the Start button to start the training. The final model will be written to disk as specified in saved_weights_name in your configuration file.
Train crYOLO using the command line
Navigate to the folder with the config_cryolo.json file, the train_image folder, etc.
Train your network with 5 warmup epochs on GPU 0:
cryolo_train.py -c config.json -w 5 -g 0
The final model file will be written to disk.
To give the training more time to find the best model, add the early argument -e 15 to the training command, for example:
cryolo_train.py -c config.json -w 3 -g 0 -e 15
Select the action “predict” and fill all arguments in the “Required arguments” tab:
In crYOLO, all particles have an assigned confidence value. By default, all particles with a confidence value below 0.3 are discarded. If you want to pick less or more conservatively you might want to change this confidence threshold to a less (e.g. 0.2) or more (e.g. 0.4) conservative value in the “Optional arguments” tab.
However, it is much easier to select the best threshold after picking using the CBOX files written by crYOLO as described in the next section.
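The effect of the confidence threshold can be illustrated with a few lines of code. This sketch does not read CBOX files; it assumes the picks are already available as (x, y, confidence) tuples, and both the function name filter_by_confidence and the example values are made up.

```python
def filter_by_confidence(picks, threshold=0.3):
    """Keep picks whose confidence is at least `threshold`.

    `picks` is assumed to be a list of (x, y, confidence) tuples.
    """
    return [pick for pick in picks if pick[2] >= threshold]


# Made-up picks: (x, y, confidence)
picks = [(10, 20, 0.25), (30, 40, 0.35), (50, 60, 0.90)]
kept_less_conservative = filter_by_confidence(picks, threshold=0.2)
kept_default = filter_by_confidence(picks, threshold=0.3)
kept_more_conservative = filter_by_confidence(picks, threshold=0.4)
```

A lower threshold keeps more (possibly false) picks, a higher threshold keeps fewer but more reliable ones.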
Monitor mode
When this option is activated, crYOLO will monitor your input folder. This is especially useful for automation purposes. You can stop the monitor mode by writing an empty file with the name “stop.cryolo” in the input directory. Just add --monitor in the command line or check the monitor box in the “Optional arguments” tab.
After picking is done, you can find four folders in your specified output folder:
To pick all your images in the directory full_data with the model weight file cryolo_model.h5 (or gmodel_phosnet_X_Y.h5 when using the general model) and a confidence threshold of 0.3, run:
cryolo_predict.py -c config.json -w cryolo_model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.3
You will find the picked particles in the directory boxfiles.
To visualize your results you can use the boxmanager:
As image_dir you select the full_data directory. As box_dir you select the CBOX folder (or EMAN_HELIX_SEGMENTED in case of filaments).
Besides the particle coordinates, CBOX files contain additional information such as the confidence and the estimated size of each particle. Importing .cbox files into the box manager enables more filtering options in the GUI. You can plot size and confidence distributions. Moreover, you can change the confidence threshold and the minimum and maximum size and see the results in a live preview. When you are done with the filtering, you can write the new box selection into new box files. The video below shows an example.
Here you can find how to apply the general models we trained for you. If you would like to train your own general model, please see our extra wiki page: How to train your own general model
Our general models can be found and downloaded here: Download and Installation.
You can use crYOLO either by command line or by using the GUI. The GUI should be easier for most users. You can start it with:
cryolo_gui.py
The crYOLO GUI is essentially a visualization of the command line interface. On the left side, you find all possible “Actions”:
Each action has several parameters which are organized in tabs. Once you have chosen your settings you can press [Start] (just as an example, don't press it now); the command will be applied and crYOLO shows you the output:
It will tell you if something went wrong. Moreover, it will tell you all parameters used. Pressing [Back] brings you back to your settings, where you can either edit the settings (in case something went wrong) or go to the next action.
The next step is to create a configuration file. Type:
touch config.json
Open the file with your preferred editor.
There are two general Phosaurus networks available: one for cryo-EM images and one for negative stain data.
For the general Phosaurus network trained for low-pass filtered cryo images enter the following inside:
config.json for low-pass filtered cryo-images
{
    "model": {
        "architecture": "PhosaurusNet",
        "input_size": 1024,
        "anchors": [205, 205],
        "max_box_per_image": 700,
        "num_patches": 1,
        "filter": [0.1, "tmp_filtered"]
    }
}
For the general model trained with neural-network denoised cryo images (with JANNI's general model) enter the following inside:
config.json for neural-network denoised cryo-images
{
    "model": {
        "architecture": "PhosaurusNet",
        "input_size": 1024,
        "anchors": [205, 205],
        "max_box_per_image": 700,
        "num_patches": 1,
        "filter": ["gmodel_janni_20190703.h5", 24, 3, "tmp_filtered_nn"]
    }
}
You can download the file gmodel_janni_20190703.h5 here.
In all cases please set the value in the “anchors” field to your desired box size. It should be the size, in pixels, of the smallest square that encloses your particle.
For the general model for negative stain data please use:
config.json for negative stain images
{
    "model": {
        "architecture": "PhosaurusNet",
        "input_size": 1024,
        "anchors": [205, 205],
        "max_box_per_image": 700,
        "num_patches": 1
    }
}
Please set the value in the “anchors” field to your desired box size. It should be the size, in pixels, of the smallest square that encloses your particle.
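If you prefer not to edit the file by hand, the configurations shown above can also be generated with a short script. This is only a sketch: the function name make_general_model_config and the box size of 180 pixels are made-up examples, while the keys and the remaining values are copied from the configurations above.

```python
import json


def make_general_model_config(box_size, filter_setting=None):
    """Build the "model" section shown above for a given box size.

    Hypothetical helper, not part of crYOLO. `filter_setting` is the value
    of the "filter" entry; leave it out for negative stain data.
    """
    model = {
        "architecture": "PhosaurusNet",
        "input_size": 1024,
        "anchors": [box_size, box_size],
        "max_box_per_image": 700,
        "num_patches": 1,
    }
    if filter_setting is not None:
        model["filter"] = filter_setting
    return {"model": model}


# Example: low-pass filtered cryo images, made-up box size of 180 pixels.
config = make_general_model_config(180, filter_setting=[0.1, "tmp_filtered"])
config_text = json.dumps(config, indent=4)
# Write config_text into config.json, e.g.:
# with open("config.json", "w") as handle:
#     handle.write(config_text)
```

Generating the file this way makes it harder to forget the “anchors” value when you switch to a new data set.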
Select the action “predict” and fill all arguments in the “Required arguments” tab:
In crYOLO, all particles have an assigned confidence value. By default, all particles with a confidence value below 0.3 are discarded. If you want to pick less or more conservatively you might want to change this confidence threshold to a less (e.g. 0.2) or more (e.g. 0.4) conservative value in the “Optional arguments” tab.
However, it is much easier to select the best threshold after picking using the CBOX files written by crYOLO as described in the next section.
Monitor mode
When this option is activated, crYOLO will monitor your input folder. This is especially useful for automation purposes. You can stop the monitor mode by writing an empty file with the name “stop.cryolo” in the input directory. Just add --monitor in the command line or check the monitor box in the “Optional arguments” tab.
After picking is done, you can find four folders in your specified output folder:
To pick all your images in the directory full_data with the model weight file cryolo_model.h5 (or gmodel_phosnet_X_Y.h5 when using the general model) and a confidence threshold of 0.3, run:
cryolo_predict.py -c config.json -w cryolo_model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.3
You will find the picked particles in the directory boxfiles.
To visualize your results you can use the boxmanager:
As image_dir you select the full_data directory. As box_dir you select the CBOX folder (or EMAN_HELIX_SEGMENTED in case of filaments).
Besides the particle coordinates, CBOX files contain additional information such as the confidence and the estimated size of each particle. Importing .cbox files into the box manager enables more filtering options in the GUI. You can plot size and confidence distributions. Moreover, you can change the confidence threshold and the minimum and maximum size and see the results in a live preview. When you are done with the filtering, you can write the new box selection into new box files. The video below shows an example.
Since crYOLO 1.3 you can train a model for your data by fine-tuning the general model.
What does fine-tuning mean?
The general model was trained on a lot of particles with a variety of shapes and therefore learned a very good set of generic features. The last layers, however, learn a rather abstract representation of the particles, and it might be that they do not fit perfectly for your particle at hand. Fine-tuning only trains the last two convolutional layers and keeps the others fixed. This adjusts the more abstract representation to your specific problem.
Why should I fine-tune my model instead of training from scratch?
However, the fine tune mode is still somewhat experimental and we will update this section if we see more advantages or disadvantages.
In the following I will assume that your image data is in the folder full_data.
The next step is to create training data. To do so, we have to pick single particles manually in several micrographs. Ideally, the micrographs are picked to completion. However, it is not necessary to pick all particles. crYOLO will still converge if you miss some (or even many).
It depends! Typically 10 micrographs are a good start. However, that number may increase / decrease due to several factors:
We recommend that you start with 10 micrographs, then autopick your data, check the results and finally decide whether to add more micrographs to your training set. If you refine a general model, even 5 micrographs might be enough.
To create your training data, crYOLO is shipped with a tool called “boxmanager”. However, you can also use tools like e2boxer to create your training data.
Start the box manager with the following command:
cryolo_boxmanager.py
Now press File → Open image folder and then select the full_data directory. The first image should pop up. You can navigate through the images in the directory tree. Here is how to pick particles:
You might want to run a low pass filter before you start picking the particles. Just press the [Apply] button to get a low pass filtered version of your currently selected micrograph. The cut-off is an absolute frequency (for example 0.1); allowed values are 0 - 0.5, and lower values mean stronger filtering.
You can change the box size in the main window by changing the number in the text field labeled Box size:. Press [Set] to apply it to all picked particles. For picking, you should use the smallest square which encloses your particle.
When you have finished picking from your micrographs, you can export your box files with Files → Write box files.
Create a new directory called train_annotation and save the box files there. Close boxmanager.
Now create a third folder with the name train_image. For each box file, copy the corresponding image from full_data into train_image. crYOLO will detect image / box file pairs by taking the box file and searching for an image filename which contains the box filename.
You can use crYOLO either by command line or by using the GUI. The GUI should be easier for most users. You can start it with:
cryolo_gui.py
The crYOLO GUI is essentially a visualization of the command line interface. On the left side, you find all possible “Actions”:
Each action has several parameters which are organized in tabs. Once you have chosen your settings you can press [Start] (just as an example, don't press it now); the command will be applied and crYOLO shows you the output:
It will tell you if something went wrong. Moreover, it will tell you all parameters used. Pressing [Back] brings you back to your settings, where you can either edit the settings (in case something went wrong) or go to the next action.
You can use almost the same configuration as used when training from scratch. You just have to tell crYOLO to use the latest general model by pointing to it with the “pretrained_weights” option:
"train": {
    [...]
    "pretrained_weights": "LATEST_GENERAL_MODEL.h5",
    [...]
    "saved_weights_name": "my_refined_model.h5",
    [...]
}
In comparison to the training from scratch, you can skip the warm up training ( -w 0 ). Moreover you have to add the --fine_tune flag to tell crYOLO that it should do fine tuning. You can also tell crYOLO how many layers it should fine tune (default is two layers with -lft 2 ):
cryolo_train.py -c config.json -w 0 -g 0 --fine_tune -lft 2
The fine tune mode is especially useful if you want to train crYOLO on the CPU. On my local machine it reduced the time for training crYOLO on 14 micrographs from 12-15 hours to 4-5 hours.
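If you already have a configuration from a training from scratch, the two “train” entries needed for fine-tuning can also be set with a small script instead of a text editor. This is only a sketch; the function name with_pretrained_weights is hypothetical, and the file names are the placeholders used in the configuration excerpt above.

```python
import copy


def with_pretrained_weights(config, general_model_path, refined_model_path):
    """Return a copy of `config` prepared for fine-tuning.

    Hypothetical helper, not part of crYOLO. Only the two "train" entries
    discussed in this section are changed; everything else is kept.
    """
    updated = copy.deepcopy(config)
    train = updated.setdefault("train", {})
    train["pretrained_weights"] = general_model_path
    train["saved_weights_name"] = refined_model_path
    return updated


# Made-up starting point; in practice, load your config.json with json.load().
scratch_config = {"model": {"anchors": [205, 205]}, "train": {"batch_size": 4}}
refine_config = with_pretrained_weights(
    scratch_config, "LATEST_GENERAL_MODEL.h5", "my_refined_model.h5"
)
```

Because the function works on a copy, your original configuration for training from scratch stays unchanged.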
Select the action “predict” and fill all arguments in the “Required arguments” tab:
In crYOLO, all particles have an assigned confidence value. By default, all particles with a confidence value below 0.3 are discarded. If you want to pick less or more conservatively you might want to change this confidence threshold to a less (e.g. 0.2) or more (e.g. 0.4) conservative value in the “Optional arguments” tab.
However, it is much easier to select the best threshold after picking using the CBOX files written by crYOLO as described in the next section.
Monitor mode
When this option is activated, crYOLO will monitor your input folder. This is especially useful for automation purposes. You can stop the monitor mode by writing an empty file with the name “stop.cryolo” in the input directory. Just add --monitor in the command line or check the monitor box in the “Optional arguments” tab.
After picking is done, you can find four folders in your specified output folder:
To pick all your images in the directory full_data with the model weight file cryolo_model.h5 (or gmodel_phosnet_X_Y.h5 when using the general model) and a confidence threshold of 0.3, run:
cryolo_predict.py -c config.json -w cryolo_model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.3
You will find the picked particles in the directory boxfiles.
Since version 1.1.0 crYOLO supports picking filaments.
Filament mode on Actin:
Filament mode on MAVS (EMPIAR-10031) :
The first step is to create the training data for your model. Right now, you have to use the e2helixboxer.py for this:
e2helixboxer.py --gui my_images/*.mrc
After tracing your training data in e2helixboxer, export them using File → Save. Make sure that you export particle coordinates, as this is the only format supported right now (see screenshot). In the following example, it is expected that you exported into a folder called “train_annotation”.
You now have to create a configuration file for your picking project. It contains all important constants and paths and helps you to reproduce your results later on.
You can either use the command line to create the configuration file or the GUI. For most users, the GUI should be easier. Select the config action and fill in the general fields:
At this point you could already press the [Start] button to generate the config file but you might want to take these options into account:
Since crYOLO 1.4 you can also use neural network denoising with JANNI. The easiest way is to use JANNI's general model (Download here), but you can also train JANNI for your data. crYOLO uses an interface to JANNI directly to filter your data; you just have to change the filter argument in the Denoising tab from LOWPASS to JANNI and specify the path to your JANNI model. I recommend using denoising with JANNI only together with a GPU, as it is rather slow (~1-2 seconds per micrograph on the GPU and 10 seconds per micrograph on the CPU).
You can also modify all options and parameters directly in the config.json file. It can be opened by any text editor. Please note the wiki entry about the crYOLO configuration file if you want to know more details.
In principle, there is not much difference between training crYOLO for filament picking and for particle picking. For a project with roughly 20 filaments per image, we successfully trained on 40 images (⇒ 800 filaments). However, in our experience the warm-up phase and training need a little bit more time:
Train your network with 10 warm up epochs:
cryolo_train.py -c config.json -w 10 -g 0 -e 10
The final model will be called model.h5
The biggest difference in picking filaments with crYOLO is during prediction. However, there are just three additional parameters needed:
Let's assume you want to pick a filament with a width of 100 pixels (-fw 100). The box size is 200×200 and you want a 90% overlap (-bd 20). Moreover, you want each filament to have at least 6 boxes (-mn 6). The micrographs are in the full_data directory. Then the picking command would be:
cryolo_predict.py -c config.json -w model.h5 -i full_data --filament -fw 100 -bd 20 -o boxes/ -g 0 -mn 6
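The relation between the overlap and the -bd value in this example can be checked with a small calculation. The sketch below assumes that the box distance is simply the box size multiplied by (1 - overlap), which reproduces the numbers of the example (200 pixels, 90% overlap, distance 20); the function name box_distance is made up and not part of crYOLO.

```python
def box_distance(box_size, overlap):
    """Distance in pixels between neighbouring boxes along a filament.

    Assumption for illustration: distance = box_size * (1 - overlap),
    where `overlap` is a fraction between 0 (inclusive) and 1 (exclusive).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    return round(box_size * (1 - overlap))


# Values from the example above: 200 pixel boxes with 90% overlap.
bd = box_distance(200, 0.9)
```

For other overlaps you can compute the value in the same way and pass the result to -bd.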
The directory boxes will be created and all results are saved there. The format is the EMAN2 helix format with particle coordinates. You can find a detailed description of how to import crYOLO filament coordinates into Relion here.
To visualize your results you can use the boxmanager:
As image_dir you select the full_data directory. As box_dir you select the CBOX folder (or EMAN_HELIX_SEGMENTED in case of filaments).
Besides the particle coordinates, CBOX files contain additional information such as the confidence and the estimated size of each particle. Importing .cbox files into the box manager enables more filtering options in the GUI. You can plot size and confidence distributions. Moreover, you can change the confidence threshold and the minimum and maximum size and see the results in a live preview. When you are done with the filtering, you can write the new box selection into new box files. The video below shows an example.
The evaluation tool allows you, based on your validation data, to get statistics about your training. If you followed the tutorial, the validation data are selected randomly. With crYOLO 1.1.0 a run file for each training is created and saved into the folder runfiles/ in your project directory. This run file contains which files were selected for validation, and you can run your evaluation as follows:
cryolo_evaluation.py -c config.json -w model.h5 -r runfiles/run_YearMonthDay-HourMinuteSecond.json -g 0
The table contains several statistics:
If the training data consists of multiple folders, then the evaluation will be done for each folder separately. Furthermore, crYOLO estimates the optimal picking threshold with respect to the F1 score and the F2 score. Both combine recall and precision into a single value (their weighted harmonic mean); the F2 score puts more weight on the recall, which is often more important in cryo-EM.
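For reference, the F1 and F2 scores follow the standard F-beta definition, which combines precision and recall. The following is a minimal sketch of that formula; the function name f_beta and the example values are my own and are not taken from crYOLO.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta=1 gives the F1 score, beta=2 gives the F2 score, which weights
    recall more strongly than precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up example: a picker with high recall but moderate precision.
f1 = f_beta(precision=0.5, recall=1.0, beta=1.0)
f2 = f_beta(precision=0.5, recall=1.0, beta=2.0)
```

In this example the F2 score is higher than the F1 score because the high recall counts more; with the values swapped (high precision, moderate recall) the F2 score would be the lower one.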
During training (cryolo_train), there are the following advanced parameters:
During picking (cryolo_predict), there are these advanced parameters:
Any questions? Problems? Suggestions?
Find help at our mailing list!
touch STOP.CRYOLO