
Overview

CrYOLO is a fast and accurate particle picking procedure. It's based on convolutional neural networks and utilizes the popular You Only Look Once (YOLO) object detection system.

  • crYOLO makes picking fast – On a modern GPU it will pick your particles at up to 6 micrographs per second.
  • crYOLO makes picking smart – The network learns the context of particles (e.g. not to pick particles on carbon or within ice contamination)
  • crYOLO makes training easy – You might use a general network model and skip training completely. However, if the general model doesn't give you satisfactory results or if you would like to improve them, you might want to train a specialized model specific for your data set by selecting particles (no selection of negative examples necessary) on a small number of micrographs.
  • crYOLO makes training tolerant – Don't worry if you miss quite a lot of particles while creating your training set. crYOLO will still do the job.

In this tutorial we explain our recommended configurations for single particle and filament projects. You can find more information on how to use crYOLO, the supported networks and the config file in the following articles:

You can find more technical details in our paper:

Wagner, T. et al. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Communications Biology 2, (2019).


We are also proud that crYOLO was recommended by F1000:

“CrYOLO works amazingly well in identifying the true particles and distinguishing them from other high-contrast features. Thus, crYOLO provides a fast, automated tool, which gives similar reliable results as careful manual selection and outperforms template based selection procedures.”

Bettina Böttcher, Biochemistry, University Würzburg (access the recommendation on F1000Prime)

Installation

You can find the download and installation instructions here: Download and Installation

Tutorials

Depending on what you want to do, you can follow one of these self-contained tutorials:

  1. I would like to train a model from scratch for picking my particles
  2. I would like to train a model from scratch for picking filaments.
  3. I would like to refine a general model for my particles.

The first and second tutorials cover the most common use cases and are well tested. The third tutorial is still experimental but might give you better results in less time and with less training data.

Picking particles - Using a model trained for your data

This tutorial explains how to train a model specific to your dataset.

If you followed the installation instructions, you now have to activate the cryolo virtual environment with

source activate cryolo

Data preparation

In the following I will assume that your image data is in the folder full_data.

The next step is to create training data. To do so, we have to pick single particles manually in several micrographs. Ideally, the micrographs are picked to completion. However, it is not necessary to pick all particles. crYOLO will still converge if you miss some (or even many).

How many micrographs have to be picked?

It depends! Typically 10 micrographs are a good start. However, that number may increase / decrease due to several factors:

  • A very heterogeneous background could make it necessary to pick more micrographs.
  • When you refine a general model, you might need to pick fewer micrographs.
  • If your micrograph is only sparsely decorated, you may need to pick more micrographs.

We recommend that you start with 10 micrographs, then autopick your data, check the results and finally decide whether to add more micrographs to your training set. If you refine a general model, even 5 micrographs might be enough.

To create your training data, crYOLO is shipped with a tool called “boxmanager”. However, you can also use tools like e2boxer to create your training data.

Start the box manager with the following command:

cryolo_boxmanager.py

Now press File → Open image folder and then select the full_data directory. The first image should pop up. You can navigate through the images in the directory tree. Here is how to pick particles:

  • LEFT MOUSE BUTTON: Place a box
  • HOLD LEFT MOUSE BUTTON: Move a box
  • CONTROL + LEFT MOUSE BUTTON: Remove a box
  • “h” KEY: Toggle to make boxes invisible / visible

You might want to run a low pass filter before you start picking the particles. Just press the [Apply] button to get a low pass filtered version of your currently selected micrograph. An absolute frequency cut-off of 0.1 is a reasonable choice. The allowed values are 0 - 0.5. Lower values mean stronger filtering.
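To get a feeling for what an absolute frequency means, you can convert it into a resolution. The following is a minimal sketch, assuming the usual convention that the absolute frequency is given in cycles per pixel (so 0.5 corresponds to Nyquist). The function name and the pixel size of 1.1 Å are only illustrative and not part of crYOLO.

```python
def cutoff_resolution(pixel_size_angstrom, absolute_frequency):
    """Convert an absolute frequency (cycles per pixel) into a resolution in Angstrom."""
    if not 0 < absolute_frequency <= 0.5:
        raise ValueError("absolute frequency must be in the range (0, 0.5]")
    return pixel_size_angstrom / absolute_frequency


if __name__ == "__main__":
    # Hypothetical pixel size of 1.1 Angstrom per pixel.
    for frequency in (0.1, 0.2, 0.5):
        print(frequency, cutoff_resolution(1.1, frequency))
```

A smaller absolute frequency gives a larger value in Å, i.e. more detail is removed, which is why lower values mean stronger filtering.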

You can change the box size in the main window by changing the number in the text field labeled Box size:. Press [Set] to apply it to all picked particles. For picking, you should use the smallest square that encloses your particle.

Once you have finished picking your micrographs, you can export your box files with Files → Write box files. Create a new directory called train_annot and save them there. Close boxmanager.

Now create a third folder with the name train_image. For each box file, copy the corresponding image from full_data into train_image. crYOLO will detect image / box file pairs by taking the box file and searching for an image filename which contains the box filename.
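If you picked many micrographs, copying the images by hand gets tedious. Here is a minimal sketch of how the copy step could be scripted with the matching rule described above (the image filename has to contain the box filename). The function name copy_picked_images is my own and not part of crYOLO, and I assume that the box files end with .box.

```python
import shutil
import tempfile
from pathlib import Path


def copy_picked_images(box_dir, image_dir, target_dir):
    """Copy every image whose file name contains the stem of a box file."""
    box_dir, image_dir, target_dir = Path(box_dir), Path(image_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for box_file in sorted(box_dir.glob("*.box")):
        for image in sorted(image_dir.iterdir()):
            # Matching rule from the text: the image name contains the box file name.
            if image.is_file() and box_file.stem in image.name:
                shutil.copy2(image, target_dir / image.name)
                copied.append(image.name)
    return copied


if __name__ == "__main__":
    # Self-contained demo on empty dummy files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "full_data").mkdir()
        (root / "boxes").mkdir()
        for name in ("mic_01.mrc", "mic_02.mrc"):
            (root / "full_data" / name).write_bytes(b"")
        (root / "boxes" / "mic_01.box").write_text("")
        print(copy_picked_images(root / "boxes", root / "full_data", root / "train_image"))
```

Note that a simple substring match also pairs a box file mic_1.box with an image mic_10.mrc, so zero-padded micrograph names are the safer choice.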

2019/09/15 09:55 · twagner

Start crYOLO

You can use crYOLO either by command line or by using the GUI. The GUI should be easier for most users. You can start it with:

cryolo_gui.py

The crYOLO GUI is essentially a visualization of the command line interface. On the left side, you find all possible “Actions”:

  • config: With this action you create the configuration file that you need to run crYOLO.
  • train: This action lets you train crYOLO from scratch or refine an existing model.
  • predict: If you have a (pre)trained model you can pick particles in your data set using this command.
  • evaluation: This action helps you to quantify the quality of your model.
  • boxmanager: This action starts the cryolo boxmanager. You can visualize the picked particles with it or create training data.

Each action has several parameters which are organized in tabs. Once you have chosen your settings you can press [Start] (just as an example, don't press it now ;-)). The command will then be executed and crYOLO shows you the output:

It will tell you if something went wrong. Moreover, it will tell you all parameters used. Pressing [Back] brings you back to your settings, where you can either edit the settings (in case something went wrong) or go to the next action.

2019/09/13 20:00 · twagner

Configuration

You now have to create a configuration file for your picking project. It contains all important constants and paths and helps you to reproduce your results later on.

You can either use the command line to create the configuration file or the GUI. For most users, the GUI should be easier. Select the config action and fill in the general fields:

At this point you could already press the [Start] button to generate the config file but you might want to take these options into account:

  • During training, crYOLO also needs validation data. Typically, 20% of the training data are randomly chosen as validation data. If you want to use specific images as validation data, you can move the images and the corresponding box files to separate folders. Make sure that they are removed from the original training folder! You can then specify the new validation folders in the “Validation configuration” tab.
  • By default, your micrographs are low pass filtered to an absolute frequency of 0.1 and saved to disk. You can change the cutoff threshold and the directory for filtered micrographs in the “Denoising options” tab.
  • When training from scratch, crYOLO is initialized with weights learned on the ImageNet training data (transfer learning). However, it might improve the training if you set the pretrained_weights option in the “Training options” tab to the current general model. Please note that this does not fine-tune the network; it only changes the initial weights.
Alternative: Using neural-network denoising with JANNI

Since crYOLO 1.4 you can also use neural network denoising with JANNI. The easiest way is to use JANNI's general model (Download here), but you can also train JANNI for your data. crYOLO uses an interface to JANNI to filter your data directly; you just have to change the filter argument in the Denoising tab from LOWPASS to JANNI and specify the path to your JANNI model. I recommend using JANNI denoising only together with a GPU, as it is rather slow (~ 1-2 seconds per micrograph on the GPU and 10 seconds per micrograph on the CPU).

Editing the configuration file

You can also modify all options and parameters directly in the config.json file. It can be opened by any text editor. Please note the wiki entry about the crYOLO configuration file if you want to know more details.

2019/09/14 23:57 · twagner

You can now press the [Start] button to create your configuration file.

Alternatively, you can create a basic configuration file that will work for most projects from the command line. I assume your box files for training are in the folder train_annot and the corresponding images in train_image. I furthermore assume that the box size in your box files is 160. To create the config file config_cryolo.json simply run:

cryolo_gui.py config config_cryolo.json 160 --train_image_folder train_image --train_annot_folder train_annot

To get a full description of all available options type:

cryolo_gui.py config -h

If you want to specify separate validation folders you can use the --valid_image_folder and --valid_annot_folder options:

cryolo_gui.py config config_cryolo.json 160 --train_image_folder train_image --train_annot_folder train_annot --valid_image_folder valid_img --valid_annot_folder valid_annot
2019/09/17 10:32 · twagner
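If you want to use separate validation folders as in the command above, the image / box file pairs have to be moved there first. The following is only a sketch of how that could be scripted. The function split_validation, the fixed random seed and the .box file ending are my assumptions, and the 20% fraction simply mirrors the default mentioned earlier.

```python
import random
import shutil
from pathlib import Path


def split_validation(train_img, train_annot, valid_img, valid_annot,
                     fraction=0.2, seed=0):
    """Move a random fraction of image / box file pairs into validation folders."""
    train_img, train_annot = Path(train_img), Path(train_annot)
    valid_img, valid_annot = Path(valid_img), Path(valid_annot)
    valid_img.mkdir(parents=True, exist_ok=True)
    valid_annot.mkdir(parents=True, exist_ok=True)

    # Pair each box file with the first image whose name contains the box file name.
    pairs = []
    for box in sorted(train_annot.glob("*.box")):
        images = [p for p in sorted(train_img.iterdir()) if box.stem in p.name]
        if images:
            pairs.append((images[0], box))

    n_valid = max(1, round(len(pairs) * fraction)) if pairs else 0
    chosen = random.Random(seed).sample(pairs, n_valid)
    for image, box in chosen:
        # Moving (not copying) keeps the validation data out of the training folders.
        shutil.move(str(image), str(valid_img / image.name))
        shutil.move(str(box), str(valid_annot / box.name))
    return sorted(image.name for image, _ in chosen)
```

Calling split_validation("train_image", "train_annot", "valid_img", "valid_annot") would fill the two validation folders used in the command above. Because the files are moved, run it only once per training set.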

Training

Now you are ready to train the model. In case you have multiple GPUs, you should first select a free GPU. The following command will show the status of all GPUs:

nvidia-smi

For this tutorial, we assume that you have either a single GPU or want to use GPU 0.

Use a different or multiple GPUs

In the “Optional arguments” tab you can change the GPU that should be used by crYOLO. If you have multiple GPUs (e.g. nvidia-smi lists GPU 0 and GPU 1) you can also use both by setting the GPU argument to '0 1'.

In the GUI you have to fill in the mandatory fields:

The default number of warmup epochs is fine as long as you don't want to refine an existing model. During the warmup epochs crYOLO will not try to estimate the size of your particle, which helps it to converge.

When does crYOLO stop the training?

Training stops when the “loss” metric on the validation data has not improved 10 times in a row. This is typically enough. In case you want to give the training more time to find the best model, you can increase the “not changed in a row” parameter by setting the early argument in the “Optional arguments” tab to a higher value, for example 15.
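The stopping rule can be illustrated with a few lines of Python. This is a sketch of the general “patience” idea only, not crYOLO's actual training code; the function name and the list of losses are made up for the illustration.

```python
def early_stopping_epoch(validation_losses, patience=10):
    """Return (stop_epoch, best_epoch, best_loss) for per-epoch validation losses.

    Training stops once the loss has not improved `patience` times in a row.
    """
    best_loss = float("inf")
    best_epoch = None
    not_improved = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best_loss:
            # New best model: this is where the weights would be saved.
            best_loss, best_epoch = loss, epoch
            not_improved = 0
        else:
            not_improved += 1
            if not_improved >= patience:
                return epoch, best_epoch, best_loss
    return len(validation_losses), best_epoch, best_loss


if __name__ == "__main__":
    # Made-up losses: three improving epochs, then a plateau.
    losses = [1.0, 0.8, 0.7] + [0.75] * 20
    print(early_stopping_epoch(losses, patience=10))
    print(early_stopping_epoch(losses, patience=15))
```

With a larger patience the training simply runs longer after the last improvement, which costs time but gives the network more chances to find a better model.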

The final model will be written to disk as specified in saved_weights_name in your configuration file.

► Now press the [Start] button to start the training.

Alternative: Train crYOLO using the command line

Navigate to the folder with config_cryolo.json file, train_image folder, etc.

Train your network with 5 warmup epochs on GPU 0:

cryolo_train.py -c config_cryolo.json -w 5 -g 0

The final model file will be written to disk.

2019/09/17 11:10 · twagner

Picking

Select the action “predict” and fill in all arguments in the “Required arguments” tab:

Adjusting confidence threshold

In crYOLO, all particles have an assigned confidence value. By default, all particles with a confidence value below 0.3 are discarded. If you want to pick less or more conservatively, you can change this confidence threshold to a lower (e.g. 0.2) or higher (e.g. 0.4) value in the “Optional arguments” tab. However, it is much easier to select the best threshold after picking, using the CBOX files written by crYOLO as described in the next section.
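To illustrate what the threshold does, here is a small sketch that applies different thresholds to a handful of made-up picks. The Box tuple is a hypothetical in-memory representation chosen for this example; it is not the layout of the CBOX files.

```python
from collections import namedtuple

# Hypothetical representation of a picked particle (not the CBOX file format).
Box = namedtuple("Box", ["x", "y", "width", "height", "confidence"])


def filter_by_confidence(boxes, threshold=0.3):
    """Discard all boxes with a confidence below the threshold."""
    return [box for box in boxes if box.confidence >= threshold]


if __name__ == "__main__":
    picks = [
        Box(100, 120, 160, 160, 0.15),
        Box(400, 380, 160, 160, 0.25),
        Box(700, 90, 160, 160, 0.35),
        Box(950, 610, 160, 160, 0.90),
    ]
    for threshold in (0.2, 0.3, 0.4):
        print(threshold, len(filter_by_confidence(picks, threshold)))
```

A lower threshold keeps more particles (higher recall, more false positives), a higher threshold keeps fewer.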

Monitor mode

When this option is activated, crYOLO will monitor your input folder. This is especially useful for automation purposes. You can stop the monitor mode by writing an empty file with the name “stop.cryolo” in the input directory. Just add --monitor on the command line or check the monitor box in the “Optional arguments” tab.
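In an automated pipeline the stop file can be written by a script. A minimal sketch (the function name is my own; only the file name stop.cryolo comes from the description above):

```python
from pathlib import Path


def stop_monitor_mode(input_dir):
    """Create the empty stop.cryolo file in the monitored input directory."""
    stop_file = Path(input_dir) / "stop.cryolo"
    stop_file.touch()  # creates an empty file if it does not exist yet
    return stop_file


if __name__ == "__main__":
    import tempfile

    # Demo in a temporary directory instead of a real input folder.
    with tempfile.TemporaryDirectory() as tmp:
        print(stop_monitor_mode(tmp).name)
```

In a real project you would pass your input folder, for example stop_monitor_mode("full_data").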

After picking is done, you can find four folders in your specified output folder:

  • CBOX: Contains a coordinate file in .cbox format for each input micrograph. It contains all detected particles, even those with a confidence lower than the selected confidence threshold. Additionally, it contains the confidence and the estimated diameter of each particle. Importing those files into the boxmanager allows advanced filtering, e.g. according to size or confidence.
  • EMAN: Contains a coordinate file in .box format for each input micrograph. Only particles with a confidence higher than the selected threshold (default: 0.3) are contained in those files.
  • STAR: Contains a coordinate file in .star format for each input micrograph. Only particles with a confidence higher than the selected threshold (default: 0.3) are contained in those files.
  • DISTR: Contains the plots of the confidence and size distributions. Moreover, it contains a machine-readable text file with the summary statistics of these distributions, and their raw data in separate text files.

► Press the [Start] button to run the prediction.

Alternative: Run prediction from the command line

To pick all your images in the directory full_data with the model weight file cryolo_model.h5 (or e.g. gmodel_phosnet_X_Y.h5 when using the general model) and a confidence threshold of 0.3, run:

cryolo_predict.py -c config_cryolo.json -w cryolo_model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.3

You will find the picked particles in the directory boxfiles.
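If you want to start the prediction from your own scripts, it helps to assemble the command in one place. The sketch below only builds the argument list from the options used in the command above (-c, -w, -i, -g, -o, -t) and does not start crYOLO itself. The function name and the file names in the demo are placeholders, so please adapt them to your project.

```python
import shlex


def build_predict_command(config, weights, input_dir, output_dir,
                          gpu=0, threshold=0.3):
    """Assemble the cryolo_predict.py call as an argument list."""
    return [
        "cryolo_predict.py",
        "-c", config,
        "-w", weights,
        "-i", input_dir,
        "-g", str(gpu),
        "-o", output_dir,
        "-t", str(threshold),
    ]


if __name__ == "__main__":
    command = build_predict_command("config_cryolo.json", "cryolo_model.h5",
                                    "full_data/", "boxfiles/")
    # Print the command instead of running it.
    print(" ".join(shlex.quote(part) for part in command))
```

To actually run the prediction, the list could be passed to subprocess.run(command, check=True) inside the activated cryolo environment.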

2019/09/14 10:12 · twagner

Visualize the results

To visualize your results you can use the boxmanager:

As image_dir you select the full_data directory. As box_dir you select the CBOX folder (or EMAN_HELIX_SEGMENTED in case of filaments).

The following does not yet work for filaments.

Besides the particle coordinates, CBOX files contain more information, such as the confidence and the estimated size of each particle. Importing .cbox files into the box manager enables more filtering options in the GUI. You can plot size and confidence distributions. Moreover, you can change the confidence threshold as well as the minimum and maximum size and see the results in a live preview. When you are done with the filtering, you can write the new box selection into new box files. The example below shows this.

This example shows how to filter particle boxes using the crYOLO boxmanager.
2019/09/14 10:18 · twagner

Evaluate your results

The evaluation tool allows you, based on your validation micrographs, to get statistics about the success of your training.

To understand the outcome, you have to know what precision and recall are. Here is a good figure from Wikipedia:

Other important measures are the F1 (β=1) and F2 (β=2) scores:
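For reference, these are the standard textbook definitions, with TP, FP and FN denoting the numbers of true positives, false positives and false negatives:

```latex
\[
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
\]
\[
F_\beta = (1 + \beta^2)\,
\frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}
\]
```

For β=1 this is the harmonic mean of precision and recall; for β=2 the recall is weighted more strongly than the precision.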

Precision metric can be misleading

If your validation micrographs are not labeled to completion, the precision value will be misleading. crYOLO will start picking the remaining 'unlabeled' particles, but for the statistics they are counted as false positives (as the software takes your labeled data as ground truth).
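A small worked example makes the effect visible. The numbers are invented: assume a micrograph with 100 real particles of which only 50 were labeled, and a picking result with 90 real particles (45 of them labeled) plus 10 false picks. The helper functions use the standard definitions of precision, recall and the F score and are not taken from crYOLO.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard definitions of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """F score; beta=1 gives F1, beta=2 gives F2 (more weight on recall)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Judged against the incomplete labels: 45 labeled particles found,
    # 45 unlabeled real particles + 10 false picks count as false positives,
    # 5 labeled particles were missed.
    print("incomplete labels:", precision_recall(45, 55, 5))
    # Judged against the (hypothetical) complete labels.
    print("complete labels:  ", precision_recall(90, 10, 10))
```

The recall is the same in both cases, but the precision measured against the incomplete labels is only half of the real value.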

If you followed the tutorial, the validation data are selected randomly. A run file for each training is created and saved into the folder runfiles/ in your project directory. These runfiles are .json files containing information about which micrographs were selected for validation. To calculate evaluation metrics select the evaluation action.

Fill out the fields in the “Required arguments” tab:

► Press [Start] to calculate the evaluation results.

Alternative: Run evaluation from the command line

cryolo_evaluation.py -c config.json -w model.h5 -r runfiles/run_YearMonthDay-HourMinuteSecond.json -g 0

The html file you specified as output looks like this:

The table contains several statistics:

  • AUC: Area under curve of the precision-recall curve. Overall summary statistics. Perfect classifier = 1, Worst classifier = 0
  • Topt: Optimal confidence threshold with respect to the F1 score. It might not be ideal for your picking, as the F1 score weighs recall and precision equally. In single particle analysis, recall is often more important than the precision.
  • R (Topt): Recall using the optimal confidence threshold.
  • R (0.3): Recall using a confidence threshold of 0.3.
  • R (0.2): Recall using a confidence threshold of 0.2.
  • P (Topt): Precision using the optimal confidence threshold.
  • P (0.3): Precision using a confidence threshold of 0.3.
  • P (0.2): Precision using a confidence threshold of 0.2.
  • F1 (Topt): Harmonic mean of precision and recall using the optimal confidence threshold.
  • F1 (0.3): Harmonic mean of precision and recall using a confidence threshold of 0.3.
  • F1 (0.2): Harmonic mean of precision and recall using a confidence threshold of 0.2.
  • IOU (Topt): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with the optimal confidence threshold.
  • IOU (0.3): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with a confidence threshold of 0.3.
  • IOU (0.2): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with a confidence threshold of 0.2.
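The intersection over union of two boxes can be computed as follows. This is a generic sketch, assuming axis-aligned boxes given as (x, y, width, height) with (x, y) as the lower left corner; it is not the code used by the evaluation tool.

```python
def intersection_over_union(box_a, box_b):
    """IOU of two axis-aligned boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping region (0 if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (100, 100, 160, 160)
    auto_picked = (120, 110, 160, 160)
    print(intersection_over_union(ground_truth, auto_picked))
```

An IOU of 1 means that the auto-picked box lies exactly on the ground-truth box, while 0 means that the two boxes do not overlap at all.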

If the training data consist of multiple folders, the evaluation is done for each folder separately. Furthermore, crYOLO estimates the optimal picking threshold with respect to the F1 score and the F2 score. Both are weighted harmonic means of recall and precision, whereas the F2 score puts more weight on the recall, which is often more important in cryo-EM.

2019/09/17 10:05 · twagner

Picking particles - Without training using a general model

Here you can find how to apply the general models we trained for you. If you would like to train your own general model, please see our extra wiki page: How to train your own general model

Our general models can be found and downloaded here: Download and Installation.

If you followed the installation instructions, you now have to activate the cryolo virtual environment with

source activate cryolo

Start crYOLO

You can use crYOLO either by command line or by using the GUI. The GUI should be easier for most users. You can start it with:

cryolo_gui.py

The crYOLO GUI is essentially a visualization of the command line interface. On the left side, you find all possible “Actions”:

  • config: With this action you create the configuration file that you need to run crYOLO.
  • train: This action lets you train crYOLO from scratch or refine an existing model.
  • predict: If you have a (pre)trained model you can pick particles in your data set using this command.
  • evaluation: This action helps you to quantify the quality of your model.
  • boxmanager: This action starts the cryolo boxmanager. You can visualize the picked particles with it or create training data.

Each action has several parameters which are organized in tabs. Once you have chosen your settings you can press [Start] (just as an example, don't press it now ;-)). The command will then be executed and crYOLO shows you the output:

It will tell you if something went wrong. Moreover, it will tell you all parameters used. Pressing [Back] brings you back to your settings, where you can either edit the settings (in case something went wrong) or go to the next action.

2019/09/13 20:00 · twagner

Configuration

In the GUI choose the config action. Fill in your target box size and leave the train_image_folder and train_annot_folder fields empty.

There are three general models available. It is important that you choose the same filtering options in the “Model/denoising options” tab as we did when training the general models:

  • General model trained for low-pass filtered images: Select filter “LOWPASS” and a low_pass_cutoff of 0.1
  • General model trained for JANNI-denoised images: Select filter “JANNI” and the JANNI general model for janni_model. Keep the defaults for janni_overlap and janni_batches
  • General model for negative stain images: Select filter “NONE”

Press the [Start] button to write the configuration file to disk.

Create the configuration file using the command line

In the following I assume that your target box size is 220. Please adapt if necessary.

For the general Phosaurus network trained for low-pass filtered cryo images run:

cryolo_gui.py config config_cryolo.json 220 --filter LOWPASS --low_pass_cutoff 0.1

For the general model trained with neural-network denoised cryo images (with JANNI's general model) run:

cryolo_gui.py config config_cryolo.json 220 --filter JANNI --janni_model /path/to/janni_general_model.h5

For the general model for negative stain data please run:

cryolo_gui.py config config_cryolo.json 220 --filter NONE

Picking

Select the action “predict” and fill in all arguments in the “Required arguments” tab:

Adjusting confidence threshold

In crYOLO, all particles have an assigned confidence value. By default, all particles with a confidence value below 0.3 are discarded. If you want to pick less or more conservatively, you can change this confidence threshold to a lower (e.g. 0.2) or higher (e.g. 0.4) value in the “Optional arguments” tab. However, it is much easier to select the best threshold after picking, using the CBOX files written by crYOLO as described in the next section.

Monitor mode

When this option is activated, crYOLO will monitor your input folder. This is especially useful for automation purposes. You can stop the monitor mode by writing an empty file with the name “stop.cryolo” in the input directory. Just add --monitor on the command line or check the monitor box in the “Optional arguments” tab.

After picking is done, you can find four folders in your specified output folder:

  • CBOX: Contains a coordinate file in .cbox format for each input micrograph. It contains all detected particles, even those with a confidence lower than the selected confidence threshold. Additionally, it contains the confidence and the estimated diameter of each particle. Importing those files into the boxmanager allows advanced filtering, e.g. according to size or confidence.
  • EMAN: Contains a coordinate file in .box format for each input micrograph. Only particles with a confidence higher than the selected threshold (default: 0.3) are contained in those files.
  • STAR: Contains a coordinate file in .star format for each input micrograph. Only particles with a confidence higher than the selected threshold (default: 0.3) are contained in those files.
  • DISTR: Contains the plots of the confidence and size distributions. Moreover, it contains a machine-readable text file with the summary statistics of these distributions, and their raw data in separate text files.

► Press the [Start] button to run the prediction.

Alternative: Run prediction from the command line

To pick all your images in the directory full_data with the model weight file cryolo_model.h5 (or e.g. gmodel_phosnet_X_Y.h5 when using the general model) and a confidence threshold of 0.3, run:

cryolo_predict.py -c config_cryolo.json -w cryolo_model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.3

You will find the picked particles in the directory boxfiles.

2019/09/14 10:12 · twagner

Visualize the results

To visualize your results you can use the boxmanager:

As image_dir you select the full_data directory. As box_dir you select the CBOX folder (or EMAN_HELIX_SEGMENTED in case of filaments).

The following does not yet work for filaments.

Besides the particle coordinates, CBOX files contain more information, such as the confidence and the estimated size of each particle. Importing .cbox files into the box manager enables more filtering options in the GUI. You can plot size and confidence distributions. Moreover, you can change the confidence threshold as well as the minimum and maximum size and see the results in a live preview. When you are done with the filtering, you can write the new box selection into new box files. The example below shows this.

This example shows how to filter particle boxes using the crYOLO boxmanager.
2019/09/14 10:18 · twagner

Picking particles - Using the general model refined for your data

Since crYOLO 1.3 you can train a model for your data by fine-tuning the general model.

What does fine-tuning mean?

The general model was trained on a lot of particles with a variety of shapes and has therefore learned a very good set of generic features. The last layers, however, learn a rather abstract representation of the particles, which might not fit your particle perfectly. Fine-tuning only trains the last two convolutional layers and keeps the others fixed. This adjusts the more abstract representation to your specific problem.

Why should I fine-tune my model instead of training from scratch?

  1. In theory, fine-tuning should reduce the risk of overfitting and the amount of training data needed.
  2. The training is much faster, as not all layers have to be trained.
  3. The training needs less GPU memory and can therefore be run on NVIDIA cards with less memory.

However, the fine-tune mode is still somewhat experimental and we will update this section if we see more advantages or disadvantages.

If you followed the installation instructions, you now have to activate the cryolo virtual environment with

source activate cryolo

Data preparation

In the following I will assume that your image data is in the folder full_data.

The next step is to create training data. To do so, we have to pick single particles manually in several micrographs. Ideally, the micrographs are picked to completion. However, it is not necessary to pick all particles. crYOLO will still converge if you miss some (or even many).

How many micrographs have to be picked?

It depends! Typically 10 micrographs are a good start. However, that number may increase / decrease due to several factors:

  • A very heterogeneous background could make it necessary to pick more micrographs.
  • When you refine a general model, you might need to pick fewer micrographs.
  • If your micrograph is only sparsely decorated, you may need to pick more micrographs.

We recommend that you start with 10 micrographs, then autopick your data, check the results and finally decide whether to add more micrographs to your training set. If you refine a general model, even 5 micrographs might be enough.

To create your training data, crYOLO is shipped with a tool called “boxmanager”. However, you can also use tools like e2boxer to create your training data.

Start the box manager with the following command:

cryolo_boxmanager.py

Now press File → Open image folder and then select the full_data directory. The first image should pop up. You can navigate through the images in the directory tree. Here is how to pick particles:

  • LEFT MOUSE BUTTON: Place a box
  • HOLD LEFT MOUSE BUTTON: Move a box
  • CONTROL + LEFT MOUSE BUTTON: Remove a box
  • “h” KEY: Toggle to make boxes invisible / visible

You might want to run a low pass filter before you start picking the particles. Just press the [Apply] button to get a low pass filtered version of your currently selected micrograph. An absolute frequency cut-off of 0.1 is a reasonable choice. The allowed values are 0 - 0.5. Lower values mean stronger filtering.

You can change the box size in the main window by changing the number in the text field labeled Box size:. Press [Set] to apply it to all picked particles. For picking, you should use the smallest square that encloses your particle.

Once you have finished picking your micrographs, you can export your box files with Files → Write box files. Create a new directory called train_annot and save them there. Close boxmanager.

Now create a third folder with the name train_image. For each box file, copy the corresponding image from full_data into train_image. crYOLO will detect image / box file pairs by taking the box file and searching for an image filename which contains the box filename.

2019/09/15 09:55 · twagner

Start crYOLO

You can use crYOLO either by command line or by using the GUI. The GUI should be easier for most users. You can start it with:

cryolo_gui.py

The crYOLO GUI is essentially a visualization of the command line interface. On the left side, you find all possible “Actions”:

  • config: With this action you create the configuration file that you need to run crYOLO.
  • train: This action lets you train crYOLO from scratch or refine an existing model.
  • predict: If you have a (pre)trained model you can pick particles in your data set using this command.
  • evaluation: This action helps you to quantify the quality of your model.
  • boxmanager: This action starts the cryolo boxmanager. You can visualize the picked particles with it or create training data.

Each action has several parameters which are organized in tabs. Once you have chosen your settings you can press [Start] (just as an example, don't press it now ;-)). The command will then be executed and crYOLO shows you the output:

It will tell you if something went wrong. Moreover, it will tell you all parameters used. Pressing [Back] brings you back to your settings, where you can either edit the settings (in case something went wrong) or go to the next action.

2019/09/13 20:00 · twagner

Configuration

You now have to create a configuration file for your picking project. It contains all important constants and paths and helps you to reproduce your results later on.

You can either use the command line to create the configuration file or the GUI. For most users, the GUI should be easier. Select the config action and fill in the general fields:

At this point you could already press the [Start] button to generate the config file but you might want to take these options into account:

  • During training, crYOLO also needs validation data. Typically, 20% of the training data are randomly chosen as validation data. If you want to use specific images as validation data, you can move the images and the corresponding box files to separate folders. Make sure that they are removed from the original training folder! You can then specify the new validation folders in the “Validation configuration” tab.
  • By default, your micrographs are low pass filtered to an absolute frequency of 0.1 and saved to disk. You can change the cutoff threshold and the directory for filtered micrographs in the “Denoising options” tab.
  • When training from scratch, crYOLO is initialized with weights learned on the ImageNet training data (transfer learning). However, it might improve the training if you set the pretrained_weights option in the “Training options” tab to the current general model. Please note that this does not fine-tune the network; it only changes the initial weights.
Alternative: Using neural-network denoising with JANNI

Since crYOLO 1.4 you can also use neural network denoising with JANNI. The easiest way is to use JANNI's general model (Download here), but you can also train JANNI for your data. crYOLO uses an interface to JANNI to filter your data directly; you just have to change the filter argument in the Denoising tab from LOWPASS to JANNI and specify the path to your JANNI model.

I recommend using denoising with JANNI only together with a GPU, as it is rather slow (~ 1-2 seconds per micrograph on the GPU and 10 seconds per micrograph on the CPU).

Editing the configuration file

You can also modify all options and parameters directly in the config.json file. It can be opened by any text editor. Please note the wiki entry about the crYOLO configuration file if you want to know more details.
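Because the configuration is plain JSON, it can also be changed from a script. The sketch below is a generic JSON edit, not a crYOLO API; it assumes the file is organized as top-level sections that each hold key/value pairs, and the section name used in the usage note is an assumption you should check against your own file.

```python
import json
from pathlib import Path


def set_config_value(config_path, section, key, value):
    """Load a JSON config, set config[section][key] = value, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # setdefault creates the section if it does not exist yet.
    config.setdefault(section, {})[key] = value
    path.write_text(json.dumps(config, indent=4))
    return config
```

A call such as set_config_value("config_cryolo.json", "train", "pretrained_weights", "gmodel_phosnet_20190516.h5") would set the initial weights, provided that pretrained_weights lives in a section called "train" in your file.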

2019/09/14 23:57 · twagner

Furthermore, you have to select the model you want to refine. Download the general model you want to refine and specify it in the field pretrained_weights in the “Training options” tab.

You can now press the [Start] button to create the configuration file.

Create the configuration file using the command line:

I assume your box files for training are in the folder train_annot and the corresponding images in train_image. I furthermore assume that the box size in your box files is 160 and the model you want to refine is gmodel_phosnet_20190516.h5. To create the config config_cryolo.json simply run:

cryolo_gui.py config config_cryolo.json 160 --train_image_folder train_image --train_annot_folder train_annot --pretrained_weights gmodel_phosnet_20190516.h5

To get a full description of all available options type:

cryolo_gui.py config -h

If you want to specify separate validation folders you can use the --valid_image_folder and --valid_annot_folder options:

cryolo_gui.py config config_cryolo.json 160 --train_image_folder train_image --train_annot_folder train_annot --pretrained_weights gmodel_phosnet_20190516.h5 --valid_image_folder valid_img --valid_annot_folder valid_annot

Training

Now you are ready to train the model. In case you have multiple GPUs, you should first select a free GPU. The following command will show the status of all GPUs:

nvidia-smi

For this tutorial, we assume that you have either a single GPU or want to use GPU 0.

In the GUI choose the action train. In the “Required arguments” tab select the configuration file we created in the previous step and set the number of warmup epochs to zero.

In the “Optional arguments” tab please check the fine_tune box.

The number of layers to fine tune (specified by layers_fine_tune in the “Optional arguments” tab) is still experimental. The default value of 2 worked for us, but you might need more layers.
Training on CPU

The fine tune mode is especially useful if you want to train crYOLO on the CPU. On my local machine it reduced the time for training crYOLO on 14 micrographs from 12-15 hours to 4-5 hours.

Run training with the command line

In comparison to training from scratch, you can skip the warm up training (-w 0). Moreover, you have to add the --fine_tune flag to tell crYOLO that it should do fine tuning. You can also tell crYOLO how many layers it should fine tune (the default is two layers, -lft 2):

cryolo_train.py -c config.json -w 0 -g 0 --fine_tune -lft 2

Picking

Select the action “predict” and fill in all arguments in the “Required arguments” tab:

Adjusting confidence threshold

In crYOLO, all particles have an assigned confidence value. By default, all particles with a confidence value below 0.3 are discarded. If you want to pick less or more conservatively you might want to change this confidence threshold to a less (e.g. 0.2) or more (e.g. 0.4) conservative value in the “Optional arguments” tab. However, it is much easier to select the best threshold after picking using the CBOX files written by crYOLO as described in the next section.

Monitor mode

When this option is activated, crYOLO will monitor your input folder. This is especially useful for automation purposes. You can stop the monitor mode by writing an empty file with the name “stop.cryolo” in the input directory. Just add --monitor in the command line or check the monitor box in the “Optional arguments” tab.
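In an automated pipeline the stop file can be written by the same script that feeds the input folder. The following is a minimal sketch using only the Python standard library; the function name is made up, and the file name is a parameter so that you can pass exactly the spelling your crYOLO version expects.

```python
from pathlib import Path


def request_monitor_stop(input_dir, stop_name="stop.cryolo"):
    """Create an empty stop file in the monitored input directory."""
    stop_file = Path(input_dir) / stop_name
    stop_file.touch()  # creates an empty file, like the shell command `touch`
    return stop_file
```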

After picking is done, you can find four folders in your specified output folder:

  • CBOX: Contains a coordinate file in .cbox format for each input micrograph. It contains all detected particles, even those with a confidence lower than the selected confidence threshold. Additionally, it contains the confidence and the estimated diameter for each particle. Importing those files into the boxmanager allows advanced filtering, e.g. according to size or confidence.
  • EMAN: Contains a coordinate file in .box format for each input micrograph. Only particles with a confidence higher than the selected threshold (default: 0.3) are contained in those files.
  • STAR: Contains a coordinate file in .star format for each input micrograph. Only particles with a confidence higher than the selected threshold (default: 0.3) are contained in those files.
  • DISTR: Contains the plots of the confidence and size distributions. Moreover, it contains a machine-readable text file with the summary statistics about these distributions and their raw data in separate text files.

► Press the [Start] button to run the prediction.

Alternative: Run prediction from the command line

To pick all your images in the directory full_data with the model weight file cryolo_model.h5 (or e.g. gmodel_phosnet_X_Y.h5 when using the general model) and a confidence threshold of 0.3, run:

cryolo_predict.py -c config.json -w cryolo_model.h5 -i full_data/ -g 0 -o boxfiles/ -t 0.3

You will find the picked particles in the directory boxfiles.

2019/09/14 10:12 · twagner

Visualize the results

To visualize your results you can use the boxmanager:

As image_dir you select the full_data directory. As box_dir you select the CBOX folder (or EMAN_HELIX_SEGMENTED in case of filaments).

The following does not yet work for filaments.

Besides the particle coordinates, CBOX files contain additional information such as the confidence and the estimated size of each particle. Importing .cbox files into the boxmanager enables more filtering options in the GUI. You can plot size and confidence distributions. Moreover, you can change the confidence threshold and the minimum and maximum size, and see the results in a live preview. When you are done with the filtering, you can write the new box selection into new box files. The animation below shows an example.

This example shows how to filter particle boxes using the crYOLO boxmanager. It is an animated GIF. Click on it to see it playing.
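The filtering that the boxmanager performs interactively can be pictured as a simple predicate on each box. The sketch below works on in-memory records rather than on .cbox files, because the file layout is not described here; the PickedBox class, its field names and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PickedBox:
    x: float
    y: float
    confidence: float  # value between 0 and 1
    est_size: float    # estimated particle diameter in pixels


def filter_boxes(boxes, min_confidence=0.3, min_size=0.0, max_size=float("inf")):
    """Keep boxes that pass the confidence threshold and lie in the size window."""
    return [
        b for b in boxes
        if b.confidence >= min_confidence and min_size <= b.est_size <= max_size
    ]
```

Lowering min_confidence keeps more boxes (less conservative picking), while narrowing the size window discards boxes whose estimated diameter is implausible for your particle.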
2019/09/14 10:18 · twagner

Evaluate your results

The evaluation tool allows you, based on your validation micrographs, to get statistics about the success of your training.

To understand the outcome, you have to know what precision and recall are. Here is a good figure from Wikipedia:

Other important measures are the F1 (β=1) and F2 (β=2) scores:
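For readers who prefer code to figures, the standard definitions can be written down in a few lines. This is a generic illustration of precision, recall and the F-beta score, not code taken from crYOLO; the function names are made up. Here tp, fp and fn are the numbers of true-positive, false-positive and false-negative picks.

```python
def precision(tp, fp):
    """Fraction of picked particles that are true particles."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of true particles that were picked."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r.

    beta=1 weighs both equally (F1); beta=2 puts more weight on recall (F2).
    """
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0
```

With beta larger than 1, a model with high recall and moderate precision scores better than one with the opposite profile, which is why the F2 score is often the more relevant number for particle picking.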

Precision metric can be misleading

If your validation micrographs are not labeled to completion the precision value will be misleading. crYOLO will start picking the remaining 'unlabeled' particles, but for statistics they are counted as false-positive (as the software takes your labeled data as ground truth).

If you followed the tutorial, the validation data are selected randomly. A run file for each training is created and saved into the folder runfiles/ in your project directory. These runfiles are .json files containing information about what micrographs were selected for validation. To calculate evaluation metrics select the evaluation action.

Fill out the fields in the “Required arguments” tab:

► Press [Start] to calculate the evaluation results.

Alternative: Run evaluation from the command line

cryolo_evaluation.py -c config.json -w model.h5 -r runfiles/run_YearMonthDay-HourMinuteSecond.json -g 0

The html file you specified as output looks like this:

The table contains several statistics:

  • AUC: Area under curve of the precision-recall curve. Overall summary statistics. Perfect classifier = 1, Worst classifier = 0
  • Topt: Optimal confidence threshold with respect to the F1 score. It might not be ideal for your picking, as the F1 score weighs recall and precision equally. In single particle analysis, recall is often more important than the precision.
  • R (Topt): Recall using the optimal confidence threshold.
  • R (0.3): Recall using a confidence threshold of 0.3.
  • R (0.2): Recall using a confidence threshold of 0.2.
  • P (Topt): Precision using the optimal confidence threshold.
  • P (0.3): Precision using a confidence threshold of 0.3.
  • P (0.2): Precision using a confidence threshold of 0.2.
  • F1 (Topt): Harmonic mean of precision and recall using the optimal confidence threshold.
  • F1 (0.3): Harmonic mean of precision and recall using a confidence threshold of 0.3.
  • F1 (0.2): Harmonic mean of precision and recall using a confidence threshold of 0.2.
  • IOU (Topt): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with the optimal confidence threshold.
  • IOU (0.3): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with a confidence threshold of 0.3.
  • IOU (0.2): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with a confidence threshold of 0.2.
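The intersection over union listed in the table can be illustrated for a single pair of boxes. The sketch below is a generic implementation for axis-aligned boxes given as (x, y, width, height), with (x, y) taken as the lower-left corner; that convention and the function name are assumptions made for the example, not a description of crYOLO's internals.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```

Identical boxes give 1, disjoint boxes give 0, and two equally sized boxes shifted by half their width along one axis give 1/3.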

If the training data consist of multiple folders, then evaluation will be done for each folder separately. Furthermore, crYOLO estimates the optimal picking threshold with respect to the F1 score and the F2 score. Both are basically averages (harmonic means) of recall and precision, where the F2 score puts more weight on the recall, which is often more important in cryo-EM.
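How an optimal threshold with respect to F1 or F2 can be found is sketched below as a simple sweep over candidate thresholds. This is an illustration of the idea under stated assumptions, not crYOLO's implementation: each pick is given as a (confidence, is_true_particle) pair, n_ground_truth is the number of labeled particles, and all names are hypothetical.

```python
def f_beta(p, r, beta):
    """Weighted harmonic mean of precision p and recall r."""
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


def best_threshold(picks, n_ground_truth, beta=1.0, thresholds=None):
    """Return (threshold, score) maximizing the F-beta score.

    picks: list of (confidence, is_true_particle) pairs.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(100)]
    best_t, best_score = None, -1.0
    for t in thresholds:
        kept = [hit for conf, hit in picks if conf >= t]
        tp = sum(kept)  # True counts as 1
        p = tp / len(kept) if kept else 0.0
        r = tp / n_ground_truth if n_ground_truth else 0.0
        score = f_beta(p, r, beta)
        if score > best_score:  # ties keep the lower threshold
            best_t, best_score = t, score
    return best_t, best_score
```

Because F2 rewards recall more strongly, the threshold it selects is typically lower than or equal to the one selected by F1 on the same data.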

2019/09/17 10:05 · twagner

Picking filaments - Using a model trained for your data

Since version 1.1.0 crYOLO supports picking filaments.

Filament mode on Actin:

Filament mode on MAVS (EMPIAR-10031) :

If you followed the installation instructions, you now have to activate the cryolo virtual environment with

source activate cryolo

Data preparation

The first step is to create the training data for your model. Right now, you have to use e2helixboxer.py for this:

e2helixboxer.py --gui my_images/*.mrc

After tracing your training data in e2helixboxer, export them using File → Save. Make sure that you export particle coordinates, as this is the only format supported right now (see screenshot). In the following example, it is expected that you exported into a folder called “train_annotation”.

For projects with roughly 20 filaments per image we successfully trained on 40 images (⇒ 800 filaments).

Start crYOLO

You can use crYOLO either by command line or by using the GUI. The GUI should be easier for most users. You can start it with:

cryolo_gui.py

The crYOLO GUI is essentially a visualization of the command line interface. On the left side, you find all possible “Actions”:

  • config: With this action you create the configuration file that you need to run crYOLO.
  • train: This action lets you train crYOLO from scratch or refine an existing model.
  • predict: If you have a (pre)trained model you can pick particles in your data set using this command.
  • evaluation: This action helps you to quantify the quality of your model.
  • boxmanager: This action starts the crYOLO boxmanager. You can visualize the picked particles with it or create training data.

Each action has several parameters which are organized in tabs. Once you have chosen your settings you can press [Start] (just as an example, don't press it now ;-)). The command will then be applied and crYOLO shows you the output:

It will tell you if something went wrong. Moreover, it will tell you all parameters used. Pressing [Back] brings you back to your settings, where you can either edit the settings (in case something went wrong) or go to the next action.

2019/09/13 20:00 · twagner

Configuration

You now have to create a configuration file for your picking project. It contains all important constants and paths and helps you to reproduce your results later on.

You can either use the command line to create the configuration file or the GUI. For most users, the GUI should be easier. Select the config action and fill in the general fields:

At this point you could already press the [Start] button to generate the config file but you might want to take these options into account:

  • During training, crYOLO also needs validation data10). Typically, 20% of the training data are randomly chosen as validation data. If you want to use specific images as validation data, you can move the images and the corresponding box files to separate folders. Make sure that they are removed from the original training folder! You can then specify the new validation folders in the “Validation configuration” tab.
  • By default, your micrographs are low-pass filtered to an absolute frequency of 0.1 and saved to disk. You can change the cutoff threshold and the directory for filtered micrographs in the “Denoising options” tab.
  • When training from scratch, crYOLO is initialized with weights learned on the ImageNet training data (transfer learning11)). However, it might improve the training if you set the pretrained_weights option in the “Training options” tab to the current general model. Please note that this does not fine tune the network; it only changes the weights the training starts from.
Alternative: Using neural-network denoising with JANNI

Since crYOLO 1.4 you can also use neural network denoising with JANNI. The easiest way is to use JANNI's general model (Download here), but you can also train JANNI for your data. crYOLO uses an interface to JANNI to filter your data directly; you just have to change the filter argument in the Denoising tab from LOWPASS to JANNI and specify the path to your JANNI model.

I recommend using denoising with JANNI only together with a GPU, as it is rather slow (~ 1-2 seconds per micrograph on the GPU and 10 seconds per micrograph on the CPU).

Editing the configuration file

You can also modify all options and parameters directly in the config.json file. It can be opened by any text editor. Please note the wiki entry about the crYOLO configuration file if you want to know more details.

2019/09/14 23:57 · twagner

You can now press the [Start] button to create your configuration file.

Creating a basic configuration file on the command line that works for most projects is very simple. I assume your box files for training are in the folder train_annot and the corresponding images in train_image. I furthermore assume that the box size in your box files is 160. To create the config config_cryolo.json simply run:

cryolo_gui.py config config_cryolo.json 160 --train_image_folder train_image --train_annot_folder train_annot

To get a full description of all available options type:

cryolo_gui.py config -h

If you want to specify separate validation folders you can use the --valid_image_folder and --valid_annot_folder options:

cryolo_gui.py config config_cryolo.json 160 --train_image_folder train_image --train_annot_folder train_annot --valid_image_folder valid_img --valid_annot_folder valid_annot
2019/09/17 10:32 · twagner

Training

Now you are ready to train the model. In case you have multiple GPUs, you should first select a free GPU. The following command will show the status of all GPUs:

nvidia-smi

For this tutorial, we assume that you have either a single GPU or want to use GPU 0.

Use a different or multiple GPUs

In the “Optional arguments” tab you can change the GPU that should be used by crYOLO. If you have multiple GPUs (e.g. nvidia-smi lists GPU 0 and GPU 1) you can also use both by setting the GPU argument to '0 1'.

In the GUI you have to fill in the mandatory fields:

The default number of warmup epochs12) is fine as long as you don't want to refine an existing model. During the warmup training epochs it will not try to estimate the size of your particle, which helps crYOLO to converge.

When does crYOLO stop the training?

When you start the training, it will stop when the “loss” metric on the validation data does not improve 10 times in a row. This is typically enough. In case you want to give the training more time to find the best model, you can increase the “not changed in a row” parameter by setting the early argument in the “Optional arguments” tab to, for example, 15.
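The stopping rule can be illustrated with a small function that replays a list of validation losses. This is a sketch of the general "patience" idea described above, not crYOLO's training code; the function name and the loss values you feed it are up to you.

```python
def stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` epochs in a row.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None
```

With a larger patience (for example 15 instead of 10) the same loss curve is followed for longer before the training gives up, which can help if the validation loss is noisy.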

The final model will be written to disk as specified in saved_weights_name in your configuration file.

► Now press the [Start] button to start the training.

Alternative: Train crYOLO using the command line

Navigate to the folder containing the config_cryolo.json file, the train_image folder, etc.

Train your network with 5 warmup epochs on GPU 0:

cryolo_train.py -c config_cryolo.json -w 5 -g 0

The final model file will be written to disk.

2019/09/17 11:10 · twagner

Picking

Select the action predict and fill in all arguments in the “Required arguments” tab:

Now select the “Filament options” tab and check “Activate filament mode”, specify the filament width (e.g. 100) and define the box distance (e.g. 20 for 90% overlap when using a box size of 200):

Press the [Start] button to start the picking. The directory output_boxes will be created and all results are saved there. The format is the EMAN2 helix format with particle coordinates.

You can find a detailed description how to import crYOLO filament coordinates into Relion here.

Run prediction from the command line

Let's assume you want to pick a filament with a width of 100 pixels (-fw 100). The box size is 200×200 and you want a 90% overlap (-bd 20). Moreover, you want each filament to have at least 6 boxes (-mn 6). The micrographs are in the full_data directory. Then the picking command would be:

cryolo_predict.py -c cryolo_config.json -w cryolo_model.h5 -i full_data --filament -fw 100 -bd 20 -o boxes/ -g 0 -mn 6
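The relation between box size, overlap and the box distance passed with -bd is simple arithmetic: the distance is the part of the box that does not overlap with its neighbour. A minimal sketch follows; the function name is made up for this example.

```python
def box_distance(box_size, overlap):
    """Distance in pixels between neighbouring boxes for a fractional overlap."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in the range [0, 1)")
    # round() absorbs floating point noise such as 19.999999999999996.
    return round(box_size * (1 - overlap))
```

For the example above, box_distance(200, 0.9) gives 20, which is the value used for -bd.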

Visualize the results

To visualize your results you can use the boxmanager:

As image_dir you select the full_data directory. As box_dir you select the CBOX folder (or EMAN_HELIX_SEGMENTED in case of filaments).

The following does not yet work for filaments.

Besides the particle coordinates, CBOX files contain additional information such as the confidence and the estimated size of each particle. Importing .cbox files into the boxmanager enables more filtering options in the GUI. You can plot size and confidence distributions. Moreover, you can change the confidence threshold and the minimum and maximum size, and see the results in a live preview. When you are done with the filtering, you can write the new box selection into new box files. The animation below shows an example.

This example shows how to filter particle boxes using the crYOLO boxmanager. It is an animated GIF. Click on it to see it playing.
2019/09/14 10:18 · twagner

Evaluate your results

The evaluation tool allows you, based on your validation micrographs, to get statistics about the success of your training.

To understand the outcome, you have to know what precision and recall are. Here is a good figure from Wikipedia:

Other important measures are the F1 (β=1) and F2 (β=2) scores:

Precision metric can be misleading

If your validation micrographs are not labeled to completion the precision value will be misleading. crYOLO will start picking the remaining 'unlabeled' particles, but for statistics they are counted as false-positive (as the software takes your labeled data as ground truth).

If you followed the tutorial, the validation data are selected randomly. A run file for each training is created and saved into the folder runfiles/ in your project directory. These runfiles are .json files containing information about what micrographs were selected for validation. To calculate evaluation metrics select the evaluation action.

Fill out the fields in the “Required arguments” tab:

► Press [Start] to calculate the evaluation results.

Alternative: Run evaluation from the command line

cryolo_evaluation.py -c config.json -w model.h5 -r runfiles/run_YearMonthDay-HourMinuteSecond.json -g 0

The html file you specified as output looks like this:

The table contains several statistics:

  • AUC: Area under curve of the precision-recall curve. Overall summary statistics. Perfect classifier = 1, Worst classifier = 0
  • Topt: Optimal confidence threshold with respect to the F1 score. It might not be ideal for your picking, as the F1 score weighs recall and precision equally. In single particle analysis, recall is often more important than the precision.
  • R (Topt): Recall using the optimal confidence threshold.
  • R (0.3): Recall using a confidence threshold of 0.3.
  • R (0.2): Recall using a confidence threshold of 0.2.
  • P (Topt): Precision using the optimal confidence threshold.
  • P (0.3): Precision using a confidence threshold of 0.3.
  • P (0.2): Precision using a confidence threshold of 0.2.
  • F1 (Topt): Harmonic mean of precision and recall using the optimal confidence threshold.
  • F1 (0.3): Harmonic mean of precision and recall using a confidence threshold of 0.3.
  • F1 (0.2): Harmonic mean of precision and recall using a confidence threshold of 0.2.
  • IOU (Topt): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with the optimal confidence threshold.
  • IOU (0.3): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with a confidence threshold of 0.3.
  • IOU (0.2): Intersection over union of the auto-picked particles and the corresponding ground-truth boxes. The higher, the better – evaluated with a confidence threshold of 0.2.

If the training data consist of multiple folders, then evaluation will be done for each folder separately. Furthermore, crYOLO estimates the optimal picking threshold with respect to the F1 score and the F2 score. Both are basically averages (harmonic means) of recall and precision, where the F2 score puts more weight on the recall, which is often more important in cryo-EM.

2019/09/17 10:05 · twagner

Advanced parameters

During training (cryolo_train), there are the following advanced parameters:

  • --warm_restarts: With this option the learning rate is decreasing after each epoch and then reset after a couple of epochs.
  • --num_cpu NUMBER_OF_CPUS: Number of CPU cores used during training
  • --gpu_fraction FRACTION: Number between 0 - 1 quantifying the fraction of GPU memory that is reserved by crYOLO
  • --skip_augmentation: Set this flag if crYOLO should skip the data augmentation (not recommended).
  • --fine_tune: With this flag, crYOLO will only train the last layers (fine tune).
  • -lft NUM_LAYER_FINETUNE: Number of layers to fine tune (default is 2).
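To give an intuition for what --warm_restarts means, the sketch below shows one common schedule of this kind: the learning rate decays along a cosine curve and jumps back to its starting value at the beginning of each cycle. The cosine shape, the cycle length and the learning rate values are assumptions chosen for illustration; the schedule crYOLO actually uses may differ.

```python
import math


def lr_with_warm_restarts(epoch, base_lr=1e-4, min_lr=1e-6, cycle_length=10):
    """Cosine-annealed learning rate that is reset every `cycle_length` epochs.

    All default values are hypothetical and only serve the illustration.
    """
    position = epoch % cycle_length  # position inside the current cycle
    cosine = 0.5 * (1 + math.cos(math.pi * position / cycle_length))
    return min_lr + (base_lr - min_lr) * cosine
```

Printing lr_with_warm_restarts(e) for e in range(25) shows the learning rate falling within each block of ten epochs and returning to base_lr at epochs 10 and 20.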

During picking (cryolo_predict), there are these advanced parameters:

  • -t CONFIDENCE_THRESHOLD: With the -t parameter, you can let crYOLO pick more conservatively (e.g. by adding -t 0.4 to the picking command) or less conservatively (e.g. by adding -t 0.2 to the picking command). The valid parameter range is 0 to 1.
  • -d DISTANCE_IN_PIXEL: With the -d parameter you can filter your picked particles. Boxes that are closer together than this distance (in pixels) will be removed.
  • -pbs PREDICTION_BATCH_SIZE: With the -pbs parameter you can set the number of images picked as a batch. Default is 3.
  • --otf: Instead of saving the filtered images into a separate directory, crYOLO will filter them on-the-fly and not write them to disk.
  • --num_cpu NUMBER_OF_CPUS: Number of CPU cores used during prediction.
  • --gpu_fraction FRACTION: Number between 0 - 1 quantifying the fraction of GPU memory that is reserved by crYOLO.
  • --monitor: With this flag, crYOLO will monitor your input directory and pick images as they appear in the directory. The monitor mode can be stopped by writing the empty file STOP.CRYOLO 13) into the input directory.
  • -sr SEARCH_RANGE_FACTOR: (FILAMENT MODE) The search range for connecting boxes is the box size times this factor. Default is 1.41.
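The idea behind a distance filter such as -d can be sketched as follows. This is only one possible rule, chosen for illustration: boxes are visited from highest to lowest confidence, and a box is dropped if it lies closer than the minimum distance to a box that was already kept. Which of two close boxes crYOLO itself removes is not stated here, so treat the rule and the function name as assumptions.

```python
import math


def remove_close_boxes(boxes, min_distance):
    """Greedy distance filter on boxes given as (x, y, confidence) tuples."""
    kept = []
    # Visit the most confident boxes first so that they win conflicts.
    for box in sorted(boxes, key=lambda b: b[2], reverse=True):
        if all(math.hypot(box[0] - k[0], box[1] - k[1]) >= min_distance
               for k in kept):
            kept.append(box)
    return kept
```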

Help

Any questions? Problems? Suggestions?

Find help at our mailing list!

1) , 7)
While it is nice to keep your files organized, you don't have to copy your training images into a separate folder. In the configuration file (see below) you can also simply specify the full_data directory as “train_image_folder”. CrYOLO will find the correct images using the box files.
2) , 8) , 10)
Micrographs that are selected as validation data are not used to train crYOLO. These micrographs are used to calculate how well the model performs and whether it still improves.
3) , 9) , 11)
From Wikipedia: Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.
4) , 12)
One epoch is a complete pass through the training data.
5)
Overfitting means that the model works well on the training micrographs, but not on new, unseen micrographs. The model just memorized what it saw instead of learning generic features.
6)
We are testing crYOLO with its default configuration on graphics cards with >= 8 GB memory. Using the fine tune mode, it should also work with GPUs with 4 GB memory.
13)
You can create it with
touch STOP.CRYOLO
pipeline/window/cryolo.1568720312.txt.gz · Last modified: 2019/09/17 13:38 by twagner