• Author: Thorsten Wagner
• Last Update: 2018-06-20

# Paper

You can find more technical details in our paper:

Nature Communications Biology: SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM

Version: 1.5.0

Please see the installation instructions for how to get it running on the CPU.

Version: 1.2.6

crYOLO PhosaurusNet's eponym

#### For cryo images (low-pass filtered)

Number of datasets: 38 real, 10 simulated, 10 particle free datasets on various grids with contamination

#### For cryo images (neural network denoised with JANNI)

Number of datasets: 38 real, 10 simulated, 10 particle free datasets on various grids with contamination

The general model trained on JANNI-denoised data did not perform better than the one trained on low-pass filtered data: the average AUC on the validation data was the same in both cases (0.85). This might be due to the data selected for the general model. I assume that JANNI will improve the results especially on very noisy micrographs.

#### For negative stain images

Number of datasets: 10 real datasets

Previous versions of crYOLO, the boxmanager and the general models can be found here: Archive.

# Known issues

• Issue 0: Training on multiple GPUs sometimes leads to worse performance (higher loss). We currently recommend training on a single GPU.
• Issue 17: On-the-fly filtering (--otf) is slower than not using it, as the filtering is not parallelized in this case.

Closed issues

• Issue 1: crYOLO sometimes does not exit properly after training has finished and has to be killed manually.
• Issue 2: If you use automatic filtering with .tif files, you get an error like “OSError: cannot identify image file 'filtered_folder/another_folder/my_image.tif'”. It will be fixed in the next release.
• Issue 3: (Boxmanager) The visualization only shows the first filament when loading EMAN1 helical box files (start-end coordinates). Will be fixed in the next release.
• Issue 4: The filament mode will crash if crYOLO cannot identify a single particle in the image. Will be fixed in 1.2.2.
• Issue 5: If movies were aligned with cisTEM and picked with crYOLO, the box positions are vertically flipped. Will be fixed in 1.2.2.
• Issue 6: crYOLO overwrites the environment variable “CUDA_VISIBLE_DEVICES” with 0 if no GPU is specified by the -g parameter. As a result, crYOLO ignores previous settings in CUDA_VISIBLE_DEVICES. Will be fixed in 1.2.2.
• Issue 7: On K3 images crYOLO seems to add an offset towards the longer axis of the input image.
• Issue 8: There is a logical error in filament tracing, which sometimes connects two parallel filaments.
• Issue 9: Some people report an error when running cryolo prediction/training: “ImportError: numpy.core.multiarray failed to import”. It will be fixed in 1.2.3.
• Issue 10: On machines with many cores (e.g. 64) an error during filtering might pop up: “[ERROR:0] 53: Can't spawn new thread”
• Issue 11: If the -g parameter is not provided, crYOLO will use the memory of all GPUs. Will be fixed in 1.2.3.
• Issue 12: The LineEnhancer dependency of crYOLO still depends on opencv. Workaround: In the crYOLO environment: conda install opencv
• Issue 13: After picking it can happen that some of the boxes are not fully immersed in the image. Will be fixed in 1.2.4.
• Issue 14: Parallelization in filament mode is broken. Will be fixed in 1.2.4.
• Issue 15: If --gpu_fraction is used, crYOLO always uses GPU 0. Will be fixed in 1.3.1.
• Issue 16: --gpu_fraction only works for prediction, not for training. Will be fixed in 1.3.2.
• Issue 18: Prediction is broken in 1.3.2. It removes all particles as it claims they are not fully immersed in the image.
• Issue 19: Filtering does not work if the target image directory is an absolute path.
• Issue 20: crYOLO 1.3.4 has a normalization bug. During training the images are normalized separately, but during prediction the normalization is done batch-wise. Workaround: Use -pbs 1 during prediction. It will be fixed in 1.3.5.
• Issue 21: The search range for filament tracing is too low for many datasets. To check if you are affected: Use your trained model and pick without the filament options. Check if your filaments are nicely picked (many consecutive boxes on a filament). In the next version, the search range will be increased and added as an optional parameter.
• Issue 22: If absolute paths are used in the field “train_image” in your configuration file, filtering is skipped.
• <del>Issue 23: Since crYOLO 1.4.0 it sometimes takes a long time until it starts picking. The reason seems to be the tensorflow update.</del>
• <del>Issue 24: Fine-tune mode does not start (cannot find layer model_3). Will be fixed in 1.4.1.</del>

# Installation

System requirements:

crYOLO was tested on Ubuntu 16.04.4 LTS and Ubuntu 18.04 with an NVIDIA Geforce 1080 / Geforce 1080Ti.

However, it should run on Windows as well.

As the GPU accelerated version of tensorflow does not support MacOS, crYOLO does not support it either.

crYOLO depends on CUDA Toolkit 9.0 and the cuDNN 7.1.2 library. Both are installed automatically during the crYOLO installation.

Install crYOLO!

The following instructions assume that pip and anaconda or miniconda are available. In case you have an old cryolo environment installed, first remove it with:

conda env remove --name cryolo

After that, create a new virtual environment:

conda create -n cryolo -c anaconda python=3.6 pyqt=5 cudnn=7.1.2 numpy==1.14.5 cython wxPython==4.0.4

Activate the environment:

source activate cryolo

In case you run crYOLO on a GPU run:

pip install 'cryolo[gpu]'

But if you want to run crYOLO on a CPU run:

pip install 'cryolo[cpu]'

During the installation of crYOLO you will see the following error message: ERROR: imagecodecs-lite 2019.2.22 has requirement numpy>=1.15.4, but you'll have numpy 1.14.5 which is incompatible. You can ignore it; crYOLO also works with numpy==1.14.5.

That's it!

You might want to check if everything is running as expected. Here is a reference example:

There is also a way to run crYOLO on the CPU. To use it, just follow the instructions in the install section. This is especially useful when you would like to apply the generalized model and don't have an NVIDIA GPU.

Picking with crYOLO is also quite fast on the CPU. On my local machine (Intel i9) it takes roughly 1 second per micrograph and on our low-performance notebooks (Intel i3) 4 seconds.

Training crYOLO is computationally much more expensive. Training a model from scratch with 14 micrographs on my local machine takes 34 minutes per epoch on the CPU. Given that you often need 25 epochs until convergence, it is a task to run overnight (~ 12 hours). However, you might want to try refining the general model, which takes 12 minutes per epoch (~ 5 hours).
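As a back-of-the-envelope check of these figures, the following sketch simply multiplies the time per epoch by the number of epochs. The function name is made up for illustration, and the 25 epochs are the rough figure quoted above, not a guarantee. With exactly 25 epochs the from-scratch estimate comes out slightly above the quoted ~ 12 hours, so that number presumably corresponds to convergence a few epochs earlier.

```python
def total_training_hours(minutes_per_epoch, epochs):
    """Return the estimated wall-clock training time in hours."""
    return minutes_per_epoch * epochs / 60.0


# Figures quoted in the text (CPU, 14 micrographs).
from_scratch = total_training_hours(34, 25)  # 850 minutes
refinement = total_training_hours(12, 25)    # 300 minutes

print(f"from scratch: {from_scratch:.1f} h, refinement: {refinement:.1f} h")
# from scratch: 14.2 h, refinement: 5.0 h
```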

# Start picking!

Use the step-by-step tutorial to get started!

# Change log

crYOLO 1.5.0:

• New fast low-pass filtering pipeline: Speed up on-the-fly filtering by 175%, speed up training by 75%, speed up picking by 50%
• Add rotation as data augmentation
• The number of layers for fine tuning is now configurable (-lft)
• cryolo_evaluation.py will now output an HTML file with the results.
• Set patch argument as deprecated
• Remove warmup as config file option. Please specify it with -w.

crYOLO 1.4.1:

• Downgrade the dependencies to tensorflow 1.10.1 and numpy 1.14.5 as some users reported long initialization times. (Thanks to Shaun Rawson)
• The initialization weights are no longer shipped with the package and are downloaded on the fly (because they are big and pypi does not allow such big packages)
• crYOLO is installed through pypi
• crYOLO box manager is installed through pypi and automatically shipped with the crYOLO package
• Fixed fine-tune mode (Thanks to Antoine Koehl)
• Fixed normalization function for YOLO backend (Thanks to Wolfgang Lugmayr)

Old crYOLO change logs

crYOLO 1.4.0:

• Support Just Another Noise 2 Noise Implementation (JANNI)
• Update tensorflow from 1.10.1 to 1.12.3 to make crYOLO compatible with JANNI
• Update numpy from 1.14.5 to 1.15.4 to make crYOLO compatible with JANNI

crYOLO 1.3.6:

• Changed filament search radius factor from 0.8 to 1.41 (this fixed issue 21)
• Improved error message in case of corrupted config file
• Fixed issue 22: If absolute paths are used in the field “train_image” in your configuration file, filtering is skipped.

crYOLO 1.3.5:

• Fixed issue 20: During training the images are normalized separately, but during prediction the normalization was done batch-wise. This led to confusing results: some micrographs were perfectly picked, some totally unreasonably, even with the same defocus. This bug only affects the picking; already trained models can still be used.
• Remove unnecessary dependencies
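To see why the mismatch behind issue 20 matters, here is a minimal sketch that uses plain Python lists as stand-ins for micrographs. It assumes a simple zero-mean, unit-variance normalization, which is an illustrative choice and not necessarily what crYOLO does internally. All function names are hypothetical.

```python
from statistics import mean, pstdev


def normalize(values, mu, sigma):
    """Shift and scale values with the given mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


def normalize_separately(images):
    """Normalize each image with its own statistics (as during training)."""
    return [normalize(img, mean(img), pstdev(img)) for img in images]


def normalize_batchwise(images):
    """Normalize all images with statistics pooled over the whole batch."""
    pooled = [v for img in images for v in img]
    mu, sigma = mean(pooled), pstdev(pooled)
    return [normalize(img, mu, sigma) for img in images]


# Two toy "micrographs" with the same content but different brightness.
batch = [[1.0, 2.0, 3.0], [101.0, 102.0, 103.0]]

separate = normalize_separately(batch)
batchwise = normalize_batchwise(batch)

# Normalized separately, both images look identical to the network.
print([round(v, 3) for v in separate[0]])  # [-1.225, 0.0, 1.225]
print([round(v, 3) for v in separate[1]])  # [-1.225, 0.0, 1.225]

# Normalized batch-wise, the result depends on the other image in the batch.
print(batchwise[0][0] < 0 < batchwise[1][0])  # True
```

With a batch size of one, both functions give the same result, which is why the -pbs 1 workaround helped.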

crYOLO 1.3.4:

• Support for SPHIRE 1.2
• Changed the minimum threshold for cbox files from 0.01 to 0.1. Much faster in many cases but still low enough. If -t is lower than 0.1, the new threshold is used as minimum.
• Installation now checks if python 3 is used.
• Fix issue 19: Filtering does not work if the target image directory is an absolute path.
• Fix crash when --otf was specified but filtering was not specified in the config file

crYOLO 1.3.3:

• Fix issue 18: Prediction is broken in 1.3.2. It removes all particles as it claims they are not fully immersed in the image.
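For reference, a "fully immersed" test boils down to a few comparisons. This is a minimal sketch under the assumption that a box is given by its lower-left corner plus width and height in pixels. The function name and the coordinate convention are illustrative and not taken from the crYOLO source.

```python
def is_fully_immersed(x, y, width, height, image_width, image_height):
    """Return True if the box lies completely inside the image.

    (x, y) is assumed to be the lower-left corner of the box in pixels.
    """
    return (
        x >= 0
        and y >= 0
        and x + width <= image_width
        and y + height <= image_height
    )


# A 100 px box in a 4096 x 4096 micrograph.
print(is_fully_immersed(0, 0, 100, 100, 4096, 4096))     # True
print(is_fully_immersed(4050, 10, 100, 100, 4096, 4096))  # False
print(is_fully_immersed(-5, 10, 100, 100, 4096, 4096))    # False
```

A bug of the kind described in issue 18 would show up immediately with such toy inputs, because a check that rejects the first box rejects practically everything.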

crYOLO 1.3.2:

• Speedup prediction: Vectorized some parts of the code and optimized the creation of the cbox files. 30% speed up picking / 15% faster training compared to 1.3.1/1.3.0.
• Bug fix in merging of filaments that sometimes threw “IndexError: list index out of range”. (Thanks to Alexander Belyy)
• Fix in cryolo_evaluation: If the validation data is specified with -b instead of runfiles, all datasets with only one box file were ignored.
• Change library requirement to PILLOW version 6.0.0
• Fix issue 16: --gpu_fraction only works for prediction, not for training.

crYOLO 1.3.1:

• Fix Issue 15: -g was ignored when --gpu_fraction was used.

crYOLO 1.3.0:

• Fine tune the general network to your data using the new fine tune option with --fine_tune (https://1n.pm/x8rUH)
• On-the-fly micrograph filtering during particle picking with --otf (don't double your dataset during picking) (https://1n.pm/goXAa)
• Interactive threshold adjustment after prediction using the new cbox-files and the crYOLO boxmanager 1.2 (https://1n.pm/k7HoI)
• Pick only fully immersed particles (Issue 13)
• Improved filament mode
• Rewrote tracing
• Rewrote and speed up merging of filaments
• Fixed parallelisation of the filament mode (Issue 14)
• Add tifffile as dependency, as imageio throws a lot of warnings for some tif files.
• Add conversion for uint16 images, as pillow cannot work with them.
• Add option --skip_augmentation to deactivate augmentation during training (Thanks to Tijmen de Wolf). (https://1n.pm/goXAa)
• Add option --num_cpu to specify the number of CPUs used during training and during prediction. (Thanks to Nikolaus Dietz) (https://1n.pm/goXAa)
• Add option to limit the amount of GPU memory reserved by crYOLO with --gpu_fraction (Thanks to Nikolaus Dietz) (https://1n.pm/goXAa)
• Save anchor size in model every time you write a new model during training (not only at the end)
• In case of using --min_distance, only the particle with lower confidence is removed (Thanks to Yilai Li)
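The --min_distance behaviour described in the last item can be illustrated with a small greedy filter. This is a sketch under my own assumptions (particles as (x, y, confidence) tuples, Euclidean distance between centers) and not crYOLO's actual implementation. The function name is hypothetical.

```python
import math


def filter_by_min_distance(particles, min_distance):
    """Drop the lower-confidence particle of every pair that is too close.

    Each particle is a tuple (x, y, confidence).
    """
    kept = []
    # Visit particles from highest to lowest confidence, so that of two
    # close particles the more confident one is always the one kept.
    for x, y, conf in sorted(particles, key=lambda p: p[2], reverse=True):
        too_close = any(
            math.hypot(x - kx, y - ky) < min_distance for kx, ky, _ in kept
        )
        if not too_close:
            kept.append((x, y, conf))
    return kept


picks = [(10, 10, 0.9), (12, 11, 0.4), (100, 100, 0.6)]
print(filter_by_min_distance(picks, min_distance=5))
# [(10, 10, 0.9), (100, 100, 0.6)]
```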

crYOLO Version 1.2.3:

• crYOLO now saves the anchors which were used during training inside the .h5 file and takes care that the correct anchors are used during prediction.
• LineEnhancer dependency is now installed via PyPI, as --process-dependency-links was removed in pip 19.
• Fix Issue 9: Removed zignor dependency as it leads to problems for some users (Thanks to Jason Kaelber)
• Attempt to fix Issue 10: Removed opencv dependency which was connected to this problem (Thanks to Shaun Rawson)
• Fix issue 11: crYOLO now uses GPU 0 by default if not specified otherwise (e.g. by CUDA_VISIBLE_DEVICES)
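The GPU selection behaviour described for issues 6 and 11 could look roughly like the sketch below. It only illustrates the priority order (explicit -g first, then an existing CUDA_VISIBLE_DEVICES, then GPU 0). The function select_gpus and its parameters are hypothetical and not part of crYOLO.

```python
import os


def select_gpus(gpu_argument=None, environ=None):
    """Decide which value CUDA_VISIBLE_DEVICES should have.

    gpu_argument: list of GPU ids from a -g style option, or None.
    environ: mapping of environment variables (defaults to os.environ).
    """
    environ = os.environ if environ is None else environ
    if gpu_argument:
        # An explicit request on the command line wins.
        return ",".join(str(gpu) for gpu in gpu_argument)
    if environ.get("CUDA_VISIBLE_DEVICES"):
        # Respect what the user or the cluster scheduler has already set.
        return environ["CUDA_VISIBLE_DEVICES"]
    # Fall back to the first GPU only instead of grabbing all of them.
    return "0"


print(select_gpus([0, 1], {}))                              # 0,1
print(select_gpus(None, {"CUDA_VISIBLE_DEVICES": "2"}))     # 2
print(select_gpus(None, {}))                                # 0
```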

crYOLO Version 1.2.2:

• Added the PhosaurusNet to the crYOLO backend, which makes the patch mode unnecessary for picking single particles.
• crYOLO now outputs separate folders for EMAN box files and STAR files.
• When picking filaments it will now additionally output EMAN Start-End and STAR Start-End coordinates (Thanks to Jesse M. Hansen).
• Fix Issue 4: The filament mode will crash if crYOLO cannot identify a single particle in the image.
• Fix Issue 5: If movies were aligned with cisTEM and picked with crYOLO, the box positions were vertically flipped. (Thanks to Wei-Chun Kao)
• Fix Issue 6: crYOLO overwrote the CUDA_VISIBLE_DEVICES variable if the -g parameter is not passed. (Thanks to Shaun Rawson)
• Fix Issue 7: crYOLO introduces a shift for non square images proportional to the aspect ratio. (Thanks to Shaun Rawson)
• Fix Issue 8: crYOLO sometimes connects two parallel filaments. The filament tracing was optimized and now seems to work properly.
• Fix a severe bug in filament tracing: Curved filaments are split by crYOLO into straighter sub-pieces. However, during the split, one half of the filament was lost. (Thanks to Sabrina Pospich)
• Added a wiki entry about the networks which are supported by crYOLO
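A vertical flip like the one in issue 5 typically comes from two programs disagreeing about whether the image origin is at the top or at the bottom. The following sketch shows the conversion for a single box. It assumes that (x, y) is the lower-left corner of the box, and the function name is made up for illustration.

```python
def flip_box_vertically(x, y, box_height, image_height):
    """Mirror a box at the horizontal center line of the image.

    (x, y) is assumed to be the lower-left corner of the box in pixels.
    """
    return x, image_height - y - box_height


# A 30 px high box in a 100 px high image.
flipped = flip_box_vertically(10, 20, 30, 100)
print(flipped)  # (10, 50)

# Flipping twice gives back the original position.
print(flip_box_vertically(flipped[0], flipped[1], 30, 100))  # (10, 20)
```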

crYOLO Version 1.2.1:

• Fix Issue 2: Tiff files are now written as 32 bit when internal filtering is used.
• cryolo_evaluation now additionally estimates the optimal threshold based on the F2 score, which puts more weight on recall than on precision
• File ending of filament box files is now .box instead of .txt (Thanks to Jesse M. Hansen)
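The F2 score mentioned above is the F-beta score with beta = 2. The sketch below shows the standard formula and how a threshold could be chosen with it. The precision/recall values in the example are invented for illustration and do not come from cryolo_evaluation.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 puts more weight on recall than on precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# High recall is rewarded more than high precision.
print(round(f_beta(0.5, 1.0), 3))  # 0.833
print(round(f_beta(1.0, 0.5), 3))  # 0.556

# Made-up (threshold, precision, recall) triples.
curve = [(0.1, 0.40, 0.95), (0.3, 0.70, 0.80), (0.5, 0.90, 0.50)]
best = max(curve, key=lambda row: f_beta(row[1], row[2]))
print(best[0])  # 0.3
```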

crYOLO Version 1.2.0:

• Switch to Python3 (Please use a fresh environment!)
• (Hopefully) fixed that crYOLO sometimes freezes during/after training (hard to reproduce, so I'm not 100% sure if it is fixed.)
• Fix that training with multiple GPUs did not speed up small datasets
• Low-pass filtering is now integrated into crYOLO
• Fix two bugs in cryolo_evaluation that led to an underestimation of the performance parameters
• cryolo_evaluation is now multithreaded if your training data is organised in subfolders
• cryolo_evaluation now contains a better method for optimal picking threshold estimation
• Refactoring
• Minor bug fixes

crYOLO Version 1.1.4:

• Hot fix for filament mode when applied to non square images.

crYOLO Version 1.1.3:

• Improved non-maximum-suppression brings 60% speedup during picking!
• Multi GPU support for training and prediction (e.g. by adding -g 0 1 for GPU 0 and GPU 1 to the training/prediction command)
• Fixed a bug which led to a crash if no particles are picked on the first micrograph (Thanks to Björn Klink).
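For readers unfamiliar with non-maximum suppression, here is the textbook greedy variant in plain Python. It is meant as an explanation of the idea only and is not the improved implementation shipped with this release. The IoU threshold of 0.3 is an arbitrary value chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.3):
    """Return the indices of the boxes that are kept, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
print(non_maximum_suppression(boxes, scores))  # [0, 2]
```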

crYOLO Version 1.1.2:

• STAR files can now be used for training. However, as they don't contain size information, the size specified in the anchors in the config.json is used.
• Slightly improved speed of the filament-mode
• Fixed another bug running filament mode on non-square images (Thanks to Gregory Alushin)
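The STAR file item above means that every particle gets the same box size. A minimal sketch of that conversion, assuming that STAR coordinates are particle centers and that boxes are stored as lower-left corner plus size (both are assumptions, and the function name is made up):

```python
def star_centers_to_boxes(centers, anchor_width, anchor_height):
    """Turn particle centers into (x, y, width, height) boxes of fixed size."""
    return [
        (cx - anchor_width / 2, cy - anchor_height / 2, anchor_width, anchor_height)
        for cx, cy in centers
    ]


# Two particle centers and a 50 x 50 anchor.
print(star_centers_to_boxes([(100, 200), (300, 40)], 50, 50))
# [(75.0, 175.0, 50, 50), (275.0, 15.0, 50, 50)]
```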

crYOLO Version 1.1.1:

• More efficient MRC reading and batch prediction leads to ~50% faster training and ~70% faster picking when crYOLO is used in patch-mode (compared with the patch-mode in 1.1.0).
• 6x faster filament picking
• Reading of annotation data is now super-fast (the box filename has to be contained in the image filename)
• Optimized filament picking parameters
• Fixed bug which made training fail for some 16 bit images
• Fixed bug which could lead to double picked filaments
• Fixed bug running filament mode on non-square images (Thanks to Gregory Alushin)
• Supports EMAN1 helix coordinates
• Support for star file format. During prediction, both box and star files are written.
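The naming requirement for annotation files can be illustrated as follows. This is a simplified sketch of such a matching and not the code used in crYOLO. The function name and the example paths are invented, and the substring test is ambiguous if one box file name is a prefix of another (e.g. mic_1 and mic_10).

```python
import os


def match_box_files(image_paths, box_paths):
    """Map each image to the box file whose base name it contains."""
    box_by_stem = {
        os.path.splitext(os.path.basename(box))[0]: box for box in box_paths
    }
    matches = {}
    for image in image_paths:
        image_stem = os.path.splitext(os.path.basename(image))[0]
        for stem, box in box_by_stem.items():
            if stem in image_stem:
                matches[image] = box
                break
    return matches


images = ["train/mic_001_filtered.mrc", "train/mic_002_filtered.mrc"]
boxes = ["annot/mic_001.box"]
print(match_box_files(images, boxes))
# {'train/mic_001_filtered.mrc': 'annot/mic_001.box'}
```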

crYOLO Version 1.1.0:

• crYOLO now supports filaments
• New evaluation tool
• Supports empty box files for training on particle-free images
• Extended data augmentation: Horizontal flip and flip along both axes
• Experimental support of periodic restarts during training (with --warm_restarts)

crYOLO Version 1.0.4:

• Fix a problem reading backend weights from read-only filesystem (Thanks to Michael Cianfrocco and Jason Key)
• Make sure that tensorflow version is >= 1.5.0 and < 1.9.0
• Add support for subfolders in training and validation directories
• Clearer error message when the trained model does not fit the architecture specified in the config file.
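A version range check like the one for tensorflow should compare numbers, not strings, otherwise "1.10.1" would be sorted before "1.9.0". The sketch below shows one way to do it. It assumes plain numeric version strings such as "1.8.0", and the function names are hypothetical.

```python
def parse_version(text):
    """Convert a version string like '1.8.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_supported_tensorflow(version, minimum="1.5.0", below="1.9.0"):
    """Return True if minimum <= version < below."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


print(is_supported_tensorflow("1.8.0"))   # True
print(is_supported_tensorflow("1.9.0"))   # False
print(is_supported_tensorflow("1.10.1"))  # False
print("1.10.1" < "1.9.0")                 # True (why strings are not enough)
```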

crYOLO Version 1.0.3:

• Ignore non-image files during training and prediction (Thanks to Kellie Woll)
• Fixed misleading error when non existing folder is used as input for prediction (Thanks to Kellie Woll)
• Add distance threshold during prediction by adding -d distanceInPixel parameter to prediction command (Thanks to Lifei Fu)
• Add the “--write_empty” parameter to the prediction command if an empty box file should be written when no particle is picked.

crYOLO Version 1.0.2:

• Fix problem when mrc image has dimensions (1,width,height) (Thanks to Reza Behrouzi)

crYOLO Version 1.0.1:

• Normalization technique is now the same for 8-bit and 32-bit images.
• Unify image augmentation

crYOLO Boxmanager Version 1.2.6:

• Make it compatible with current new environment

Old crYOLO Boxmanager change logs

crYOLO Boxmanager Version 1.2.3:

• Make it compatible with current new environment

crYOLO Boxmanager Version 1.2.2:

• Makes sure that the correct version of MatplotLib is used.

crYOLO Boxmanager Version 1.2.1:

• Press “h” for hiding the boxes
• Fix for loading different box sets with different colors for the case that one of the box sets consists of cbox files.

crYOLO Boxmanager Version 1.2:

• Add interactive threshold selection using cbox files

crYOLO Boxmanager Version 1.1.1:

• Fix Issue 3
• Now supports STAR Start-End filament format

crYOLO Boxmanager Version 1.1.0:

• Switch to Python3
• Minor bug fixes

crYOLO Boxmanager Version 1.0.4:

• Support of visualization of EMAN1 filament coordinates
• Make compatible with crYOLO 1.1.3

crYOLO Boxmanager Version 1.0.3:

• Support of visualization of EMAN2 helical coordinates (particle coordinates)
• New boxes can be loaded with a new color while keeping the old ones.
• Several bug fixes

crYOLO Boxmanager Version 1.0.2:

• Fix problem with invisible (start with .) files. Now they are ignored.

crYOLO Boxmanager Version 1.0.1:

• Fix crash when canceling the import of box files
• Fix crash with qt4

Version 20190516:

• Added four more inhouse datasets
• Added SNRNP (Thanks to Clement Charenton)

Old General PhosaurusNet model change logs

Version 20190315:

Version 20190218:

• Added K3 apoferritin (Thanks to Shaun Rawson)
• Added two more inhouse datasets

Version 20181221:

• Same datasets as the general YOLO network model version 20181120, but trained with PhosaurusNet.

Old general YOLO network model in patch mode

Version 20181120:

Added multiple simulated datasets, where each micrograph contains hundreds of particles with different defocus:

• PDB 1SA0
• PDB 5LNK
• PDB 5XNL
• PDB 6B7N
• PDB 6BHU
• PDB 6DMR
• PDB 6DS5
• PDB 6GDG
• PDB 6H3N
• PDB 6MPU

Besides these simulated datasets we added the following handpicked datasets:

• ATP Synthase
• DNA Origami
• Two more particle-free only-contamination datasets.

In total 45 datasets are now included.

Version 20180823:

Increase the number of hand picked datasets to 25 by adding:

• Add EMPIAR 10154 (Thanks to Daniel Prumbaum)
• Add EMPIAR 10186 (Thanks to Sebastian Tacke)
• Add EMPIAR 10097 Hemagglutinin (Thanks to Birte Siebolds)
• Add EMPIAR 10081 HCN1 (Thanks to Pascel Lill)
• Add internal dataset (Thanks to Daniel Roderer)
• Furthermore we added 8 datasets of protein-free grids (Thanks to Tobias Raisch and Daniel Prumbaum)

Version 20180720:

Added micrographs of 7 new handpicked datasets:

• EMPIAR 10181 (Thanks to Dennis Quentin)
• EMPIAR 10017
• EMPIAR 10028 (Thanks to Oleg Sitsel)
• User contributed dataset (Thanks to Lifei Fu)
• EMPIAR 10089
• EMPIAR 10004 (Thanks to Daniel Roderer)
• EMPIAR 10072 (Thanks to Tobias Raisch)

Furthermore I had to remove one internal dataset, as it turned out that it is unsuitable for training the general model.

Version 20180704: