In the first preprint of crYOLO we wrote the following sentence without further comment:
“Ideally, each micrograph should be picked to completion.”
However, you don't have to pick all particles in a micrograph to train crYOLO. Here I want to show how crYOLO performs with only sparsely labeled micrographs.
I took our toxin dataset, which I've used to train crYOLO before. The training set comprises 14 images with 1586 particles (~113 particles per micrograph). An example is shown here:
I then randomly removed 80% of the particles (above) and used the remaining ones for training (default settings as in the tutorial). The training set now consists of only 314 particles:
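The random removal can be sketched roughly as follows. This is not the script I actually used, just a minimal illustration: the function name `subsample_boxes`, the dictionary layout (micrograph name mapped to a list of box coordinates) and the fixed seed are assumptions made for the example. Reading and writing the actual box files is left out.

```python
import random


def subsample_boxes(boxes_per_micrograph, keep_fraction=0.2, seed=0):
    """Randomly keep a fraction of all labeled boxes, pooled over all micrographs.

    boxes_per_micrograph: dict mapping a micrograph name to a list of boxes
    (hypothetical layout; each box can be any object, e.g. an (x, y, w, h) tuple).
    """
    rng = random.Random(seed)
    # Pool (micrograph, box) pairs so the fraction applies to the whole training set,
    # not to each micrograph separately.
    pooled = [
        (name, box)
        for name, boxes in boxes_per_micrograph.items()
        for box in boxes
    ]
    n_keep = round(len(pooled) * keep_fraction)
    kept = rng.sample(pooled, n_keep)

    # Rebuild the per-micrograph structure; micrographs may end up with few or no boxes.
    sparse = {name: [] for name in boxes_per_micrograph}
    for name, box in kept:
        sparse[name].append(box)
    return sparse


# Toy data: 14 micrographs with 100 made-up boxes each (not the real toxin coordinates).
toy = {
    f"mic_{i:02d}": [(x * 10, i * 10, 200, 200) for x in range(100)]
    for i in range(14)
}
sparse = subsample_boxes(toy, keep_fraction=0.2)
total_kept = sum(len(boxes) for boxes in sparse.values())
print(total_kept, "of", sum(len(b) for b in toy.values()), "boxes kept")
```

Because the boxes are pooled before sampling, the number of labels left per micrograph varies, which is closer to what sparse manual picking looks like than removing exactly 80% from every image.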
I then used the trained model to pick the whole dataset. Picking with the default threshold is quite unsatisfying, as it only picks ~65 particles per micrograph:
However, if you use the cbox files and the box manager, you can easily choose a different confidence threshold. With a threshold of 0.14, for example, you get practically all of your particles while still excluding contamination:
BTW: The recall reported during training in such cases will be misleading, as it is calculated based on the default threshold of 0.3.
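To make the effect of the threshold concrete, here is a minimal sketch of what lowering it does to a list of picks. The `Pick` tuple, the function `filter_by_confidence` and the confidence values are invented for illustration; the code does not read cbox files, and whether the real cutoff is inclusive (`>=`) is an assumption.

```python
from typing import NamedTuple


class Pick(NamedTuple):
    """One candidate box with its confidence (hypothetical in-memory layout)."""
    x: float
    y: float
    width: float
    height: float
    confidence: float


def filter_by_confidence(picks, threshold):
    """Keep only the picks whose confidence reaches the threshold."""
    return [p for p in picks if p.confidence >= threshold]


# Made-up candidates: a model trained on sparse labels tends to assign
# lower confidences, so many true particles sit below the default of 0.3.
picks = [
    Pick(100, 120, 200, 200, 0.85),
    Pick(400, 380, 200, 200, 0.22),
    Pick(650, 90, 200, 200, 0.15),
    Pick(900, 700, 200, 200, 0.05),  # e.g. contamination
]

default_picks = filter_by_confidence(picks, 0.3)
relaxed_picks = filter_by_confidence(picks, 0.14)
print(len(default_picks), len(relaxed_picks))  # prints: 1 3
```

The same logic explains the misleading recall: any statistic computed at 0.3 only sees the first group, even though the model has found the other particles with a lower confidence.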
I did the same with ATP synthase. The original training set had 1723 particles from 91 micrographs. The sparsely labeled training set used the same 91 micrographs, but with only 334 particles labeled. Here are examples:
Now the comparison between picking with the default threshold and picking with an adjusted threshold:
And another example:
Again, it still picks basically everything while avoiding contamination.
The last example that I've chosen is TRPC4, as it contains much more contamination. The original training set comprises 32 images with 3038 particles (~95 particles / image):
Again, I followed the same procedure as with toxin and ATP synthase and randomly removed 80% of the particles:
I trained the model, and picked again. Here are the results for picking with the default threshold:
It missed a lot, but picked far more than one would expect from the sparsely labeled training data. The missing particles appear when you reduce the threshold to 0.14:
Particles picked, contamination skipped, mission accomplished.