YOLO (You Only Look Once) is an algorithm that uses a convolutional neural network for real-time object detection. Its popularity stems from being much faster than other methods while still providing good accuracy. Speed is part of the design: the fast variant of the network (Fast YOLO) uses only 9 convolutional layers instead of the full model's 24, and fewer filters in those layers.
YOLO has been used in many applications, such as autonomous driving, vehicle detection, intelligent video analytics, and the detection of traffic signs, parking meters, people, and animals.
Fig. 1 The architecture
YOLO is a network that learns to detect several objects simultaneously, and it does a good job of learning general representations of them. It takes an image as input and, as a single-stage detector, predicts multiple bounding boxes and their corresponding class probabilities.
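As a concrete illustration of this single-stage output (a minimal sketch, not the paper's implementation): with the grid size S = 7, B = 2 boxes per cell, and C = 20 classes used in the paper, the network's output is an S × S × (B·5 + C) tensor, where each grid cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities. The random tensor below stands in for a real network's prediction.

```python
import numpy as np

# Sketch of YOLO's output encoding; S, B, C are the paper's defaults.
S, B, C = 7, 2, 20                      # grid size, boxes per cell, classes
pred = np.random.rand(S, S, B * 5 + C)  # stand-in for a network forward pass

# Each cell predicts B boxes (x, y, w, h, confidence) and C class probabilities.
boxes = pred[..., :B * 5].reshape(S, S, B, 5)
class_probs = pred[..., B * 5:]

print(pred.shape)         # (7, 7, 30)
print(boxes.shape)        # (7, 7, 2, 5)
print(class_probs.shape)  # (7, 7, 20)
```

A single forward pass thus yields every box and class score at once, which is what makes the detector single-stage.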
Previous approaches to detection repurpose classifiers: to detect an object, they run a classifier for that object at different scales and locations in the test image. After classification, post-processing is applied to refine the bounding boxes. These pipelines are slow and difficult to optimize.
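To see why such pipelines are slow, here is a hedged sketch of sliding-window classification (`classify_patch` is a hypothetical stub, not a real model): the classifier must be evaluated once per window position, and then again at every scale.

```python
import numpy as np

def classify_patch(patch):
    """Hypothetical classifier stub; a real pipeline would run a CNN here."""
    return float(patch.mean() > 0.5)  # dummy score

def sliding_window_detect(image, window=32, stride=16):
    """Run the classifier at every window position (a single scale shown)."""
    h, w = image.shape[:2]
    detections = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            score = classify_patch(image[y:y + window, x:x + window])
            if score > 0.5:
                detections.append((x, y, window, window, score))
    return detections

# Even a small 128x128 image needs 7 x 7 = 49 classifier calls at one scale.
print(len(sliding_window_detect(np.ones((128, 128)))))  # 49
```

YOLO replaces all of these per-window evaluations with one forward pass over the full image.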
YOLO is extremely fast because detection is modeled as a regression problem. It looks at the whole image at once, and it generalizes well, performing well even on images that differ from those in the training set.
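In this regression formulation, each predicted box carries a confidence score trained toward Pr(Object) × IOU between the predicted and ground-truth boxes. A minimal IOU helper (boxes given as (x1, y1, x2, y2); a sketch, not the paper's code) could look like:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Overlapping 2x2 boxes sharing a 1x1 region: IOU = 1 / (4 + 4 - 1) = 1/7
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The confidence regression target is exactly this overlap measure, so boxes with no object should regress to zero confidence.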
One limitation YOLO struggles with is precisely localizing small objects that appear in groups (for example, flocks of birds).
Author: Gabriela Ghimpeteanu, Coronis Computing.
Bibliography
[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You Only Look Once: Unified, Real-Time Object Detection. arXiv preprint arXiv:1506.02640, 2015.