Object detection is a computer vision technology that detects objects in images and videos. It answers the question: which objects are present, and where are they?
Object detection algorithms can detect humans, animals, trees, houses, cars, etc. Their real-life applications include video surveillance, video analytics, contactless payment, animal detection, airport facial recognition, and autonomous driving, to name a few.
Figure 1 presents an example of object detection where the task is to detect animals (here, a cat and a dog) in a given image.
Fig. 1. Object detection (image source)
Algorithms for object detection are often referred to as object detectors. An object detector takes images or videos as input and outputs a list of the objects it detects. For each object, the detector typically returns the object's category (known as the label) and a bounding box that locates the object in the image or video.
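To make the shape of a detector's output concrete, here is a minimal sketch of one way to represent it in Python. The class names (`BoundingBox`, `Detection`), the corner-based pixel format of the box, and the example coordinates are all assumptions made for illustration; real detection libraries each use their own conventions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (corner format is an assumption)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


@dataclass(frozen=True)
class Detection:
    """One detected object: its category label and where it is."""
    label: str
    box: BoundingBox


def summarize(detections: List[Detection]) -> List[str]:
    """Render each detection as a short human-readable line."""
    return [
        f"{d.label}: {d.box.width:.0f}x{d.box.height:.0f} "
        f"at ({d.box.x_min:.0f}, {d.box.y_min:.0f})"
        for d in detections
    ]


# Hypothetical output for an image with a cat and a dog;
# the coordinates are made up, not produced by a real model.
example_detections = [
    Detection("cat", BoundingBox(40, 60, 220, 300)),
    Detection("dog", BoundingBox(250, 30, 520, 310)),
]

for line in summarize(example_detections):
    print(line)
```

A list of such records is enough to draw the boxes and labels on top of the input image, as in Figure 1.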
Challenges in object detection
There are several challenges that one may encounter when working with object detection.
Imagine you are building a face detector and at some point you get an input image like the one in Figure 2. This is known as a crowded scenario: object detectors often fail on images that contain many objects with different viewpoints, inconsistent scales, and occlusion.
Fig. 2. Crowded scenario (image source)
Another challenge of object detection is intra-class variance, where objects of the same category might look very different. Figure 3 shows one example in which six images of dogs look very different in terms of size, shape, colour, fur length, etc.
Fig. 3. Intra-class variance (image source)
Other challenges include foreground-background class imbalance, where only a few pixels of the image contain useful information, as well as object deformation and poor lighting conditions. In some applications, such as autonomous driving or video surveillance, it is also essential for the object detector to run in real time.
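The foreground-background imbalance can be illustrated with a rough, pixel-level proxy: the fraction of image pixels that fall inside at least one bounding box. The sketch below is only an illustration under stated assumptions; the function name `foreground_fraction`, the integer box format with exclusive maximum coordinates, and the example image size and boxes are all hypothetical.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; max coordinates are exclusive (assumption).
Box = Tuple[int, int, int, int]


def foreground_fraction(width: int, height: int, boxes: List[Box]) -> float:
    """Fraction of image pixels covered by at least one bounding box."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    # Boolean mask of the image; True marks a foreground pixel.
    covered = [[False] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        # Clip each box to the image so out-of-range boxes are harmless.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        for y in range(y0, y1):
            row = covered[y]
            for x in range(x0, x1):
                row[x] = True

    foreground_pixels = sum(sum(row) for row in covered)
    return foreground_pixels / (width * height)


# Hypothetical 100x100 image with two small, partially overlapping objects.
fraction = foreground_fraction(100, 100, [(10, 10, 30, 30), (20, 20, 40, 40)])
print(f"foreground: {fraction:.1%}, background: {1 - fraction:.1%}")
```

In this made-up example the two objects cover only a small share of the image, so the vast majority of pixels are background. Overlapping pixels are counted once, because the mask records coverage rather than summing box areas.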