Object Detection using OpenCV in Python


Introduction to Object Detection using OpenCV in Python

Object detection using Haar feature-based cascade classifiers is an effective method proposed by Paul Viola and Michael Jones in their 2001 paper “Rapid Object Detection using a Boosted Cascade of Simple Features”. It is a machine learning approach in which a cascade function is trained from a large number of positive and negative images and is then used to detect objects in other images.


  • Understand object detection using the Haar cascade classifier.
  • Understand the basics of face detection and eye detection using Haar feature-based cascade classifiers.

In this article, object detection with OpenCV in Python is illustrated with face detection. First, the algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then features are extracted from both sets of images. For this, Haar features are used. They are just like convolutional kernels. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.
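As a minimal sketch of how one such feature value is computed, the snippet below evaluates a two-rectangle feature on a tiny patch using plain Python lists. The patch values, the function names, and the rectangle layout (a white rectangle stacked on a black one) are assumptions made for illustration, not part of OpenCV.

```python
# A two-rectangle Haar-like feature on a small grayscale patch.
# Patch values and rectangle layout are made up for illustration.

def rect_sum(img, top, left, height, width):
    """Sum of pixel values inside a rectangle, computed by brute force."""
    return sum(
        img[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )

def two_rect_feature(img, top, left, height, width):
    """White rectangle on top, black rectangle directly below it.

    Uses the convention from the text:
    (sum under black) - (sum under white).
    Each rectangle is `height` rows tall and `width` columns wide.
    """
    white = rect_sum(img, top, left, height, width)
    black = rect_sum(img, top + height, left, height, width)
    return black - white

# 4x4 patch: low values in the top half, high values in the bottom half.
patch = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
uniform = [[50] * 4 for _ in range(4)]

edge_value = two_rect_feature(patch, top=0, left=0, height=2, width=4)
flat_value = two_rect_feature(uniform, top=0, left=0, height=2, width=4)
print(edge_value)  # large magnitude: the two halves differ strongly
print(flat_value)  # zero: a uniform region has no edge
```

The feature responds strongly where the two rectangles cover regions of different brightness and gives zero on a uniform region, which is what makes it useful as a simple edge detector.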

Now, all possible sizes and locations of each kernel are used to calculate a large number of features. (Just imagine how much computation that needs: even a 24×24 window results in over 160000 features.) For each feature calculation, we need to find the sum of the pixels under the white and black rectangles. To make this cheap, the authors introduced the integral image, which lets the sum over any rectangle be computed from just four values, regardless of the rectangle's size.
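The sketch below illustrates both points in plain Python. The first two functions build an integral image and read a rectangle sum from four table lookups. The third counts feature positions and sizes in a 24×24 window, assuming a set of five basic feature shapes (two-rectangle, three-rectangle, and four-rectangle); that shape set and all function names are assumptions for illustration, and under them the count comes to 162336.

```python
def integral_image(img):
    """Return a (rows+1) x (cols+1) table where ii[r][c] is the sum of
    all pixels strictly above row r and strictly left of column c."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_total = 0
        for c in range(cols):
            row_total += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_total
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum over any rectangle using exactly four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def count_features(window=24):
    """Count all positions and sizes of five assumed base shapes,
    given as (width, height), inside a window x window patch."""
    base_shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for base_w, base_h in base_shapes:
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                total += (window - w + 1) * (window - h + 1)
    return total

img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
total_sum = rect_sum(ii, 0, 0, 3, 3)   # whole image
corner_sum = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
n_features = count_features(24)
print(total_sum, corner_sum, n_features)
```

The table is built once per image, after which every rectangle sum costs the same four lookups whether the rectangle is 2×2 or 24×24.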

Of all the features calculated from the images, most are irrelevant. For example, consider the image below. The top row shows two good features. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second feature selected relies on the property that the eyes are darker than the bridge of the nose. But the same windows applied to the cheeks or any other part of the face are irrelevant. So how do we select the best features out of 160000+ features? This is achieved by the AdaBoost algorithm.

For this, we apply every feature to all the training images. For each feature, the algorithm finds the best threshold for separating faces from non-faces. Obviously, there will be errors or misclassifications. We select the features with the minimum error rate, which means they are the features that most accurately classify the face and non-face images. The final classifier is a weighted sum of these weak classifiers. Each one is called weak because it alone can’t classify the image, but together with the others it forms a strong classifier. The paper says even 200 features provide detection with 95% accuracy. The authors’ final setup had around 6000 features. (Imagine a reduction from 160000+ features to 6000 features. That is a big gain.)
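The selection step can be sketched as follows. This is a simplified illustration in plain Python, not the authors' implementation: it shows only how a threshold is chosen per feature, how the feature with the lowest weighted error is picked, and how weak classifiers vote. The re-weighting of training images between rounds is omitted, and the toy data, function names, and vote weights are all made up.

```python
def best_stump(values, labels, weights):
    """For one feature, find the threshold and polarity with the lowest
    weighted error. A sample is predicted as a face (1) when
    polarity * value < polarity * threshold."""
    best = (float("inf"), None, None)  # (error, threshold, polarity)
    for threshold in sorted(set(values)):
        for polarity in (1, -1):
            error = 0.0
            for v, y, w in zip(values, labels, weights):
                pred = 1 if polarity * v < polarity * threshold else 0
                if pred != y:
                    error += w
            if error < best[0]:
                best = (error, threshold, polarity)
    return best

def select_feature(feature_table, labels, weights):
    """feature_table[j][i] is the value of feature j on training image i."""
    results = [best_stump(vals, labels, weights) for vals in feature_table]
    j = min(range(len(results)), key=lambda k: results[k][0])
    error, threshold, polarity = results[j]
    return j, threshold, polarity, error

def strong_classify(x, stumps):
    """Weighted vote of weak classifiers. Each stump is
    (feature_index, threshold, polarity, alpha); x holds feature values."""
    score = sum(
        alpha
        for j, threshold, polarity, alpha in stumps
        if polarity * x[j] < polarity * threshold
    )
    return score >= 0.5 * sum(alpha for *_, alpha in stumps)

# Toy training set: 3 faces (label 1) and 3 non-faces (label 0).
labels = [1, 1, 1, 0, 0, 0]
weights = [1 / 6] * 6
feature_table = [
    [5, 3, 8, 4, 7, 6],  # feature 0: does not separate the classes
    [1, 2, 3, 7, 8, 9],  # feature 1: faces have low values
]
chosen = select_feature(feature_table, labels, weights)
print(chosen)

# Hypothetical stumps with made-up vote weights (alpha).
stumps = [(0, 5, 1, 2.0), (1, 7, 1, 1.0), (2, 3, -1, 1.0)]
is_face_a = strong_classify([4, 9, 5], stumps)
is_face_b = strong_classify([6, 9, 5], stumps)
print(is_face_a, is_face_b)
```

On the toy data the second feature is chosen because a single threshold separates the two classes without error, while no threshold on the first feature can.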

So now you take an image, take each 24×24 window, apply 6000 features to it, and check whether it is a face or not. Done this way, the computation is time-consuming and inefficient.

In an image, most of the area is non-face region. So it is a better idea to have a simple method to check whether a window is not a face region. If it is not, discard it in a single shot and don’t process it again. Instead, focus on regions where there can be a face. This way, we spend more time checking possible face regions.

This is the idea behind the Cascade of Classifiers. Instead of applying all 6000 features to a window, the features are grouped into different stages of classifiers and applied one by one. If a window fails the first stage, discard it. We don’t consider the remaining features on it. If it passes, apply the second stage of features and continue the process. A window that passes all stages is a face region.
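The early-rejection logic can be sketched in a few lines of plain Python. The two stages, their toy features, and the thresholds below are invented for illustration; in a trained cascade each stage is a boosted classifier learned from data.

```python
def run_cascade(window, stages):
    """Apply stages in order and stop at the first failure.

    Each stage is (features, threshold): the window passes the stage when
    the summed feature responses reach the threshold.
    Returns (is_face, number_of_features_evaluated).
    """
    evaluated = 0
    for features, threshold in stages:
        score = 0.0
        for feature in features:
            score += feature(window)
            evaluated += 1
        if score < threshold:
            return False, evaluated  # rejected: later stages never run
    return True, evaluated

# Toy features on a flat list of pixel values (made up for illustration).
def contrast(window):
    return max(window) - min(window)

def bottom_brighter(window):
    half = len(window) // 2
    return sum(window[half:]) - sum(window[:half])

stages = [
    ([contrast], 50),                    # stage 1: one cheap check
    ([contrast, bottom_brighter], 150),  # stage 2: more features
]

flat_window = [100] * 8
edge_window = [10, 10, 10, 10, 200, 200, 200, 200]

flat_result = run_cascade(flat_window, stages)
edge_result = run_cascade(edge_window, stages)
print(flat_result)  # rejected after a single feature
print(edge_result)  # passes both stages, all three features evaluated
```

The uniform window costs one feature evaluation and is then dropped, which is why the average work per sub-window stays small even when the full cascade holds thousands of features.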

The authors’ detector had 6000+ features in 38 stages, with 1, 10, 25, 25, and 50 features in the first five stages. (The two features in the above image are actually obtained as the best two features from AdaBoost.) According to the authors, on average 10 features out of 6000+ are evaluated per sub-window.

Haar-Cascade Detection in OpenCV

OpenCV provides both a training method and pretrained models. The pretrained models can be read using the cv::CascadeClassifier::load method.

The following code example uses pretrained Haar cascade models to detect faces and eyes in an image. First, a cv::CascadeClassifier is created and the necessary XML file is loaded, either with the cv::CascadeClassifier::load method or, as in the Python code here, by passing the file path to the constructor. Afterwards, detection is done using the cv::CascadeClassifier::detectMultiScale method, which returns bounding rectangles for the detected faces or eyes.

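import cv2 as cv
import matplotlib.pyplot as plt
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline

# load the pretrained classifiers (XML files downloaded beforehand)
face_cascade = cv.CascadeClassifier('/home/shubham/Machine Learning/FireBlaze/OpenCV/face.xml')
eye_cascade = cv.CascadeClassifier('haarcascade_eye.xml')
# read the image and convert it to grayscale
img = cv.imread('/home/shubham/Machine Learning/FireBlaze/OpenCV/Screenshot from 2020-03-30 09-42-51.png')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# detect faces; 1.1 is the scale factor and 4 is the minimum number of neighbors
faces = face_cascade.detectMultiScale(gray, 1.1, 4)
for (x, y, w, h) in faces:
    # draw a bounding box around the detected face
    cv.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    # look for eyes only inside the face region
    eyes = eye_cascade.detectMultiScale(roi_gray)
    # draw bounding boxes around detected eyes
    for (ex, ey, ew, eh) in eyes:
        cv.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)
# plot the image (OpenCV stores BGR, matplotlib expects RGB)
plt.imshow(cv.cvtColor(img, cv.COLOR_BGR2RGB))
plt.axis('off')
plt.show()
# write image ('output.png' is only an example file name)
cv.imwrite('output.png', img)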


This is how to detect objects using a Haar cascade classifier in Python. The .xml file used here is the pre-trained model.

