Counting people in a meeting room, a classroom, or a lecture hall is a necessary task for checking attendance and for deciding when to start a meeting, a presentation, or a daily lecture. In this paper, we present a study of a real-time program that counts the number of people in such an environment. The goal of the program is to approximate the number of people present in the room or hall so that related tasks can be planned. The input is video captured by a color camera. Histogram of Oriented Gradients (HOG) features are extracted from each frame and classified with a Support Vector Machine (SVM).
This method is simple to program, provides relatively high accuracy, detects faces at many different angles, and is little affected by the background. The training dataset contains 190165 images: 7904 human face images and 182261 non-human images. The result we achieved has an accuracy of …, and the average processing time is ....
Counting the number of people in a classroom or at a conference is a very common problem.
Knowing roughly how many people are present helps the hosts make important decisions, such as starting or canceling an event or a meeting. In the digital era, computers and video recording devices are expected to take over this work from people. With computer vision and image processing, we can estimate the number of people in a room quickly and with reasonable accuracy, at little cost in time and effort.
There have been many studies on counting people around the world.
Paul Viola and Michael Jones proposed the Haar-cascade classifier [1]. Their prototype works on Haar-like features, which were developed from the idea of using Haar wavelets. They could detect faces with an accuracy of about 90%. Di Wu detected and counted pedestrians through surveillance cameras with an accuracy of 89.7% [2]. He identified heads and shoulders by extracting HOG features of the foreground and classifying them with the SVM algorithm.
A team from the Bio-Computing Research Center, including Bin Li, Jian Zhang, Zheng Zhang, and Yong Xu, built a program that counts people passing under an overhead camera with an accuracy of 96% [3]. They used an AdaBoost classifier based on LBP features for head detection. Charoenpong and Sanitthai presented a detection method based on the head contour, which fits the head to the contour using an ellipse model [4]. Its average accuracy is 95.39%.
The rest of this paper covers the proposed methods for detecting and counting people (section II), our work (section III), the algorithm and program (section IV), experimental results (section V), and the conclusion (section VI).
Local Binary Pattern (LBP) + AdaBoost
AdaBoost, or Adaptive Boosting, is a machine learning meta-algorithm. It boosts the classification results of other classifiers. In this method, the outputs of the LBP-based “weak learners” are combined into a weighted sum to produce a much more precise, usable output.
Haar-cascade
Haar Cascade is a face detection method proposed by Paul Viola and Michael Jones. This method calculates the Haar-like features in an area using summed-area tables and decides whether the area contains a face or not. However, in most cases the area does not have the characteristics of a face, so the AdaBoost algorithm is used to choose the most suitable Haar-like features. To reduce the processing time, a cascade of classifiers is applied.
Histogram of Oriented Gradients + Support Vector Machine.
Histogram of Oriented Gradients (HOG) is one of the best methods for extracting features from an image. This method divides an image into a grid of cells and then calculates the direction and magnitude of the gradient vector in every cell. The algorithm then constructs a histogram of gradients for each cell and expands it into the histogram of gradients for the whole image. Usually, one image has a histogram with a large number of bins, which differentiates one object from another efficiently. After the HOG features are extracted, a Support Vector Machine (SVM) is applied to classify the objects based on their features.
Deep learning: R-CNN, Fast R-CNN, Faster R-CNN, YOLO.
Deep learning is one family of machine learning methods. It allows us to train a model to predict the output from a group of inputs. The model's “brain” is an artificial neural network arranged in multiple layers. These layers enable the system to extract high-level features from the inputs. As a result, deep learning generally provides the most accurate results among the object detection methods listed here. However, training requires a relatively large set of data and a lot of time.
Figure 2: Histogram of Gradients
In short, we chose the HOG and SVM method for people detection and counting in our project because, based on previous work [6], HOG features are of high quality compared with other feature descriptors. They are calculated on a dense grid over the picture and are normalized to minimize the effect of image brightness. This makes the result largely independent of environmental conditions, especially luminosity. In addition, SVM is well suited to classifying HOG features because it can effectively find a hyperplane that separates the positive and negative data. This lets the machine learning model give good results when deciding whether an object is a person or not. Combining HOG and SVM, we can build a program that runs fast with high accuracy.
Histogram of Oriented Gradients
Histogram of Oriented Gradients (HOG) is a feature descriptor. In the HOG feature descriptor, the distribution of the directions and magnitudes of the image's gradients is used as the feature. Although input images of any size can be used, we describe the configuration from the original HOG paper [5]. In that paper, the input image has size 64×128×3 and the output feature vector has length 3780. Extracting the HOG feature from an image takes five steps:
Figure 3: Block illustration
A.1. Scaling:
According to the original paper, the image has a width-to-height ratio of 1:2. The size used there is 64×128 pixels.
A.2. Pixel gradient computation:
Calculate the horizontal component $g_x$ and the vertical component $g_y$ of the gradient by filtering the image with the kernels $[-1, 0, 1]$ and $[-1, 0, 1]^T$ [5].
Magnitude and direction of the gradient: $g = \sqrt{g_x^2 + g_y^2}$ and $\theta = \arctan\left(\frac{g_y}{g_x}\right)$.
The magnitude of the gradient is large wherever there is a sharp change in intensity.
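As a minimal sketch of this step, the code below applies the centered $[-1, 0, 1]$ kernels of the original HOG paper to a small grayscale image and derives the magnitude and unsigned direction of each pixel. The function name `pixel_gradients`, the list-of-rows image layout, and the choice to leave border pixels at zero are assumptions made for illustration, not details from the paper.

```python
import math

def pixel_gradients(img):
    """Return (magnitude, direction) grids for a grayscale image.

    img is a list of rows of numbers. Border pixels keep a zero gradient
    (an assumption that keeps the sketch short). Directions are unsigned
    angles in degrees, in the range [0, 180).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal kernel [-1, 0, 1]
            gy = img[y + 1][x] - img[y - 1][x]  # vertical kernel [-1, 0, 1]^T
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

# A vertical edge: dark on the left, bright on the right.
image = [[0, 0, 10, 10] for _ in range(4)]
mag, ang = pixel_gradients(image)
print(mag[1][1], ang[1][1])
```

The interior pixels next to the edge get a large magnitude and a direction of 0 degrees, since the intensity changes only along the horizontal axis.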
A.3. Histogram of Gradients in cells computation:
Divide the image into cells of 8×8 pixels. Then compute a histogram of gradients for each cell.
An 8×8 cell contains 8×8×3 = 192 pixel values. The gradient of this cell contains 2 values (magnitude and direction) per pixel, which adds up to 8×8×2 = 128 numbers. These 128 numbers are then reduced to a 9-bin histogram, which can be stored as an array of 9 numbers. This makes the representation less sensitive to noise.
The histogram is a vector with 9 bins corresponding to the angles 0, 20, 40, 60, 80, 100, 120, 140, and 160 degrees. Every pixel votes for a bin, and the weight of the vote, that is, the value added to the bin, is the gradient magnitude of that pixel. If the gradient direction of a pixel lies between two of the nine angles, its magnitude is split between the two corresponding bins in proportion to how close the direction is to each.
Figure 1: Grid cells
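The voting rule can be sketched as follows. This is an illustrative version under stated assumptions: the function name `cell_histogram` is hypothetical, the inputs are flat lists of per-pixel magnitudes and unsigned angles in degrees, and angles above 160 degrees wrap around to the 0-degree bin.

```python
def cell_histogram(magnitudes, angles, n_bins=9, bin_width=20.0):
    """Build a 9-bin histogram with votes split between neighboring bins."""
    hist = [0.0] * n_bins
    for m, a in zip(magnitudes, angles):
        a = a % (n_bins * bin_width)         # keep the angle in [0, 180)
        lo = int(a // bin_width)             # bin at or just below the angle
        hi = (lo + 1) % n_bins               # next bin; 160 wraps to 0
        frac = (a - lo * bin_width) / bin_width
        hist[lo] += m * (1.0 - frac)         # share for the lower bin
        hist[hi] += m * frac                 # share for the upper bin
    return hist

# One pixel halfway between bins 0 and 20, one pixel exactly on bin 160.
hist = cell_histogram([4.0, 2.0], [10.0, 160.0])
print(hist)  # [2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0]
```

The total of the histogram equals the total gradient magnitude in the cell, so no vote is lost when it is split.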
A.4. Block Normalization:
In this paper, a block is made of four neighboring cells. Blocks are defined so that the histograms of gradients can be normalized.
Gradients of an image are sensitive to overall lighting, so the histogram changes if the image brightness varies. Ideally, our descriptor should be independent of lighting variations, so normalization must be done.
For a 3-dimensional vector $v = [v_1, v_2, v_3]$, the normalization step divides each element of the vector by its length $\|v\|_2 = \sqrt{v_1^2 + v_2^2 + v_3^2}$, so the normalized vector is $\frac{v}{\|v\|_2}$.
We can do the same with the 9-bin histogram, but a better way is to normalize over a larger block of 16×16 pixels, which gives a 36-element vector. Block normalization uses a “window” the size of one block, which moves 8 pixels after each block is normalized.
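A short sketch shows why dividing by the vector length removes the effect of brightness. The vector values below are hypothetical and chosen only for illustration; doubling them stands in for a uniformly brighter image.

```python
import math

def l2_normalize(vec):
    """Divide every element by the Euclidean length of the vector."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

dim = [128.0, 64.0, 32.0]          # hypothetical gradient values
bright = [2.0 * v for v in dim]    # same scene, twice as bright

a = l2_normalize(dim)
b = l2_normalize(bright)
print(a)
print(b)
```

Both calls return the same unit-length vector, up to floating-point rounding, so the normalized descriptor does not change when the brightness is scaled. The same function applies unchanged to a 36-element block vector.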
A.5. Computing the image's HOG feature vector:
The remaining task is to concatenate all of the 36-element block vectors into one giant vector for the image. There are 7 horizontal and 15 vertical block positions, making a total of 7 × 15 = 105 positions. Thus, the giant vector has 36 × 105 = 3780 dimensions.
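The arithmetic behind the descriptor length can be checked with a few lines of code. The function name `hog_descriptor_length` is hypothetical; the default parameters are the ones described in this section (64×128 image, 8-pixel cells, 16-pixel blocks, 8-pixel stride, 9 bins).

```python
def hog_descriptor_length(width=64, height=128, cell=8, block=16,
                          stride=8, bins=9):
    """Return (blocks across, blocks down, descriptor length)."""
    across = (width - block) // stride + 1     # horizontal block positions
    down = (height - block) // stride + 1      # vertical block positions
    cells_per_block = (block // cell) ** 2     # 2 x 2 = 4 cells per block
    length = across * down * cells_per_block * bins
    return across, down, length

print(hog_descriptor_length())  # (7, 15, 3780)
```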
Support Vector Machine (SVM):
B.1. Distance from a point to a hyperplane:
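The distance from a point (vector) with coordinates $x_0$ to a hyperplane is $\frac{|w^T x_0 + b|}{\|w\|_2}$ [7], where $w^T x + b = 0$ is the equation of the hyperplane, $\|w\|_2 = \sqrt{\sum_{i=1}^{d} w_i^2}$ ($d$ is the number of dimensions of the space), and $w$ is the weight vector of the hyperplane.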
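The standard point-to-hyperplane distance $\frac{|w^T x_0 + b|}{\|w\|_2}$ can be evaluated directly. The sketch below uses a hypothetical function name and a made-up 2-D example.

```python
import math

def distance_to_hyperplane(w, b, x0):
    """Distance from point x0 to the hyperplane w^T x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x0))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

# Hyperplane 3x + 4y - 5 = 0 and the point (3, 4).
d = distance_to_hyperplane([3.0, 4.0], -5.0, [3.0, 4.0])
print(d)  # 4.0
```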
B.2. Problem of optimization of SVM:
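Suppose the training set consists of the points $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$, where $x_i \in \mathbb{R}^D$ is a D-dimensional vector representing the input of one data point and $y_i$ is the label of that data point. In this paper, we only present SVM for the two-class case $y_i \in \{1, -1\}$, corresponding to the positive and negative data sets. The task is to find a maximum-margin hyperplane: it separates the points $x_i$ for which $y_i = 1$ from those for which $y_i = -1$, and the distance between the hyperplane and the nearest $x_i$ is maximized. We want to find the two components $w$ and $b$ of this hyperplane [8].
For any pair $(x_n, y_n)$, the distance from the point to the hyperplane is $\frac{y_n(w^T x_n + b)}{\|w\|_2}$, so $\text{margin} = \min_n \frac{y_n(w^T x_n + b)}{\|w\|_2}$ [8].
Finding $w$ and $b$: we need the $w$ and $b$ that maximize the margin,
$$(w, b) = \arg\max_{w, b} \left\{ \min_n \frac{y_n(w^T x_n + b)}{\|w\|_2} \right\}.$$
This is a very complicated problem, so it must be reduced to a simpler one.
Note: if we use $kw$ and $kb$ ($k > 0$) instead of $w$ and $b$, the separating surface does not change and the margin stays constant. We can therefore assume $y_n(w^T x_n + b) = 1$ for the points closest to the separating surface. Then for all $n$ we have $y_n(w^T x_n + b) \ge 1$. The problem simplifies to:
$$(w, b) = \arg\min_{w, b} \frac{1}{2}\|w\|_2^2 \quad \text{subject to} \quad 1 - y_n(w^T x_n + b) \le 0, \ \forall n = 1, 2, \dots, N \qquad (1)$$
However, even in this simplified form, problem (1) becomes very complicated as the dimension D of the data space and the number N of data points increase. So we often solve the dual of this problem instead.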
B.3. Duality problem for SVM:
B.3.1. Lagrange dual function:
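The Lagrangian of the primal SVM problem is $\mathcal{L}(w, b, \lambda) = \frac{1}{2}\|w\|_2^2 + \sum_{n=1}^{N} \lambda_n \left(1 - y_n(w^T x_n + b)\right)$ with $\lambda_n \ge 0$. The Lagrange dual function is defined as $g(\lambda) = \min_{w, b} \mathcal{L}(w, b, \lambda)$. To find $g(\lambda)$, we set the partial derivatives of $\mathcal{L}$ with respect to $w$ and $b$ to zero and solve the resulting system [10]:
$$\frac{\partial \mathcal{L}}{\partial w} = w - \sum_{n=1}^{N} \lambda_n y_n x_n = 0 \ \Rightarrow \ w = \sum_{n=1}^{N} \lambda_n y_n x_n$$
$$\frac{\partial \mathcal{L}}{\partial b} = -\sum_{n=1}^{N} \lambda_n y_n = 0$$
Substituting these into $\mathcal{L}$, we have:
$$g(\lambda) = \sum_{n=1}^{N} \lambda_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \lambda_n \lambda_m y_n y_m x_n^T x_m$$
Suppose that we have the matrix $V = [y_1 x_1, y_2 x_2, \dots, y_N x_N]$ and the vector $\mathbf{1} = [1, 1, \dots, 1]^T$.
Given $K = V^T V$, we have:
$$g(\lambda) = -\frac{1}{2} \lambda^T K \lambda + \mathbf{1}^T \lambda$$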
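To make the matrix form concrete, the sketch below evaluates the standard SVM dual objective $g(\lambda) = \mathbf{1}^T \lambda - \frac{1}{2} \lambda^T K \lambda$ with $K = V^T V$ on a toy data set of two points. The data, the function name `dual_objective`, and the multiplier values are assumptions chosen for illustration; the value $\lambda_1 = \lambda_2 = 0.25$ was worked out by hand as the maximizer for this particular toy set.

```python
def dual_objective(X, y, lam):
    """Evaluate g(lambda) = 1^T lambda - 0.5 * lambda^T K lambda."""
    n = len(X)
    # Row i of V holds y_i * x_i (the columns of V in the text).
    V = [[y[i] * v for v in X[i]] for i in range(n)]
    # K[i][j] = (y_i x_i)^T (y_j x_j)
    K = [[sum(a * b for a, b in zip(V[i], V[j])) for j in range(n)]
         for i in range(n)]
    quad = sum(lam[i] * K[i][j] * lam[j]
               for i in range(n) for j in range(n))
    return sum(lam) - 0.5 * quad

X = [[1.0, 1.0], [-1.0, -1.0]]   # one positive and one negative point
y = [1, -1]

best = dual_objective(X, y, [0.25, 0.25])
lower = dual_objective(X, y, [0.2, 0.2])
higher = dual_objective(X, y, [0.3, 0.3])
print(best, lower, higher)
```

Moving the multipliers away from 0.25 in either direction (while keeping $\sum_n \lambda_n y_n = 0$) lowers the objective, which is consistent with 0.25 being the maximizer for this toy set.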
B.3.2. Lagrange duality problem:
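We define the Lagrange dual problem as follows: $\lambda = \arg\max_{\lambda} g(\lambda)$ subject to $\lambda \succeq 0$ and $\sum_{n=1}^{N} \lambda_n y_n = 0$ [9]. In this problem, the number of parameters to find is N, which is the number of components of $\lambda$, or equivalently the number of data points.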
B.3.3. KKT conditions:
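The solution of the problem satisfies the KKT condition system [9]:
$$1 - y_n(w^T x_n + b) \le 0, \ \forall n = 1, 2, \dots, N \qquad (2)$$
$$\lambda_n \ge 0, \ \forall n = 1, 2, \dots, N \qquad (3)$$
$$\lambda_n \left(1 - y_n(w^T x_n + b)\right) = 0, \ \forall n = 1, 2, \dots, N \qquad (4)$$
$$w = \sum_{n=1}^{N} \lambda_n y_n x_n \qquad (5)$$
$$\sum_{n=1}^{N} \lambda_n y_n = 0 \qquad (6)$$
From the dual problem we can find $\lambda$. From (5), find $w$. From (4), using any point with $\lambda_n > 0$, find $b$.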
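Once the multipliers are known, the hyperplane follows from the standard KKT relations: $w = \sum_n \lambda_n y_n x_n$, and $b = y_n - w^T x_n$ for any point with $\lambda_n > 0$. The sketch below is an illustration on a toy data set, not the training code of this paper; the function name `recover_hyperplane`, the data, and the multipliers (worked out by hand for this toy set) are assumptions.

```python
def recover_hyperplane(X, y, lam, tol=1e-9):
    """Recover (w, b) from dual multipliers using the KKT conditions."""
    n, dim = len(X), len(X[0])
    # Stationarity: w is a weighted sum of the training points.
    w = [sum(lam[i] * y[i] * X[i][d] for i in range(n)) for d in range(dim)]
    # Complementary slackness: points with lam > 0 lie on the margin,
    # so y_i (w^T x_i + b) = 1, which gives b = y_i - w^T x_i.
    support = [i for i in range(n) if lam[i] > tol]
    b = sum(y[i] - sum(w[d] * X[i][d] for d in range(dim))
            for i in support) / len(support)
    return w, b

X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]
lam = [0.25, 0.25]               # dual solution for this toy set

w, b = recover_hyperplane(X, y, lam)
margins = [y[i] * (sum(wd * xd for wd, xd in zip(w, X[i])) + b)
           for i in range(len(X))]
print(w, b, margins)  # [0.5, 0.5] 0.0 [1.0, 1.0]
```

Averaging $b$ over all support vectors, rather than taking a single one, is a common choice for numerical stability.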
Dataset
The dataset includes 7904 human face images taken from many different angles and 182261 non-face images showing objects that often appear in meeting rooms, halls, auditoriums, computer rooms, and similar places.
The 7904 face images were selected from the Labeled Faces in the Wild dataset [11] and then processed by us. The 182261 non-face images were selected by us and cropped from about 2000 images in the Indoor Scene Recognition dataset [12].
To suit the HOG and SVM calculations, all of these images were resized to 64×128 pixels.
Algorithm and program
Training the SVM model
Figure 4: Training flow chart
Notes:
Create a get_hog() function that extracts HOG features from an image using the HOGDescriptor class in the OpenCV library.
When appending the features to an array, make sure the array is two-dimensional. This is easy to do with the squeeze() function in the NumPy library. However, we found that squeeze() could not handle our full array (of dimensions 190165×3780) in one call, so we applied squeeze() to sub-arrays of (40000 to 50000)×3780 and then merged the squeezed sub-arrays with NumPy's stack() function.
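The chunked approach can be sketched as below. This is an illustration under assumptions, not the authors' code: it assumes each descriptor arrives as a column of shape (3780, 1), it uses a small array and chunk size so that it runs quickly, and it merges the pieces with `np.vstack`, which is one of several NumPy stacking functions that could be used. The function name `squeeze_in_chunks` is hypothetical.

```python
import numpy as np

def squeeze_in_chunks(features, chunk_size):
    """Turn an (n, d, 1) array into an (n, d) array, one chunk at a time."""
    parts = []
    for start in range(0, features.shape[0], chunk_size):
        chunk = features[start:start + chunk_size]
        # axis=-1 removes only the trailing length-1 axis, so a chunk
        # holding a single row still stays two-dimensional.
        parts.append(np.squeeze(chunk, axis=-1))
    return np.vstack(parts)

# Small stand-in for the real feature array.
raw = np.zeros((10, 3780, 1), dtype=np.float32)
flat = squeeze_in_chunks(raw, chunk_size=4)
print(flat.shape)  # (10, 3780)
```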
Detecting algorithm
Non-maxima Suppression (NMS) is a function we wrote to merge multiple boxes that represent the same object into a single box [13].
Figure 6: Before NMS
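One common way to implement this idea is a greedy procedure based on intersection over union (IoU). The sketch below is that generic variant, not necessarily the function written for this paper; the box format `(x1, y1, x2, y2)`, the scores, and the 0.3 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Keep the highest-scoring box of every group of overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap a box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

# Two detections of the same face and one detection of another face.
boxes = [(10, 10, 60, 60), (12, 12, 62, 62), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept, len(kept))  # [(10, 10, 60, 60), (100, 100, 150, 150)] 2
```

After suppression, the number of remaining boxes can be used directly as the people count.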
The sliding window is an algorithm that moves a fixed-size window across every area of an image [14]. We apply the HOG feature descriptor to each area the window covers in order to detect a human face, if one is present.
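A minimal sketch of the window generator follows. The 64×128 window matches the training image size used in this paper; the function name `sliding_windows`, the step size, and the frame size in the example are assumptions. In a full detector, each window would be cropped from the frame, described with HOG, and passed to the SVM.

```python
def sliding_windows(image_width, image_height, win_w=64, win_h=128, step=8):
    """Yield (x, y, width, height) for every window position in the image."""
    for y in range(0, image_height - win_h + 1, step):
        for x in range(0, image_width - win_w + 1, step):
            yield x, y, win_w, win_h

# A small 128x256 frame scanned with a coarse step of 32 pixels.
windows = list(sliding_windows(128, 256, step=32))
print(len(windows), windows[0], windows[-1])
# 15 (0, 0, 64, 128) (64, 128, 64, 128)
```

A smaller step finds faces more reliably but produces more windows to classify, so the step is a trade-off between accuracy and processing time.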
GUI building algorithm
We designed the GUI with a support tool, PyQt5. The .ui file it creates is converted to a .py file with the pyuic5 application. We then coded the functions for the objects in the GUI. One important point when programming the GUI is that it must update its values continuously. So we use QTimer, a timer that fires at regular intervals, to show the frames continuously on the label (otherwise the program only shows the first frame captured by the camera on the label).
There is a 'Process' button on the GUI. When this button is pressed, the current frame is captured and passed to the detect() function for processing. The frame aspect ratio can be 4:3 or 16:9 depending on the camera used. If the aspect ratios of the camera and the label differ, black areas appear on either side or along the top and bottom edges of the image. These black areas create noise during processing, which leads to bad results. Therefore, when using the program, please select the GUI that matches the camera, so that the aspect ratio of the camera frame is the same as that of the GUI label.
Experimental results
Our program was developed in Python and tested on a laptop with an Intel Core i7 8750H 2.2GHz (6 cores/12 threads) and 8GB RAM. The test set includes … images in which the number of human faces varies from 4 to 25. The results are presented in the following table:
Experiment Number | Number of People | Proposed HOG + SVM Algorithm | Number of People Counted | Accuracy (%) |
---|---|---|---|---|
1 | 4 | 4 | 4 | 100 |
2 | 4 | 4 | 3 | 75 |
3 | 9 | 4 | 11 | 88.9 |
4 | 9 | 6 | 10 | 90 |
5 | 10 | 7 | 10 | 100 |
6 | 11 | 8 | 11 | 100 |
7 | 12 | 9 | 10 | 83.3 |
8 | 12 | 10 | 11 | 91.7 |
9 | 13 | 11 | 12 | 92.3 |
10 | 15 | 12 | 12 | 80 |
11 | 15 | 13 | 9 | 60 |
12 | 16 | 14 | 9 | 56.3 |
13 | 17 | 15 | 11 | 64.7 |
14 | 18 | 16 | 10 | 55.6 |
15 | 25 | 17 | 15 | 60 |
16 | 25 | 18 | 15 | 60 |
17 | 25 | 19 | 25 | 100 |
18 | 28 | 20 | 25 | 89.3 |
In this paper, we have presented a method for counting people in a room in real time, how it was built, and how it performed. The accuracy we obtained was not very high, but it is acceptable. We think two of the reasons for this are that our test set is too small and that the training set has some issues. Despite this, our program is still practical because our original objective was to estimate the number of people in a room in order to support decisions. In the future, our team will carefully review the training images one by one and test the corrected prototype on a larger set of test images for better results.
A Real-Time Method to Count People in a Room Using Histogram of Oriented Gradients and Support Vector Machine. (2024, Feb 13). Retrieved from https://studymoose.com/document/a-real-time-method-to-count-people-in-a-room-using-histogram-of-oriented-gradients-and-support-vector-machine