Closed Circuit Television (CCTV) is a small video-recording device used for security surveillance. CCTV is used in companies large and small, in houses and schools, and even in small alleys. It is a great help for security, as we can immediately become aware of an intruder's presence. However, CCTV also has its downsides: it is a device that merely records the place we want to monitor, and no alerts or data are collected from the recorded videos.
This research tackled one of those downsides. Faces recorded by a CCTV camera were initially identified manually; to optimize the usage of CCTV as a security camera, the research therefore extracts information on motion, face detection, and face identification from processed CCTV videos. This will help in detecting strangers and alerting us to possible threats.
Figure 1: Flowchart for Motion Detection and Face Recognition for CCTV

As stated in Figure 1, the first step after acquiring the video is motion detection.
The videos from CCTV are tested for the presence of motion using Accumulative Difference Images (ADI). Test images are compared with a reference image, the differences are accumulated, and the result is compared with a certain threshold; if the accumulated count exceeds the threshold, the image contains motion. If motion is detected, the process proceeds to face detection. Based on the results, this method gave a success rate of 92.655% with a detection time of 1.115 seconds (Nurhopipah & Harjoko, 2018).

Face detection is no less important, as face detection and identification are usually linked with each other. The best-known method for face detection is the Viola-Jones method, because of its accuracy. In this research, however, face detection is done using the Haar Cascade Classifier algorithm, a fast image processor that yields nearly accurate detection (Nurhopipah & Harjoko, 2018). The three key techniques of this method are the integral image, AdaBoost-based learning, and a cascade for merging classifiers. The success rate of this method is 76%, as 95 of the 125 detected faces contained correct information.
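The integral image, the first of the three key techniques just listed, can be sketched in a few lines. This is a minimal illustration on a tiny hand-made image, not code from the paper: each cell of the integral image holds the sum of all pixels above and to its left, so any rectangular pixel sum can then be read off with four lookups.

```python
# Integral image: a minimal sketch on a grayscale image stored as a
# 2-D list. ii has a one-cell zero border so the recurrence needs no
# special cases at the edges.

def integral_image(image):
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (image[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, via four constant-time lookups."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 3, 3))   # → 45 (whole image)
print(rect_sum(ii, 1, 1, 2, 2))   # → 28 (5 + 6 + 8 + 9)
```

A Haar-like feature is then just the difference between two such rectangle sums, which is why the cascade can evaluate thousands of features quickly.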
The problems identified with this method are low image resolution, face position, blocked faces, lighting problems, and faces blurred by fast motion. Next is the process of face identification. Speeded-Up Robust Features (SURF) was used to detect and describe the features of the faces detected by the camera. The image is then compared with existing data to identify the person, using the Principal Component Analysis (PCA) procedure, which matches the stored data against the earlier SURF face-identification result. Based on the research results, the success rate of identification is 60% of the 65 detected faces. The issues that affect face identification include small image resolution, lighting problems, illumination, and face position. The research also finds that the time taken to detect and identify faces from the CCTV feed is 0.202 seconds, while the ideal time to process the video is 0.1 seconds, thus resulting in a time delay (Nurhopipah & Harjoko, 2018).
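The matching step of identification, comparing a probe face against enrolled ones and rejecting strangers, can be sketched as nearest-neighbor matching. SURF extraction and PCA projection are too involved for a short example, so the three-element descriptors, the gallery names, and the rejection threshold below are made-up illustrative values, not data from the paper.

```python
import math

# Face identification as nearest-neighbor matching over descriptors.
# Real systems would compare SURF/PCA feature vectors; these short
# vectors and the threshold are hypothetical.

gallery = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.2, 0.8, 0.5],
}
REJECT_DISTANCE = 0.5   # larger distances are treated as "stranger"

def identify(descriptor):
    best_name, best_dist = None, float("inf")
    for name, enrolled in gallery.items():
        dist = math.dist(descriptor, enrolled)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= REJECT_DISTANCE else "unknown"

print(identify([0.85, 0.15, 0.3]))   # → alice (close to her template)
print(identify([0.0, 0.0, 5.0]))     # → unknown (far from everyone)
```

Rejecting matches beyond a distance threshold is what lets such a system flag strangers rather than forcing every face onto the nearest enrolled person.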
The success rate of face detection is lower than in other existing research cited in the paper, and the process takes a long time; the authors therefore concluded that their research needed further improvement. Face recognition is currently a trendy topic. This could be seen when Apple launched the iPhone X, its first phone with face-recognition functionality: everyone was talking about the product, teenagers and professionals alike. It is undeniable that face recognition is an amazing technology, because it can be used in many sectors, such as authentication. There are many research papers on face-recognition authentication; one of them is "Continuous face authentication scheme for mobile devices with tracking and liveness detection" by Max Smith-Creasey, Fatema A. Albalooshi, and Muttukrishnan Rajarajan.
The first part of the research covered related work and motivation. To complete this research, the researchers had to study previous work. They found that face recognition had been in use for a long time; one early use was in the Nokia N90 mobile device, which used the Viola-Jones technique to detect skin color and achieved a high recognition rate. As time went by there were many advancements, among them mobile sensors and greater computational power, which brought small but important improvements to face recognition. They also found that different facial algorithms contributed to the advancement of face recognition. However, a spoofing attack could happen at any time, as there was no liveness confirmation for the detected face. Since spoofing attacks were being used to bypass security, the researchers found a way to detect them: collect the spoofed images into a dataset and feed it into a framework, after which spoofing attacks can be detected. They also discovered object tracking, which can track faces. Overall, the researchers concluded that most schemes focus on scheme design and lack suitable realism or attack mitigation (Smith-Creasey et al., 2018), and that continuous facial authentication was an unexplored area of face recognition.
The second part of the research covered continuous facial authentication. According to the research, the general idea is to observe a face in a video: using the front camera of the mobile phone, the user's face is observed from the moment they unlock the phone until they stop using it. The video is captured as frames, and faces are detected in those frames; padding is added to increase the number of faces detected. The Viola-Jones technique and OpenIMAJ are used to process the image. Once a face is detected, facial landmarks are also detected and extracted to check the liveness of the user, using the Constrained Local Model (CLM) technique; through this, any spoofing user is blocked from using the phone. A feature vector is obtained from the facial landmarks, separated into a feature image, and processed using image-processing techniques. From that process the face is authenticated, and it remains authenticated for as long as the user is using the phone. To track the face, an image is chosen and matched against a template; once the image no longer fits the framework, re-authentication takes place using the same process all over again (Smith-Creasey et al., 2018).
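The control flow of such a scheme, detect, check liveness, then either keep tracking or fall back to full re-authentication, can be sketched as a small loop. This is a structural sketch only: the helper predicates passed in (`detect_face`, `is_live`, `matches_template`) are hypothetical stand-ins for the Viola-Jones, CLM-liveness, and template-matching stages described above, and the string "frames" are toy data.

```python
# Continuous-authentication loop: per-frame decisions are
# 'ok' (tracking holds), 'reauth' (full re-authentication passed),
# or 'locked' (no face, spoof, or failed match).

def authenticate_stream(frames, detect_face, is_live, matches_template):
    decisions = []
    authenticated = False
    for frame in frames:
        face = detect_face(frame)
        if face is None or not is_live(face):
            authenticated = False          # spoof or lost face: drop session
            decisions.append("locked")
            continue
        if authenticated and matches_template(face):
            decisions.append("ok")         # cheap path: tracking still holds
        else:
            authenticated = matches_template(face)   # full re-authentication
            decisions.append("reauth" if authenticated else "locked")
    return decisions

# Toy run: frames are labels; a printed photo fails the liveness check.
frames = ["owner", "owner", "photo", "owner"]
out = authenticate_stream(
    frames,
    detect_face=lambda f: f,
    is_live=lambda face: face != "photo",
    matches_template=lambda face: face == "owner",
)
print(out)   # → ['reauth', 'ok', 'locked', 'reauth']
```

The point of the two-path structure is the one the paper makes: tracking is cheaper than re-authentication, but losing track immediately forces the expensive path, closing the attack window.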
Experimental results were the third part of the research, in which several experiments were run as a proof of concept, divided into seven subparts. For the dataset, several videos of common activities such as sitting, standing, and walking were recorded; two further videos were recorded to test spoofed input, one from an image printed on A4 paper and one from a live video, and different types of devices were used. For preprocessing, the face area of the image was converted to grayscale and normalized. For verification, three different techniques were used, each producing a similarity output. Based on the research, five common biometric evaluation metrics were used to test effectiveness (Smith-Creasey et al., 2018): False Acceptance Rate (FAR), False Rejection Rate (FRR), True Rejection Rate (TRR), True Acceptance Rate (TAR), and Equal Error Rate (EER). To detect liveness, they used the dataset obtained in the dataset part, having already evaluated the liveness of the images while setting aside the spoofed ones.
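Four of the five metrics just listed follow directly from counting accepts and rejects at a fixed decision threshold; a minimal sketch, with made-up similarity scores rather than the paper's data:

```python
# Biometric evaluation rates at one threshold. "genuine" scores come
# from the real user against their own template, "impostor" scores
# from anyone else; a sample is accepted when score >= threshold.

def biometric_rates(genuine, impostor, threshold):
    tar = sum(s >= threshold for s in genuine) / len(genuine)    # true accepts
    far = sum(s >= threshold for s in impostor) / len(impostor)  # false accepts
    frr = 1.0 - tar   # genuine users wrongly rejected
    trr = 1.0 - far   # impostors correctly rejected
    return {"TAR": tar, "FAR": far, "FRR": frr, "TRR": trr}

genuine = [0.9, 0.8, 0.75, 0.4]    # illustrative scores only
impostor = [0.3, 0.55, 0.2, 0.1]

rates = biometric_rates(genuine, impostor, threshold=0.5)
print(rates)   # → {'TAR': 0.75, 'FAR': 0.25, 'FRR': 0.25, 'TRR': 0.75}
```

The fifth metric, EER, is not a separate count: it is the value of FAR (equivalently FRR) at the threshold where the two curves cross, found by sweeping the threshold.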
They also took a few main facial landmarks, such as the eyes, nose, and mouth, tested them using eight test methods, and performed the calculations. From the tests conducted, they concluded that higher TRR and TAR and lower FRR and FAR give better results (Smith-Creasey et al., 2018). They also recorded the average elapsed time across all eight methods and found the mean to be 41.9 ms; the mouth region came with the fastest time and the highest accuracy. Facial recognition and cross-scenario comparison then took place, where the datasets from UMDAA and CALF were tested according to their purposes. From the images obtained from the video, sub-images were also extracted and processed to determine accuracy, and the pixel dimensions of the images were modified as needed. Last but not least was face tracking: a real-time test was held covering all the factors that could affect the result, such as the addition of padding and the frame size. The test showed that re-authenticating right after a face loses track can avoid attack windows. The time taken to track each face frame was also recorded, at 4 frames per second, which minimized computational time. In terms of time, adding padding to the image results in more processing time but better face tracking. Thus, an experiment was carried out to overcome the problem, and they concluded that tracking a face less frequently increases efficiency and improves the longevity of authentication (Smith-Creasey et al., 2018).
From the research, the researchers concluded that they had made a continuous face recognition scheme with many functionalities, including face tracking and liveness detection. They also claimed improvements over the previous work they had investigated. For future work, they planned to concentrate on improving the framework, as the current one had limitations; they were also considering color and texture perspectives to enhance liveness detection, and they would like to handle matters such as partial face detection by learning the protocol (Smith-Creasey et al., 2018). In recent years, face recognition has been widely used for personal security, whether in devices, at home, or for personal trust. It also plays a big role in finding people, whether suspects or missing persons. Martinez (2009) describes face recognition as the science of how individual faces are distinguished biologically, which computer systems then emulate. Many algorithms are used in face recognition; one method is dictionary learning with a single resolution. Dictionary learning is part of sparse representation, used in many fields of image processing; it can be defined as a signal-processing and machine-learning method that aims to find a frame (a dictionary) that admits a sparse representation of the data (Dictionary Learning: Theory and Algorithms, n.d.). Normally, conventional dictionary learning methods used in face recognition focus on a single resolution. Distinct dictionaries are used in image classification and face recognition.
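To make the sparse-representation idea above concrete: given a fixed dictionary of atoms, a signal is approximated as a combination of only a few of them. A minimal matching-pursuit sketch follows; the hand-picked orthonormal atoms and toy signal are illustrative, and methods like those in the paper additionally *learn* the dictionary rather than fixing it.

```python
# Greedy sparse coding over a fixed dictionary (matching pursuit):
# repeatedly pick the atom most correlated with the residual.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, n_nonzero):
    residual = list(signal)
    code = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # atom with the largest |<residual, atom>| (atoms assumed unit-norm)
        best = max(range(len(atoms)),
                   key=lambda i: abs(dot(residual, atoms[i])))
        coeff = dot(residual, atoms[best])
        code[best] += coeff
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
    return code, residual

# Three unit-norm atoms in R^3; the signal is a mix of two of them.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
signal = [2.0, 0.0, 3.0]

code, residual = matching_pursuit(signal, atoms, n_nonzero=2)
print(code)   # → [2.0, 0.0, 3.0]: two nonzero coefficients recover the signal
```

The sparse code, not the raw pixels, is then what gets classified; learned dictionaries make such codes discriminative for faces.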
A novel robust, discriminative and comprehensive dictionary (RCDL) and nonlinear dictionary learning (NDL) are two examples of other methods: the former improves the classification capability of the dictionary, and the latter employs a feed-forward neural network to seek hierarchical feature-projection matrices and a dictionary simultaneously. Jointly trained dictionaries of low-resolution and high-resolution image patches give a more compact representation while substantially reducing the computational cost. Deep learning provides some of the most representative work: for instance, a hybrid convolutional network (ConvNet) – Restricted Boltzmann Machine (RBM) model concatenates the features of different face-region pairs extracted by various deep ConvNets. But to get enough features it needs a large-scale training set, and it consumes considerable computing resources to keep running (Luo & Yang, 2019). Compared with deep learning, conventional methods are more suitable for small-scale databases and more efficient in computing, although it is difficult to get a reliably robust dictionary due to the small-sample-size problem. Meanwhile, a joint dictionary only generates high-resolution face images rather than producing a dictionary for multi-resolution face recognition.
Therefore, this research article proposes multi-resolution dictionary learning for face recognition in order to obtain a robust dictionary. The authors enhance the robustness of face recognition, demonstrating it on several multi-resolution face recognition datasets designed with the resolution-pyramid method (Luo & Yang, 2019). A few algorithms are related to the proposed method. Dictionary learning algorithms fall into three kinds: supervised, semi-supervised, and unsupervised. The problem with supervised dictionary learning algorithms is that large-scale labeled sample data is hard to obtain, which indirectly limits the methods' progress. Semi-supervised dictionary learning algorithms balance the dictionary size and stabilize the sensitivity to noisy and outlier samples, but still require some labeled samples. Examples of unsupervised dictionary learning algorithms include K-SVD, which continuously updates the dictionary until the sparsity condition is satisfied, and locality-constrained linear coding (LLC), which selects a familiar basis of local image descriptors from the sparse representation. The proposed method is similar to joint dictionary learning, but with some differences.
Like unsupervised dictionary learning algorithms, it aims at reducing the reconstruction error of multi-resolution training samples. At the same time, it utilizes more than two dictionaries to classify multi-resolution images, which solves the problem of multi-resolution face image recognition (Luo & Yang, 2019). In the experiments conducted for the research, the proposed method is compared with conventional face recognition methods such as K-SVD, D-KSVD, LC-KSVD1, LC-KSVD2, DLSPC(GC), and SRC on a few databases. For the multi-resolution setting, the research constructs several multi-resolution dictionary-learning datasets. For example, the Extended Yale B face database (64 x 64 pixels) is divided into a training set and a test set at a 1:1 ratio and then converted to multi-resolution datasets: the resolution of the whole original training set is reduced to 32 x 32 pixels and 16 x 16 pixels respectively, so the experiment gets three different resolutions.
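The resolution-pyramid construction just described (64x64 down to 32x32 and 16x16) amounts to repeated downsampling. A minimal sketch using 2x2 block averaging on a tiny toy image; the paper does not specify its exact downsampling filter, so block averaging here is an assumption:

```python
# Resolution pyramid by repeated 2x2 block averaging on a grayscale
# image stored as a 2-D list of ints.

def halve_resolution(image):
    h, w = len(image), len(image[0])
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) // 4
            for x in range(w // 2)
        ]
        for y in range(h // 2)
    ]

def resolution_pyramid(image, levels):
    """Return [original, halved, quartered, ...], `levels` images total."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(halve_resolution(pyramid[-1]))
    return pyramid

# 4x4 toy image down to 2x2 and 1x1 (stand-in for 64 -> 32 -> 16).
img = [[0, 0, 8, 8],
       [0, 0, 8, 8],
       [4, 4, 12, 12],
       [4, 4, 12, 12]]
pyr = resolution_pyramid(img, levels=3)
print([(len(p), len(p[0])) for p in pyr])   # → [(4, 4), (2, 2), (1, 1)]
print(pyr[1])                               # → [[0, 8], [4, 12]]
```

Each training face then contributes one sample per pyramid level, which is how a single database becomes a multi-resolution dataset.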
The experiments used several public face recognition datasets: the Extended Yale B face database, the ORL face database, the AR database, the CMU PIE face database, and the Labeled Faces in the Wild (LFW) database. Most experimental results are positive, because the proposed method's recognition rate is higher than those of the other conventional methods. For instance, on the ORL face database, methods like K-SVD and LC-KSVD2 have lower recognition rates than the proposed method: the proposed method achieves roughly 92.15% in 0.02 seconds, while the former two achieve an estimated 89.15% and 88.85%. However, the conventional methods take less time, since they focus on a single resolution rather than multi-resolution images (Luo & Yang, 2019). Besides conventional methods, the proposed method is also compared with deep learning methods: models based on Convolutional Neural Networks (CNNs), such as AlexNet, VGG, and ResNet, are applied for comparison. Since the standard databases are small in scale, the deep learning methods are trained in two ways, without pre-training and with pre-training. The comparison shows that pre-trained ResNet18 exceeds the proposed method on the Extended Yale B, AR, and LFW databases; in contrast, the proposed method has a better recognition rate on the ORL and PIE databases. The processing time is unknown, since the research only reports recognition rates (Luo & Yang, 2019). Usually, face images are taken by various cameras at different resolutions, yet conventional methods always train their dictionaries on images of a single resolution; the effect is that the learned dictionaries are unsuited to real-world cases with varying resolutions, unlike the proposed multi-resolution dictionary learning.
Moreover, besides providing dictionary learning algorithms, the proposed method uses various dictionaries in the training phase and provides a strong restriction to keep the similarity between them (Luo & Yang, 2019).