Enhancing Pupil Detection in Eye Tracking Systems: A Computational Approach


Introduction

Eye tracking technologies, pivotal for applications ranging from human-computer interaction to medical diagnostics, face challenges in accurately detecting the pupil in diverse and occluded conditions. Our study introduces an improved algorithmic approach that combines enhanced image binarization, noise elimination, and contour extraction techniques to locate the pupil with remarkable accuracy.

Building on a relatively static head-tracking setup, we improve the Otsu algorithm so that the binarized pupil image is separated completely from the background. A function is then written to remove small-area noise clumps, yielding a complete and clear binarized pupil image.

Despite a large body of prior work, detecting the eye, and particularly the pupil, in frames recorded under real-world conditions remains challenging [1]. The contour extraction method and the three-point circle method then make pupil localization possible, and a robust pupil tracking algorithm based on the circular Hough transform is used.

This algorithm was tested on 37 images taken from the CASIA Iris dataset.


Our method achieved significantly higher accuracy than the previous state-of-the-art method, yielding 90.71% accuracy under 90% eyelid occlusion. We use an efficient pupil detection algorithm based on color intensity change to reduce the computational load, and a Kalman filter to track the gaze movement process. A Haar cascade classifier first locates the eye region, and a support vector machine (SVM) trained on the datasets performs categorization. The emotional state is additionally inferred from facial landmarks on salient patches of the face image, using an automated, learning-free facial landmark detection technique.


Methodology

Face detection is a technology that localizes the human face in digital images, regardless of the environment surrounding the face or the illumination conditions [2]. Eye tracking is an important research area with a wide range of applications, such as human-computer interaction; computer-aided diagnosis of psychological, neurological, or ophthalmological disorders; virtual reality and 3D graphics; and assistive systems for disabled people.

Recent developments in head-mounted eye tracking have enabled researchers to study human visual perception and attention allocation in natural environments [1]. In general, the resolution of the camera pixels that capture human eye images is low, while displays for interactive functions tend to have higher resolution. A one-pixel deviation in pupil positioning may therefore cause the calculated gaze point to deviate by dozens of pixels, so the accuracy of pupil localization is crucial to the overall performance of the system. Most methods are based on the detection of a circle, mainly considering the geometric characteristics of the pupil. Initially, we extract the binarized pupil image using the improved Otsu algorithm.

Then, a function to remove small areas and fill holes is written to eliminate the small spots appearing in the extracted image. Finally, we achieve precise pupil localization using the method of contour extraction and the three-point fixed circle. In general, this algorithm works by tracing each pixel's edges so that the boundaries between object and background are detected [5]. Face detection is a very challenging task, because faces in the captured image may be occluded by other objects or may have different skin colors [2]. Pupil tracking is the process of estimating either the point of gaze (where the subject is looking) or the motion of the eye relative to the head. It has become well known with the development of augmented reality (AR) and virtual reality (VR) in recent years. However, to support these two technologies, researchers are focusing on accurate eye tracking, which requires high-quality cameras, gyroscopes, and other specialized devices; this keeps the cost of eye-tracking technology high.

Nowadays there is an increasing need to analyze user behavior by tracking eye attention in general applications, in which users typically use a consumer-grade computer or even a laptop with an inexpensive webcam [4]. There are two main types of eyeball movement when the human gaze moves. One is smooth movement; during this movement, humans can attentively track moving objects, and the maximum speed of smooth movement is 30 degrees/s.

The other is saccade movement, whose speed is very fast, about 400-600 degrees/s. At a maximum speed of 600 degrees/s, a 30-degree saccade takes 0.05 seconds, but the human visual system then needs about 0.17 seconds to get out of its state of ignoring before the user pays enough attention to the target. The total time is therefore 0.22 seconds, so the eye tracking system needs a frame rate higher than 4.5 frames/s to capture the gaze.
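The timing figures above can be checked with a few lines of arithmetic (a minimal sketch, using only the numbers stated in the text):

```python
# A 30-degree saccade at the maximum speed of 600 degrees/s, plus the
# ~0.17 s the visual system needs to leave its "ignoring" state.
saccade_time = 30 / 600          # 0.05 s for the saccade itself
refocus_time = 0.17              # s to regain attention
total = saccade_time + refocus_time
min_fps = 1 / total              # minimum capture rate to catch the gaze
print(round(total, 2), round(min_fps, 1))  # → 0.22 4.5
```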

An AdaBoost-based cascade classifier, included in OpenCV, is used to achieve a high processing speed. The local binary pattern (LBP) feature operator processes a local 3 × 3 pixel region by comparing the central pixel with its 8 surrounding pixels: the value at a neighboring pixel is set to 1 if it is greater than the center's, otherwise 0, and the binary values of the 8 compared pixels are then arranged in sequence to form an 8-bit binary number.

The LBP features-based classifier has a lower calculation load and therefore a low hardware requirement; it is lightweight, and its detection speed is higher.

LBP(x_c, y_c) = Σ_{p=0}^{P−1} 2^p · s(i_p − i_c), where s(x) = 1 if x ≥ 0 and 0 otherwise, i_c is the intensity of the center pixel, i_p the intensity of the p-th neighboring pixel, and P = 8.
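A minimal sketch of this operator on a single 3 × 3 patch (the clockwise neighbor ordering starting at the top-left, and the ≥ comparison inside s, are common LBP conventions assumed here, not specified in the text):

```python
import numpy as np

def lbp_value(patch):
    """Compute the LBP code of a 3x3 patch: compare the 8 neighbours
    with the centre pixel and pack the results into an 8-bit number."""
    center = patch[1, 1]
    # Clockwise neighbour ordering from the top-left (assumed convention).
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for p, value in enumerate(neighbours):
        s = 1 if value >= center else 0   # s(i_p - i_c)
        code |= s << p                    # weight by 2^p
    return code

patch = np.array([[10, 20, 30],
                  [40, 25, 60],
                  [70, 80, 90]], dtype=np.uint8)
print(lbp_value(patch))  # → 252
```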

After eye detection, clip the part of eyes from the original image to decrease the calculation load in further processing.

A traditional geometry-based pupil detection method is the Hough transform, which detects the pupil as a circle in the eye image. It is effective for locating the pupil, but it is better suited to situations where the distance between the camera and the eyes is short, only about 30 to 50 mm. We instead propose a method for locating the pupil by checking the color intensity change in the eye image. The specific processing steps are as follows.

Locating the eyes using the cascade classifier contained in OpenCV, and clipping the eye region from the original image to decrease the calculation load in further processing. Smoothing the eye image with a bilateral filter; smoothing is necessary before checking the color intensity change on the image.

From the center point of the eye image, checking the color intensity horizontally, locating the dark segment on the horizontal line closest to the center, and calculating the average intensity on that dark line.

Calculating the center of the dark block, which is the approximate center point of the pupil. The accuracy of pupil detection may depend on multiple factors, such as light intensity, light angle, the size of the eyes, and the extent of the exposed pupil area.

EYE-GAZE tracking consists of estimating the direction of a person's gaze; it has been an active research topic for decades because of its potential use in many applications, such as human-computer interaction, virtual reality, eye disease diagnosis, and human behavior studies [3]. There are two kinds of eye movement when a human being moves his or her gaze: when the angle change is small, only the eyeballs move and the head remains motionless; when the angle change is large, the eyeballs and the head move together. Initialization: we propose a method for roughly estimating the user's gaze, assuming that the approximate horizontal movement of the pupil in the image is proportional to the distance change of the gaze. Capturing the subsequent images: in a given frame after gaze estimation starts, the center point of the face is (xfp, yfp), and the midpoint of the line connecting the two pupils' center points is (xep, yep).

Interactive applications use eye-gaze data to respond to or interact with the user based on the observed eye movements [3]. In the horizontal direction, we define Tfh as a threshold of negligible horizontal face movement; if the face movement does not exceed Tfh, the horizontal face movement is set to 0.

After getting the results of horizontal direction and vertical direction, output the corresponding gaze position.

We ran two simulations: one in which the attention area was divided equally into 9 cells (3 × 3) and another in which it was divided into 25 cells (5 × 5). The program achieved high accuracy in the 3 × 3 partitioning mode, but the accuracy rate decreased in the 5 × 5 partitioning mode, as the experiment showed.

This study uses a webcam as the main device for eye tracking and achieves a high accuracy of 94% on a screen divided into 9 sections and 78% for 25 sections. The proposed method is a fast eye tracking method suitable for general human-computer interaction applications.

In a previous study, Satriya et al. introduced an improved ellipse fitting algorithm implementing adaptive image binarization based on a cumulative histogram, with Random Sample Consensus (RANSAC) for outlier removal. RANSAC is used when fitting an ellipse to the extracted pupil contour. They applied the algorithm to static images and videos of an eye tracking system, improving the accuracy of the conventional ellipse fitting algorithm by up to 51% at 90% eyelid occlusion.

Unfortunately, the improved ellipse fitting algorithm does not preserve 80% accuracy of the estimated pupil coordinates when the pupil is covered 70% or more, so it is not appropriate for extreme pupil occlusion with 70% or more eyelid coverage. Furthermore, RANSAC is a non-deterministic algorithm that leads to unstable results across tracked video frames. To overcome this research gap, we propose a robust pupil localization algorithm based on the circular Hough transform.

In the grayscale image, intensity levels are divided into black, white, and levels in between. To eliminate noise in the grayscale image, we use a Gaussian blur filter: Gaussian filtering convolves each point in the input array with a Gaussian kernel and sums the results to produce the output array. In 2D Euclidean space, the Gaussian filter is G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)). A Canny edge detector is then used to detect the edge of the pupil; the edge detector works as a single entity with the Hough transform. The Hough circle transform can be used to infer the radius of a circular shape, and in our case the human pupil can be identified as a circular shape. Unlike edge detectors, the Hough transform is more tolerant of gaps in the features describing the borders and more robust to noise.

The formula of the Hough circle is

(x − a)² + (y − b)² = r²

We followed the same procedures as Satriya et al. to verify our algorithm, evaluating it on static eye images from the Chinese Academy of Sciences Institute of Automation (CASIA) dataset.

We used a two-tailed Z-test with critical values of 1.96 and -1.96; that is, the difference between the two algorithms is significant if the Z-value is less than -1.96 or greater than 1.96. The Z-test shows that the results differ significantly at 10% and 90% occlusion. We achieved significantly higher accuracy at 90% eyelid occlusion (M = 90.71%) compared with the improved ellipse fitting method (M = 19.28%). RANSAC generally produces unstable results, as the result of one iterative computation differs from that of another. Our choice of implementing the Hough transform also reduced the computational time needed to find the center of the pupil.

The proposed method worked under several constraints: the Hough transform parameter was adjusted empirically during the experiment (i.e., ranging from 25-65), and the proposed algorithm has not yet been tested on real-time video streaming. In future work, we intend to improve our algorithm by incorporating an adaptive mechanism for the Hough transform parameters.

Previous work on pupil localization based on the improved ellipse fitting algorithm has not been able to preserve 80% accuracy when the pupil is covered 70% or more. To address this gap, we propose a new approach that incorporates the Hough circle transform for pupil localization.

Eye movement data provide important information in various fields, such as human-computer interaction (HCI), medical science, security systems, and interface design. In the HCI field, eye tracking can be used as an objective measurement of usability, complementing subjective measurements such as questionnaires [5]. The grayscale image information can be used to analyze the eye image and complete pupil localization, because of the large difference between the gray levels of different parts of the eye. The content of this research is therefore divided into two aspects: eye positioning and pupil center positioning.

The main goal of eye positioning is to detect the position of the eye in the input image or video stream; many researchers have used various methods to build gaze tracking systems. Pupil localization builds on eye positioning and includes image binarization, threshold selection, and morphological filtering of the grayscale eye image to complete the precise positioning of the pupil, using the contour extraction method and the three-point fixed circle method to determine the pupil coordinates.

A region of interest (ROI) is an image area selected from the image. There are two ways to define an ROI: one is to use a Rect to represent a rectangular area, defining the rectangle by its upper-left corner coordinates and its width and height; the other is to specify the ranges of rows or columns of interest.
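Both ways of defining an ROI map directly onto array slicing; a small sketch (the toy array and coordinates are illustrative):

```python
import numpy as np

# Toy 8x10 "image" with distinct pixel values.
img = np.arange(80, dtype=np.uint8).reshape(8, 10)

# Style 1: a Rect-like (x, y, w, h) rectangle from its upper-left corner.
x, y, w, h = 2, 1, 4, 3
roi_rect = img[y:y + h, x:x + w]

# Style 2: explicit ranges of rows and columns of interest.
roi_range = img[1:4, 2:6]

print(roi_rect.shape, np.array_equal(roi_rect, roi_range))  # → (3, 4) True
```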

Searching for the pupil in the whole image wastes time and resources, given that the pupil occupies only a small portion of the image [5]. To split out the pupil portion, it is necessary to obtain the best binary image that distinguishes the pupil from the other portions; therefore, the selection of the threshold is key to the whole process. The Otsu algorithm is a commonly used automatic threshold segmentation algorithm for the pupil.

The basic idea of the Otsu algorithm is: by counting the pixel values of every point in the whole image and dividing them into two classes, the final binarized segmentation threshold is the pixel value that maximizes the difference between the two classes. According to the histogram of the image's grayscale distribution, gray values that do not occur are removed and the calculation range is redrawn.

Results

The application of the circular Hough transform algorithm on the CASIA Iris dataset yielded a significant accuracy improvement in pupil detection, particularly in images with high levels of eyelid occlusion. Our method demonstrated a 90.71% accuracy rate, surpassing previous state-of-the-art methods.

Efficiency of Algorithm

The algorithm's efficiency in detecting pupils based on color intensity changes notably reduced the computational load, offering a streamlined approach suitable for real-world applications.

Discussion

The integration of improved Otsu binarization, noise elimination, and advanced pupil localization techniques presents a comprehensive solution to the challenges of pupil detection in eye tracking systems. The method's high accuracy and efficiency position it as a promising tool for enhancing eye tracking technologies across various applications.

Conclusion

This study contributes to the field of eye tracking by offering a novel algorithmic approach that significantly enhances pupil detection accuracy, especially in challenging conditions. Future work will aim to extend these findings to real-time video analysis and further optimize the algorithm for broader application scenarios.

References

  1. W. Fuhl, M. Tonsen, A. Bulling, and E. Kasneci, “Pupil detection for head-mounted eye tracking in the wild: an evaluation of the state of the art,” Machine Vision & Applications, vol. 27(8), pp. 1-14, 2016.
  2. Y. Derhalli, M. Nufal and T. Alsharabati, “Face detection using boosting and histogram normalization.” Electrical and Electronics Engineering Conference IEEE, pp. 1-6, 2016.
  3. J. Sigut and S. Sidha, “Iris center corneal reflection method for gaze tracking using visible light,” IEEE Trans. Biomedical Engineering, vol. 58, no. 2, pp. 411-419, Feb. 2011.
  4. Y. T. Lin, R. Y. Lin, Y. C. Lin, et al., “Real-time eye-gaze estimation using a low-resolution webcam,” Multimedia Tools and Applications, vol. 65, pp. 543-568, 2013.
  5. T. Satriya, S. Wibirama, I. Ardiyanto, ”Robust Pupil Tracking Algorithm based on Ellipse Fitting,” in 2016 IEEE International Symposium on Electronics and Smart Devices, Bandung, Indonesia, 2016, pp. 253-257.
  6. S. Wibirama, H. A. Nugroho, ”Towards understanding addiction factors of mobile devices: an eye tracking study on effect of screen size,” in 2017 39th Annual International Conference of The IEEE Engineering in Medicine and Biology Society, Jeju Island, South Korea, 2017, pp.2454- 245
Enhancing Pupil Detection in Eye Tracking Systems: A Computational Approach. (2024, Feb 22). Retrieved from https://studymoose.com/document/enhancing-pupil-detection-in-eye-tracking-systems-a-computational-approach
