Many researchers have worked in the field of gesture-based HRI, with the end goal of developing a safe and sophisticated system that facilitates a smooth transfer of commands from the user to the robot. Liu et al. conducted a study to identify future trends in gesture recognition technologies. Based on their survey, they drew a graph predicting the future trend of gesture recognition in terms of four technical components: Sensor Technology, Gesture Identification, Gesture Tracking and Gesture Classification. Their research showed that the technology is moving towards avoiding wearable sensors altogether, so that the user can move more freely. They identified that gesture-based HRI is tending towards the use of depth sensors and skeleton model mapping. They reached this conclusion by identifying published works in the field of gesture recognition for HRI and classifying them among the aforementioned components. Another of their conclusions, which correlates with our research, is the use of multiple sensors to negate the limitations of using just one sensor. They also emphasized the importance of a hard real-time gesture recognition system that provides almost instantaneous detection of gesture commands. Overall, their research suggests that the future of gesture-based HRI will evolve beyond the use of a single sensor or a single algorithm, towards an HRI system that has a network of sensors and finely tuned combinations of many control algorithms that smoothly control the robot to execute the user's commands.
Related Works

Ehlers and Brama's work on detecting both the human body and the hand-finger pose for gesture recognition showed that this technique can be used to employ a wider and more intricate set of commands. The researchers used an RGB-D camera to identify human approach and trained a support vector machine (SVM) to classify 29 hand-finger gestures based on the angles of the finger joints. Their HRI interface integrated ROS to test this system on a mobile robot. Their research presented a real-time HRI interface for applications based on human-body pose and human-finger pose. They concluded that future work would be to use a Kinect sensor to improve the range of human detection. While using a Kinect can improve frame rate and computational cost, I also believe that by avoiding the SVM and relying on an additional sensor instead, the system can be made faster and more efficient.
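The feature described above, the angle at a finger joint, can be computed from three 3D points. The following is a minimal sketch of that geometric step only; it is not Ehlers and Brama's implementation, and the function name `joint_angle_deg` and the point layout (x, y, z tuples) are assumptions made for illustration.

```python
import math


def joint_angle_deg(a, b, c):
    """Return the angle in degrees at point b, between segments b->a and b->c.

    a, b and c are (x, y, z) tuples, for example three consecutive
    finger joints reported by a hand tracker (hypothetical input).
    """
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("joint positions must be distinct")
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

A fully extended finger gives an angle near 180 degrees at each joint, and the angle decreases as the finger bends. A vector of such angles is the kind of input a classifier, or a simple threshold rule, could operate on.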
A recent study by Sheikholeslami et al. looked into various robot hand configurations for gesture-based HRI systems in cooperative industrial settings. The research was conducted in the context of a human-robot collaborative car door assembly task. The researchers examined the most frequently used hand gestures and configurations and how well they are understood by the intended recipients. Based on this study, they designed and developed a 7 Degree-of-Freedom (DOF) anthropomorphic robotic manipulator and studied the effectiveness of using this robot hand, instead of a human user, to show gestures and pass gesture messages. They found that the robot's gestures were as well received as those from a human user.

Research conducted by Hernandez-Belmonte et al. and Lim et al. employed intelligent algorithms such as the Real-Time Deformable Detector (RTDD) and Convolutional Neural Networks (CNN) to identify gesture commands using an RGB or RGB-D camera. However, as mentioned before, using such complex algorithms slows down computation, and the system will not be able to work to its fullest potential. As seen in Hernandez-Belmonte's research, it takes 6-8 hours to train their RTDD, and this training has to be repeated every time the user needs new gestures or different configurations. Moreover, their frame rate is only close to 10 frames per second (fps). Such a low frame rate increases the risk of failing to detect a smooth gesture motion by the user.

Thus, there is a need to develop a system that performs real-time gesture recognition without the need for any training, large dataset or classification, and that is also computationally cheaper and leaner.
One of the main aspects that we deal with in our research is how a robot perceives distance, and how comfortable the HU would be, at varying distances from the robot, while controlling the robot through the proposed HRI system. I looked into the study conducted by Huettenrauch et al. on the spatial distance and orientation of a robot with respect to a user. Humans are well trained in managing spatial interactions through their day-to-day activities, but what about robots? This was the question that their paper set out to answer. The researchers also searched for patterns of spatial HRI behaviour that could be identified to improve the design of a robot's spatial conduct. For this, they used Hall's proxemics and Kendon's F-formation system. Hall's proxemics divides interpersonal space into four zones: Intimate (0-0.46 meters), Personal (0.46-1.22 meters), Social (1.22-3.66 meters) and Public (>3.66 meters), as seen in Figure 2.1.
FIGURE 2.1: Hall's Proxemics

Their experimental setup was a "Wizard of Oz" arrangement, in which the robot was controlled remotely (unbeknownst to the user) so that the user believed they were controlling the robot through their own commands. Their study found that users prefer to operate in the Personal space compared to the other spaces. This result could be challenged, however, on the basis that the robot was not autonomous and that the commands used in the experiment were limited and not intricate. If we employ just one RGB-D camera (such as the Kinect), its depth perception and tracking will never function in the Intimate space, so the robot's gesture capture system will go blind once the user moves close to the robot. Our research aims to remove this blind spot and enable HRI throughout the aforementioned spaces. It would also enable us to extend this experiment to a semi- or fully autonomous robot with more intricate commands and functions.
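Hall's four zones, as quoted above, amount to a simple distance lookup. The sketch below is only an illustration: the function name `hall_zone` is hypothetical, and assigning a distance that falls exactly on a boundary to the nearer zone is an assumption, since the text does not say which zone owns the boundary.

```python
def hall_zone(distance_m):
    """Map a user-to-robot distance in meters to one of Hall's proxemic zones.

    Zone limits are the ones quoted in the text. Boundary values are
    assigned to the nearer zone, which is an assumption of this sketch.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= 0.46:
        return "Intimate"
    if distance_m <= 1.22:
        return "Personal"
    if distance_m <= 3.66:
        return "Social"
    return "Public"
```

For example, `hall_zone(0.3)` returns `"Intimate"`, the zone in which a single RGB-D camera loses track of the user.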
This research also implies that the user and the robot act as a team when performing tasks. A study conducted by Lasota and Shah examined the human response to motion-level robot adaptation. They conducted an experiment in which participants worked with an adaptive robot to perform collaborative tasks. They then evaluated team fluency, human satisfaction, perceived safety and human comfort. They found that people responded well to motion-level robot adaptation and felt more comfortable working with an adaptive robot than with a standard one. Their results further strengthened our confidence in our research. We believe that performing HRI using our hybrid camera system can enhance the intuitiveness of, and control over, the robot while maintaining a high level of user satisfaction and comfort.
As stated in Chapter 1, the objective of this research is to develop a safe and robust HRI system that employs real-time gesture recognition without relying on any database or gesture classification techniques. To achieve this, we propose a gesture-based HRI system that employs two different cameras that track the human approach and gestures at different proximities of the human user. The system switches between the two cameras based on the proximity of the user to the system, and therefore greatly improves the overall optical range of the system and its overall performance. The main hurdle in this research was to find the right pair of depth-sensing cameras that can work together to improve the range and accuracy of the system. In this research, the Kinect for Xbox One (KinectX) and the Leap Motion Sensor (LMS) are used together, such that the HRI system switches between the two based on the proximity of the user, which greatly improves the effective optical range of the whole system. The HRI system will then be able to track and identify gestures within a range of 0.01 to 4.5 meters, and can be interfaced with the robot arm to perform specific tasks. KinectX and LMS were chosen for their functionality and because their specifications match what we intend to do in our research.

(A) Microsoft Kinect for Xbox One (B) Leap Motion
FIGURE 3.1: The Camera Sensors

The intent of this research is to develop a system that enables the user to interact with the robot or machine in all spaces as categorized in Hall's proxemics. This requires cameras whose optimum range covers these spaces, either individually or in an integrated framework.
The cameras finalized for this research are the Kinect V2 (KinectX) and the Leap Motion Sensor. As described above, switching between them based on the proximity of the user gives the combined system a gesture tracking range of 0.01 to 4.5 meters.
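The proximity-based switching described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the system's actual switching logic: the class name `SensorSelector`, the hand-over distance of 0.8 m and the 0.1 m margin are placeholder values, and the hysteresis band is an addition of this sketch (the text only says that the system switches on proximity) intended to stop the selection from flickering when the user hovers near the hand-over distance.

```python
class SensorSelector:
    """Choose between a far-range and a near-range sensor by user distance."""

    def __init__(self, handover_m=0.8, margin_m=0.1, initial="kinect"):
        # handover_m and margin_m are illustrative placeholders, not
        # values taken from the system described in the text.
        self.handover_m = handover_m
        self.margin_m = margin_m
        self.active = initial

    def update(self, distance_m):
        """Return the sensor to use for the given user distance in meters."""
        if self.active == "kinect" and distance_m < self.handover_m - self.margin_m:
            self.active = "leap"
        elif self.active == "leap" and distance_m > self.handover_m + self.margin_m:
            self.active = "kinect"
        return self.active
```

A caller would feed `update` the latest distance estimate on every frame and read gestures only from the sensor it returns. Inside the band around the hand-over distance the previous choice is kept.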
FIGURE 3.2: Specifications of KinectX

One of the most widely used depth-sensing cameras is the Microsoft Kinect. Kinect is a line of motion-sensing input devices developed by Microsoft, which enables users to interact with the Xbox (a Microsoft gaming platform) without the need for a joystick or any other controller. The latest in this product line is the Kinect for Xbox One camera (referred to from now on as KinectX), shown in Figure 3.1(a). It is designed for the Xbox One but, using an adapter, it can be used with a personal computer (PC) as well. The Kinect line of products uses a combination of an RGB VGA camera, a depth sensor and a multi-array microphone as data inputs. KinectX works on the principle of Time of Flight (ToF): it transmits invisible near-infrared (IR) light and measures the distance to the user by calculating the time taken for the infrared light to reflect back from the user to the sensor. Using this technique, KinectX is able to track 25 joints in a user's body and refresh them at a rate of 30 FPS. Figure 3.2 details the specifications of the KinectX sensor.
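The time-of-flight principle described above reduces to one relation: the light travels to the user and back, so the distance is half the round-trip time multiplied by the speed of light. The sketch below only illustrates that relation, assuming the round-trip time is measured directly; the function names are hypothetical, and the sensor's internal signal processing is not described in the text.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_s):
    """Distance to the reflecting surface for a given round-trip time in seconds."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The light covers the sensor-to-user distance twice, hence the division by 2.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def tof_round_trip_s(distance_m):
    """Round-trip time in seconds for light reflected at the given distance."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S
```

At the far end of the tracking range (4.5 meters) the round trip takes roughly 30 nanoseconds, which shows why ToF sensing needs very fine timing resolution.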
FIGURE 3.3: Range of KinectX

KinectX is able to detect and track body joints from as far as 4.5 meters and as close as 0.5 meters. The range of KinectX is shown in Figure 3.3, and it can be noted that the range can be divided into three regions. If the user is more than 8 meters away from the sensor, the sensor fails to detect or track the user. The KinectX sensor starts tracking the user as soon as the distance between the user and the sensor drops below 4.5 meters. If the user moves too close to the KinectX (closer than 0.5 meters), the parallax effect between the emitted IR beams and the depth sensor becomes very noticeable. There is a small distance between the depth sensor and the IR emitter in the design. Anything the depth sensor can "see" but the IR emitter cannot illuminate will be tagged as an "unknown" or shadow area. The KinectX will not be able to obtain the depth there, so it will flag it as zero, and hence the user becomes untrackable. The parallax effect is also significant if the user moves more than 4.5 meters away from the KinectX sensor. Hence, the ideal range for tracking a user is from 0.5 meters to 4.5 meters (shown as Normal Values in Figure 3.3).
A comparison of the depth accuracy of the Kinect for Xbox 360 (an earlier Kinect model) and KinectX has been carried out (Pagliari, D. and Pinto, L. (2015)) to determine how the newer Kinect has improved over its predecessor. It can be inferred from their work that the error in depth increases with the distance between the user and the sensor. Taking all of the above data into account, it can be concluded that KinectX works best at a range of 0.8 meters to 4.5 meters, and that it can track up to 8 meters but with some variable error. In conclusion, KinectX is best suited for extra-personal spaces but not for peri-personal spaces (less than 0.8 meters).
The joint ID map for this version of KinectX is shown in Figure 3.4.

Chapter 3. The Hybrid Camera System

FIGURE 3.4: Joint ID map for KinectX

3.2 Leap Motion Sensor (LMS)

To complete the pair of hybrid cameras, a sensor that is able to track the user in peri-personal space is required. The second camera chosen for this research is the Leap Motion sensor (referred to as LMS from now on), shown in Figure 3.1(b).
As seen in Figure 3.5 (inspired by the actual dimensions and figures found in the cited source), the LMS is very small (3 (L) x 1 (W) x 0.5 (H) inches). It is a sensor developed to track hand and finger motions. LMS was initially designed for Virtual Reality (VR) applications, but the developers have provided their Software Development Kits (SDKs) so that others can access and use LMS for their own research. LMS has a very wide Field of View (FoV) of 150 degrees. LMS uses optical sensors and infrared light to track the user's hand. The effective and ideal range of LMS is from 25 to 1000 millimeters (0.025 to 1 meters). The LMS uses an internal software algorithm, which is based on a model of a real user hand, to predict finger and hand movements even when parts of the hand are not visible. LMS is also able to detect the orientation of the hand and the state of the palm (open/closed).

FIGURE 3.5: Dimensions of Leap Motion

LMS has two cameras and three IR LEDs which, coupled with wide-angle lenses, give LMS its wide FoV. The IR LEDs provide the advantage of filtering out background information that would otherwise complicate the process.
The overall accuracy of the LMS camera was shown to be 0.7 millimeters. The LMS software does not generate a depth map; instead it applies advanced mathematical algorithms (not disclosed by the company) to the raw image data to identify the fingers and the hand. Unlike KinectX, the LMS camera does not use the ToF technique to calculate depth from a frame. The specialty of LMS is its close-range detection and identification of hands and finger joints. As discussed before, LMS works differently from KinectX. It is due to the use of IR LEDs that the range of detection is confined very close to the sensor.

Weichert et al. evaluated the accuracy and reliability of the Leap Motion controller for sub-millimeter accuracy. For this, the researchers set up a KUKA industrial robot arm and attached a tool to its end. They then connected the industrial robot to the Leap Motion camera system and mapped their two coordinate systems to each other. Through their experiment they were able to determine how accurately the Leap Motion sensor would work in practice. Through their calculations they found an accuracy of 0.7 mm in all three axes. This work showed that the Leap Motion is very accurate and outclasses the Microsoft Kinect in close-proximity gesture recognition.
After identifying the two cameras to be used for this research, the next hurdle was to find a common platform on which they could work together. In order to reduce the complexity and the number of lines of code, it was decided to use Python as the programming language for developing the system. Python has better readability and a better computer vision library than its counterparts. As there would be a need to link multiple libraries in one project, it was important to choose the most suitable Integrated Development Environment (IDE). Microsoft's Visual Studio IDE was chosen among those available due to its ease of use and its large array of capabilities. The SDKs for Kinect and LMS were then linked to an empty project in the Visual Studio IDE, which acted as the foundation for the algorithms that were developed for this research.