Limitations of Current Human-Machine Interaction (HMI) Frameworks


Current Human-Machine Interaction (HMI) frameworks have yet to achieve the full emotional and social capabilities needed for rich and robust interaction with people. Facial expression recognition (FER) aims to classify faces in a single image or a sequence of images as one of the six basic emotions. Although conventional machine learning approaches, such as support vector machines and Bayesian classifiers, have been effective at classifying posed facial expressions in a controlled environment, recent investigations have shown that these solutions lack the flexibility to classify images captured in a spontaneous, uncontrolled manner ("in the wild") or when applied to databases for which they were not designed.


This poor generalizability is primarily because many approaches are subject- or database-dependent and are only capable of recognizing exaggerated or limited expressions similar to those in the training database. In addition, obtaining accurate training data is particularly difficult, especially for emotions such as anger or sadness, which are very difficult to replicate accurately. Recently, due to the increased availability of computational power and increasingly large training databases, the machine learning technique of neural networks has seen a resurgence in popularity.

Recent state-of-the-art results have been obtained using neural networks in the fields of visual object recognition, human pose estimation, face verification and many more. Even in the FER field, results so far have been promising. Unlike traditional machine learning approaches, where features are defined by hand, neural networks often improve visual processing tasks because of the network's ability to extract undefined features from the training database.

It is often the case that neural networks trained on large amounts of data are able to extract features that generalize well to scenarios the network has not been trained on. We explore this idea closely by training our proposed network architecture on a subset of the available training databases and then performing cross-database experiments, which allow us to accurately judge the network's performance in novel scenarios.

In the FER problem, however, unlike visual object databases such as ImageNet, existing FER databases often have limited numbers of subjects, few sample images or videos per expression, or small variation between sets, making neural networks significantly more difficult to train. For example, the FER2013 database (one of the largest recently released FER databases) contains 35,887 images of different subjects, yet only 547 of the images (about 1.5%) portray disgust. Similarly, the CMU MultiPIE face database contains around 750,000 images but comprises only 337 different subjects; 348,000 of its images portray only a "neutral" emotion, and the remaining images do not portray anger, fear or sadness.

Dept. of CSE, DSCE, Bangalore 78
Facial Expression Recognition using Neural Networks

2 Problem Statement

Human facial expressions can be easily classified into 7 basic emotions: happy, sad, surprise, fear, anger, disgust and neutral. Facial emotions are expressed through the activation of specific sets of facial muscles. These sometimes subtle, yet complex, signals in an expression often contain an abundant amount of information about our state of mind. Through facial emotion recognition, we are able to measure the effects that content and services have on users through an easy and low-cost procedure. For example, retailers may use these metrics to evaluate customer interest. Healthcare providers can provide better service by using additional information about patients' emotional state during treatment.
Humans are well-trained in reading the emotions of others; in fact, at just 14 months old, babies can already tell the difference between happy and sad. We designed a deep learning neural network that gives machines the ability to make inferences about our emotional states. Facial expression recognition is a process performed by computers, which consists of:

1. Detection of the face when the user comes into the webcam's frame.
2. Extraction of facial features from the detected face region, detecting the shape of facial components or describing the texture of the skin in a facial area. This is called Facial Feature Extraction.
3. After feature extraction, the computer categorizes the emotional state of the user based on the datasets provided during training of the model.

3 Literature Survey

3.1 Human Facial Expression Recognition from Static Image using Shape and Appearance Feature

Authors: Naveen Kumar H N, Jagadeesha S, Amith K Jain.

Description: This paper proposes facial expression recognition using Histogram of Oriented Gradients (HOG) features and a Support Vector Machine (SVM). The proposed work shows how HOG features can be exploited for facial expression recognition. The use of HOG features makes the performance of the FER system subject-independent. The accuracy of the work is found to be 92.56% when implemented using the Cohn-Kanade dataset for six basic expressions. Results indicate that shape features on the face carry more information for emotion modelling than texture and geometric features. Shape features outperform geometric features because a small pose variation degrades the performance of an FER system that relies on geometric features, whereas a small pose variation has little effect on an FER system that relies on HOG features. Detection rates for disgust, fear and sadness are lower in the proposed work.
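To make the HOG idea in Section 3.1 more concrete, the following is a minimal sketch of a HOG-style descriptor in plain Python. It is not the surveyed paper's implementation: the function name `hog_cell_histograms`, the 4-pixel cell size and the 9 orientation bins are assumptions chosen for illustration, block normalization and bin interpolation are omitted, and the SVM classification step is left out entirely.

```python
import math

def hog_cell_histograms(image, cell=4, bins=9):
    """Return a flat list of per-cell orientation histograms (unsigned, 0-180 degrees).

    Simplified, illustrative HOG variant: each cell histogram is L2-normalized
    on its own, with no overlapping blocks and no interpolation between bins.
    """
    h, w = len(image), len(image[0])
    features = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
                    gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
                    magnitude = math.hypot(gx, gy)
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    # Each pixel votes for its orientation bin, weighted by magnitude.
                    hist[int(angle / (180.0 / bins)) % bins] += magnitude
            norm = math.sqrt(sum(v * v for v in hist)) or 1.0
            features.extend(v / norm for v in hist)
    return features

# Toy 8x8 image with a vertical edge: left half dark, right half bright.
image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
descriptor = hog_cell_histograms(image)
print(len(descriptor))
```

In a full pipeline, a descriptor of this kind would be computed for each detected face region and passed to a classifier such as an SVM. The descriptor summarizes local edge directions rather than exact landmark positions, which is the property the surveyed paper credits for its robustness to small pose variations.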
Detection rates can be further improved by combining shape, texture and geometric features. Optimized cell sizes may be considered for real-time implementation so as to address both detection rate and processing speed. The influence of non-frontal faces on the performance of the FER system could be addressed in future work.

3.2 Face Detection and Recognition using Viola-Jones algorithm and Fusion of PCA and ANN

Authors: Narayan T. Deshpande, Dr. S. Ravishankar

Description: This paper covers face recognition, Principal Component Analysis (PCA), Artificial Neural Networks (ANN) and the Viola-Jones algorithm. It presents an efficient approach for face detection and recognition using Viola-Jones together with a fusion of PCA and ANN techniques. The performance of the proposed method is compared with other existing face recognition methods, and better recognition accuracy is observed with the proposed method. Face detection and recognition play a vital role in a wide range of applications. Most applications require a high rate of accuracy in identifying a person, so the proposed method can be considered alongside the existing methods.

3.3 Facial Expression Recognition

Authors: Neeta Sarode, Prof. Shalini Bhatia

Description: This paper covers grayscale images, faces, facial expression recognition, lip region extraction and human-computer interaction. Experiments are performed on grayscale image databases. Images from the Yale facial image database and the JAFFE database (Figure 7) are used for the experiments. The JAFFE (Japanese Female Facial Expression) database consists of grayscale images showing 7 expressions, including neutral, for each of 10 people. Each person has 3-4 images of the same expression, so the total number of images in the database comes to 213.
An efficient, local image-based approach for extraction of intransient facial features and recognition of four facial expressions was presented. In the face, the eyebrow and mouth corners are used as the main "anchor" points. The approach does not require any manual intervention (such as initial manual assignment of feature points). The system, being based on a local approach, is also able to detect partial occlusions.

3.4 Comparison of PCA and LDA Techniques for Face Recognition Feature Based Extraction With Accuracy Enhancement

Authors: Riddhi A. Vyas, Dr. S.M. Shah

Description: This paper covers face recognition, PCA, LDA, eigenvalues, covariance, Euclidean distance, eigenfaces and scatter matrices. Feature extraction is a rather tricky phase in the recognition process. To achieve a better face recognition rate, the correct choice of feature extraction algorithm is extremely significant and plays a major role in the face recognition process. Before selecting a feature extraction technique, one must understand it and know which technique performs accurately under which criteria. This comparative analysis reports which feature extraction technique performs accurately under different criteria. From the individual conclusions, it is clear that LDA is an efficient facial recognition method for images of the Yale database: the comparative study mentions that LDA achieved a 74.47% recognition rate with a training set of 68 images, and that 123 of 165 images in total were recognized with higher accuracy. In future, the face recognition rate could be improved by including the full frontal face with facial expression using PCA and LDA. The face recognition rate could also be improved with a hybrid preprocessing technique for PCA and LDA. Neither feature extraction technique gives a satisfactory recognition rate under illumination problems, so this can be improved. Combining PCA and LDA with other techniques such as DWT, DCT and LBP may improve the face recognition rate.

3.5 Facial Expression Recognition Using Visual Saliency and Deep Learning

Authors: Viraj Mavani, Shanmuganathan Raman, Krishna Prasad Miyapuram.

Description: This paper proposes facial expression recognition using visual saliency and deep learning. The authors demonstrate a CNN for facial expression recognition with generalization abilities. They tested the contribution of potential facial regions of interest in human vision using visual saliency of the images in their facial expression datasets. The confusion between different facial expressions was minimal, with high recognition accuracies for four emotions: disgust, happy, sad and surprise [Table 1, 2]. The general human tendency of angry being confused with sad was observed [Table 1], as given in [22]. Fearful was confused with neutral, whereas neutral was confused with sad. When saliency maps were used, a change was observed in the confusion matrix of emotion recognition accuracies. Angry, neutral and sad emotions were now more often confused with disgust, whereas surprised was more often confused with fearful [Table 2]. These results suggested that the generalization of the deep learning network with visual saliency (65.39%) was much higher than the chance level of 1/7 (about 14.3%). Yet the structure of the confusion matrix was much different from that of the deep learning network that considered complete images. The authors conclude with two key contributions: (i) they present generalization of a deep learning network for facial emotion recognition across two datasets, and (ii) they introduce the concept of visual saliency of images as input and observe that the behavior of the deep learning network varies. This opens up a discussion on further integration of human emotion recognition (exemplified using visual saliency in this paper) and deep convolutional neural networks for facial expression recognition.

4 Architecture

4.1 Block Diagram

Applying the Inception layer to applications of deep neural networks has had remarkable results, and it seems only logical to extend state-of-the-art techniques used in object recognition to the FER problem. In addition to providing theoretical gains from the sparsity, and thus the relative depth, of the network, the Inception layer also allows for improved recognition of local features, as smaller convolutions are applied locally while larger convolutions approximate global features. The increased local performance seems to align logically with the way that humans process emotions as well. By looking at local features such as the eyes and mouth, humans can distinguish the majority of the emotions. Similarly, children with autism often cannot distinguish emotion properly without being told to remember to look at the same local features. By using the Inception layer structure and applying the network-in-network theory, we can expect significant gains in local feature performance, which seems to logically translate to improved FER results.

4.2 Deep Neural Architecture

4.3 Deep Neural Network Architecture

4.4 Network Architecture

A benefit of the network-in-network method is that, along with increased local performance, the global pooling performance is increased, and the network is therefore less prone to overfitting. This resistance to overfitting allows us to increase the depth of the network significantly without worrying about the small corpus of images that we are working with in the FER problem.
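The Inception-style idea discussed in this section, applying convolutions of several sizes in parallel and stacking their outputs, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the network used in this work: the helper names (`conv2d_same`, `inception_style`, `mean_kernel`) are hypothetical, the input is a single-channel toy image, and fixed averaging kernels stand in for the weights a real network would learn.

```python
def relu(x):
    """Rectified linear unit: max(x, 0)."""
    return max(x, 0.0)

def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding, followed by ReLU.

    The output has the same height and width as the input.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = relu(acc)
    return out

def inception_style(image, kernels):
    """Apply every kernel to the same input and stack the results as channels."""
    return [conv2d_same(image, kernel) for kernel in kernels]

def mean_kernel(size):
    """Placeholder averaging kernel standing in for learned weights."""
    return [[1.0 / (size * size)] * size for _ in range(size)]

# Toy 8x8 input; the 1x1, 3x3 and 5x5 branches see progressively wider context.
image = [[float((i * 7 + j * 3) % 5) for j in range(8)] for i in range(8)]
branches = inception_style(image, [mean_kernel(1), mean_kernel(3), mean_kernel(5)])
print(len(branches), len(branches[0]), len(branches[0][0]))
```

Because every branch preserves the spatial size, the three feature maps can be concatenated along the channel axis. The 1×1 branch looks at each pixel in isolation, while the 5×5 branch summarizes a wider neighbourhood, which mirrors the local versus more global behaviour described above.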
The work that we present in this paper is inspired by the techniques provided by the GoogLeNet and AlexNet architectures described in Sec. 2. Our network consists of two elements. First, our network contains two traditional CNN modules (a traditional CNN module consists of a convolution layer followed by a max pooling layer). Both of these modules use rectified linear units (ReLU), which have an activation function described by:

$$f(x) = \max(x, 0) \tag{1}$$

where x is the input to the neuron. Using the ReLU activation function allows us to avoid the vanishing gradient problem caused by some other activation functions. Following these modules, we apply the techniques of the network-in-network architecture and add two "Inception"-style modules, each made up of 1×1, 3×3 and 5×5 convolution layers (using ReLU) in parallel. The outputs of these layers are then concatenated, and we use two fully connected layers as the classifying layers (also using ReLU). Figure 3 shows the architecture of the network used in this paper.

4.5 Data Set Diagram

4.6 Flow Chart