Automatic Personality Recognition From Hindi Text



A personality is the complex of all attributes (behavioral, temperamental, emotional and mental) that characterize a unique individual [2]. Personality has a great impact on people's lives: it influences their life choices, well-being and many other factors [3]. Research in the psychology literature has led to a well-recognized model for recognizing and describing personality, called the Big Five Personality Model. Its five traits can be described as follows [14]:

  • Extraversion- Extraversion measures a tendency to seek stimulation in the external world and the company of others, and to express positive emotions.

    People scoring high on Extraversion tend to be more outgoing, friendly, and socially active. They are usually energetic and talkative; they do not mind being at the center of attention, and make new friends more easily. Introverts are more likely to be solitary or reserved and seek environments characterized by lower levels of external stimulation.

  • Neuroticism- Neuroticism measures the tendency to experience mood swings and emotions such as guilt, anger, anxiety, and depression.

    People scoring low on Emotional Stability (high Neuroticism) are more likely to experience stress and nervousness, while people scoring high on Emotional Stability (low Neuroticism) tend to be calmer and self-confident.

  • Agreeableness- Agreeableness relates to a focus on maintaining positive social relations, being friendly, compassionate, and cooperative. People scoring high on Agreeableness tend to trust others and adapt to their needs. Disagreeable people are more focused on themselves, less likely to compromise, and may be less gullible. They also tend to be less bound by social expectations and conventions, and more assertive.

  • Conscientiousness- Conscientiousness measures preference for an organized approach to life in contrast to a spontaneous one. People scoring high on Conscientiousness are more likely to be well organized, reliable, and consistent. They enjoy planning, seek achievements, and pursue long-term goals. Non-conscientious individuals are generally more easy-going, spontaneous, and creative. They tend to be more tolerant and less bound by rules and plans.
  • Openness- Openness is related to imagination, creativity, curiosity, tolerance, political liberalism, and appreciation for culture. People scoring high on Openness like change, appreciate new and unusual ideas, and have a good sense of aesthetics.

This Five Factor model of personality is most useful for describing personality and for assessing and describing personality disorders. Personality influences numerous facets of tasks related to individual behavior. For example, the success of most interpersonal tasks depends on the personalities of the participants, and personality traits influence leadership ability. In forensics, it is useful for analyzing the conversations of suspected terrorists. Tutoring systems may be more effective if they adapt to the learner's personality. User profiles with predicted personality traits could be used by recommendation systems. It is therefore important to recognize personality automatically from spoken words and written text, as these convey a great deal of information about the speaker or author.

In the field of automatic personality recognition, various datasets are used as resources for personality identification and analysis. The following section describes various data sources.

Data Resources

The datasets for human personality recognition are either collected from social media platforms or gathered by asking individuals to write text, which is then treated as a data source.

Written Text and Conversation

Spoken words and written text convey a great deal of information about the speaker or author. The Stream-of-Consciousness Essays corpus is a large dataset written by psychology students who were asked to write whatever came into their minds for 20 minutes. It contains 2,479 essays with 1.9 million words. The data was collected and analyzed by [8]. The texts were produced by students who also took the Big Five test. This dataset has been used by several scholars in their research [2][3]. Another data source, collected by [12], consists of conversation extracts from 96 participants recorded using an Electronically Activated Recorder (EAR). It contains 97,468 words and 15,269 utterances. The Essays corpus contains only text, whereas the EAR corpus contains both sound extracts and transcripts.

Social Media

Social media is a place where users share their views, information, and ideas through activities such as posting, updating their status and commenting. User-generated content on social media provides an excellent opportunity to recognize user personality. Many researchers have made efforts to use data collected from Facebook and Twitter to infer personality.


Many approaches have been proposed to automatically infer a user's personality from Facebook content. The myPersonality corpus was collected from Facebook by David Stillwell and Michal Kosinski and released by the organizers of the “Workshop on Computational Personality Recognition (Shared Task)” [13]. It contains Facebook status messages, author information (network size, betweenness, nbetweenness, density, brokerage, nbrokerage, and transitivity), and gold-standard labels (classes and scores) obtained using a self-assessment questionnaire. The classes were obtained from the scores with a median split. The corpus covers 250 users, and the number of statuses per user ranges from 1 to 223. It has been used by various researchers [4], [5], [6].
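
As a minimal sketch of how a median split turns continuous trait scores into binary classes, the following Python snippet may help. The scores, the label names "y" and "n", and the rule that a score equal to the median falls in the low class are illustrative assumptions; the source does not say how ties were handled.

```python
from statistics import median


def median_split(scores):
    """Label each score 'y' if it lies above the sample median, else 'n'.

    Assumption: a score exactly equal to the median is labelled 'n'.
    """
    cut = median(scores)
    return ["y" if s > cut else "n" for s in scores]


# Hypothetical Extraversion scores for six users (not taken from the corpus).
extraversion = [2.1, 3.4, 4.0, 2.8, 3.9, 3.0]
labels = median_split(extraversion)
print(labels)
```

With an even number of users the median is the mean of the two middle scores, so the two classes come out balanced unless several scores tie at the cut point.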


User-generated content on Twitter is also an important source of information for inferring a user's personality. One Twitter dataset was collected through the myPersonality project: only a few hundred of the thousands of participants in this project posted links to their Twitter accounts, and their content forms this dataset [14]. It has been used for automatic personality recognition as well as for user behavior analysis [19]. The authors of [19] found that both popular users and influentials are extroverted and emotionally stable, that popular users are ‘imaginative’ (high in Openness), and that influentials tend to be ‘organized’ (high in Conscientiousness). The authors of [14], on the other hand, collected a Twitter dataset of 102 users with gold-standard personality labels in the range [-0.5, 0.5]. The authors of [15] also collected a dataset from Twitter, by creating a Twitter application with a 45-question version of the Big Five Personality Inventory. The dataset contains the users' latest 2,000 tweets. The authors also collected a set of statistics about each user account and its tweets: number of followers, number of following, density of the social network, number of “@mentions”, number of replies, number of hashtags, number of links and words per tweet. To study the connection between personality and the actual social network, the work in [15] considers two structural features: number of friends and network density. This work has significant implications for marketing and interface design.
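
The per-user tweet statistics listed above can be approximated from raw tweet text. The sketch below is only an illustration under stated assumptions: the regular expressions, the rule that a tweet starting with "@" counts as a reply, and the sample tweets are all hypothetical and are not taken from [15]. Follower counts and network density need account and graph data, so they are not covered here.

```python
import re


def tweet_statistics(tweets):
    """Count simple surface features over a list of tweet strings."""
    n = len(tweets)
    mentions = sum(len(re.findall(r"@\w+", t)) for t in tweets)
    hashtags = sum(len(re.findall(r"#\w+", t)) for t in tweets)
    links = sum(len(re.findall(r"https?://\S+", t)) for t in tweets)
    # Assumption: a tweet that begins with a mention is treated as a reply.
    replies = sum(1 for t in tweets if t.startswith("@"))
    words = sum(len(t.split()) for t in tweets)
    return {
        "mentions": mentions,
        "hashtags": hashtags,
        "links": links,
        "replies": replies,
        "words_per_tweet": words / n if n else 0.0,
    }


# Hypothetical tweets for one user.
sample = [
    "@anna thanks for the tip!",
    "Reading about #NLP and #personality http://example.com/paper",
    "Great talk by @bob today",
]
stats = tweet_statistics(sample)
print(stats)
```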


The FriendFeed social media dataset was sampled by Celli et al. [9]. It was collected from the FriendFeed application, where recent posts are available. The aim of that work was to analyze the social interaction that takes place on a social network site. The dataset was used in [10] for personality recognition from a social network site. The authors sampled a dataset of 748 Italian FriendFeed users containing 1,065 posts.

Other Resources

The data sources discussed so far are used for psycho-linguistic features, lexical-level analysis, emotion words and lexical cues to recognize personality. In [16], the authors employ common-sense knowledge with sentiment polarity scores and affective labels, using the resources SenticNet, ConceptNet and EmoSenticNet. SenticNet is useful for opinion mining and sentiment analysis. It is a collection of commonly used ‘polarity concepts’ with strong positive or negative polarity [20]. In SenticNet each concept is associated with a floating-point value in [-1, 1] that represents its polarity. It includes more than 5,700 polarity concepts and is freely available. ConceptNet [21] is a semantic network that represents information from the Open Mind corpus. Its nodes are concepts, and its labeled edges are common-sense assertions that interconnect them. EmoSenticNet [22] comprises about 5,700 common-sense knowledge concepts, including the WordNet-Affect list concepts, along with their affective labels in the set {anger, joy, sadness, surprise, fear}. The authors of [16] combine common-sense knowledge-based features with psycho-linguistic features and frequency-based features.


The features adopted by many researchers in their experiments are motivated by prior findings on the correlation between measurable linguistic factors and personality traits [2].

LIWC Features

Linguistic features are extracted from text using the LIWC text analysis program [23], which counts words in psychologically meaningful categories. It comprises two main components: the processing component and the dictionary. The dictionary is a collection of words that define specific categories (for example, positive emotion, social processes, anger, sadness). The processing component goes through the text word by word and compares each word with the dictionary. If a word belongs to three categories in the dictionary, all three category counts are incremented. After processing all the words, LIWC calculates the percentage of words in each category. The output of the program is a list of all categories and the rate at which each category is used in the text. LIWC features have been used by several researchers [2] [3] [4].
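
The counting procedure just described can be sketched in a few lines of Python. This is not the LIWC program itself: the toy dictionary, its category names and the whitespace tokenization are hypothetical stand-ins, since the real LIWC dictionary is far larger and also matches word stems.

```python
from collections import Counter

# Hypothetical toy dictionary mapping a word to every category it belongs to.
TOY_DICTIONARY = {
    "happy": ["posemo", "affect"],
    "friend": ["social"],
    "hate": ["negemo", "anger", "affect"],
    "cry": ["negemo", "sad", "affect"],
}


def category_rates(text, dictionary=TOY_DICTIONARY):
    """Return the percentage of words in the text that fall in each category."""
    words = text.lower().split()
    counts = Counter()
    for word in words:
        # A word listed under several categories increments all of them.
        for category in dictionary.get(word.strip(".,!?;:"), []):
            counts[category] += 1
    total = len(words)
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


rates = category_rates("My friend was happy but I hate to cry today")
print(rates)
```

In this ten-word example "hate" and "cry" each raise three categories at once, which is why the "affect" rate ends up higher than any single emotion category.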

MRC Features

The MRC Psycholinguistic Database [17] contains psychological and distributional information about words. It includes 150,837 entries with information about 26 properties, such as number of phonemes, number of letters, frequency of use and familiarity [14]. The database consists of three files: the DIST file (a dictionary of information about the syntactic, semantic, orthographic and phonological properties of a large set of words), the S-R file (word association responses to a large set of stimulus words) and the R-S file (a large set of response words). MRC features have been used by the authors of [2] [14].

SPLICE (Structured Programming for Linguistic Cue Extraction) Features

SPLICE is used to extract linguistic features, including cues that relate to the positive or negative self-evaluation of the speaker [14]. It includes feature categories such as Quantity, Part of Speech, Immediacy, Pronouns, Positive Self Evaluation, Negative Self Evaluation, Influence, Deference, Complexity, Tense and SentiWordNet. SPLICE features have been used in several studies in this field [4] [14] [18].

SNA Features

The Social Network Analysis (SNA) features are provided by the myPersonality dataset, which gives detailed information about a user's friendship network [4]. They include Network size, Betweenness, NBetweenness, Density, Brokerage, NBrokerage and Transitivity. These features have been used in several works on personality detection from social media content [4][6].
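
To make two of these measures concrete, the sketch below computes density and transitivity for a small undirected friendship graph using their standard graph-theoretic definitions. Whether myPersonality computed them in exactly this way is an assumption, and the four-person network is hypothetical.

```python
from itertools import combinations


def density(nodes, edges):
    """Fraction of all possible undirected edges that are present."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0


def transitivity(nodes, edges):
    """Share of connected triples (a - v - b) whose end points are also linked."""
    neighbours = {v: set() for v in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    closed = triples = 0
    for v in nodes:
        for a, b in combinations(neighbours[v], 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1  # each triangle is counted once per corner
    return closed / triples if triples else 0.0


# Hypothetical friendship network: A, B and C all know each other, D knows only C.
people = ["A", "B", "C", "D"]
friendships = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(density(people, friendships), transitivity(people, friendships))
```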

Literature Survey

This section presents a review of automatic personality recognition from text. In [3], experiments were performed on the Essays dataset to extract personality from it. The authors used three convolutional filters to extract unigram, bigram and trigram features from each sentence. The experiments used a Convolutional Neural Network, and the authors trained five different networks, one for each of the five personality traits. For classification they used a two-layer perceptron consisting of a fully connected layer of size 200 and a final softmax layer of size two, representing the yes and no classes. The accuracy ranged between 50% and 62% depending on the filter, personality trait and classifier. The best performance was achieved for Openness using a Multi-Layer Perceptron (MLP).
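
The idea of unigram, bigram and trigram convolutional filters can be illustrated without a deep learning library. The sketch below slides one filter of each width over a sentence of word vectors, applies a ReLU and keeps the maximum response. It is not the network from [3]: the two-dimensional embeddings and filter weights are hand-picked for illustration, whereas a real model learns them and passes the pooled features on to the fully connected and softmax layers.

```python
def conv_max_pool(embeddings, filt, bias=0.0):
    """Slide one filter over the sentence; return the max ReLU activation."""
    width = len(filt)  # number of consecutive words the filter covers
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = bias + sum(
            w * x
            for filter_row, word_vector in zip(filt, window)
            for w, x in zip(filter_row, word_vector)
        )
        best = max(best, activation)
    return best


def sentence_features(embeddings, filters_by_width):
    """Concatenate pooled features from width-1, width-2 and width-3 filters."""
    return [
        conv_max_pool(embeddings, filt)
        for width in (1, 2, 3)
        for filt in filters_by_width[width]
    ]


# Hypothetical 2-dimensional embeddings for a four-word sentence.
sentence = [[1, 0], [0, 2], [3, 1], [-1, -1]]
# One hand-picked filter per n-gram width (illustrative weights only).
filters = {
    1: [[[1, 0]]],
    2: [[[1, 0], [0, 1]]],
    3: [[[1, 1], [1, 1], [1, 1]]],
}
features = sentence_features(sentence, filters)
print(features)
```

A sentence shorter than a filter's width simply contributes a zero for that filter, which keeps the feature vector at a fixed length.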

The Essays data source used in [3] is also used in [16]. In that paper the authors employ common-sense knowledge with sentiment polarity scores and affective labels, using the resources SenticNet, ConceptNet and EmoSenticNet. For personality recognition the authors combined common-sense knowledge-based features with psycho-linguistic features and frequency-based features (LIWC, MRC), and the features were then used in supervised classifiers. Five Sequential Minimal Optimization (SMO) classifiers were built, one for each personality trait. The authors showed that using common-sense knowledge with affective and sentiment information improves accuracy over existing work that uses only psycho-linguistic features and frequency-based analysis at the lexical level. Performance was evaluated with 10-fold cross-validation. The experiments showed that Openness is the easiest trait to identify, with an F-score of 0.662, and Agreeableness is the most difficult, with an F-score of 0.615. The authors reported that the proposed approach performs much better than previously reported state-of-the-art methods on the same dataset.
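
For readers unfamiliar with the evaluation protocol, the sketch below shows one simple way to form 10 folds and to compute an F-score for a binary trait label. It is an illustration only: the contiguous, unshuffled folds and the label names are assumptions, and the source does not state whether the folds in [16] were shuffled or stratified.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        yield train, test
        start = stop


def f_score(gold, predicted, positive="y"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])
print(f_score(["y", "y", "n", "n"], ["y", "n", "y", "n"]))
```

Each item lands in exactly one test fold, so every essay is used for testing once and for training in the other nine rounds.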

In recent times the use of social networking has increased greatly. It has become a popular means of information sharing and social interaction. It is a place where users present their information, ideas, career interests, views, etc., and it is therefore an excellent source for research on personality computing [4], [5], [6], [7].

In [4], the experiments aimed at predicting personality from Facebook user statuses. The authors used two datasets: myPersonality and a manually collected dataset. The task was performed with traditional machine learning algorithms (Naïve Bayes, SVM, Logistic Regression, Gradient Boosting and Linear Discriminant Analysis (LDA)) and deep learning architectures (Multi-Layer Perceptron (MLP), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) and 1-Dimensional Convolutional Neural Network (CNN 1D)). Several feature sets were used, such as LIWC, SPLICE and SNA. For traditional machine learning they used a closed-vocabulary approach (predefined features): 85 features from LIWC, 74 features from SPLICE, and the SNA features. For the deep learning implementation they used linguistic features from an open-vocabulary approach (no predefined features), namely word embeddings from GloVe. The MLP architecture had the highest average accuracy on myPersonality, and the LSTM+CNN 1D architecture had the highest average accuracy on the manually collected dataset. Overall, the deep learning architectures gave better results.

The experiments in [5] aimed at automatic recognition of the Big Five personality traits on social networks using users' status text. They were performed on the myPersonality corpus, which was collected from Facebook. A bag-of-words approach was used for feature extraction, with unigrams as features. The authors used different classification methods: Sequential Minimal Optimization for Support Vector Machines (SMO), Bayesian Logistic Regression (BLR) and Multinomial Naïve Bayes (MNB) sparse modeling. The results show that the MNB sparse generative model performs better than the discriminative models SMO and BLR.
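
A unigram bag-of-words classifier of the Multinomial Naïve Bayes kind can be sketched in plain Python as follows. This is a textbook MNB with add-one smoothing, not the sparse variant used in [5], and the four training statuses and their Extraversion labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(documents, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from unigram counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(documents, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {"log_prior": {}, "log_like": {}}
    for label, n_docs in class_docs.items():
        model["log_prior"][label] = math.log(n_docs / len(labels))
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_like"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict_mnb(model, text):
    """Return the class with the highest posterior log score."""
    scores = {}
    for label, prior in model["log_prior"].items():
        like = model["log_like"][label]
        # Words never seen in training are skipped.
        scores[label] = prior + sum(
            like[w] for w in text.lower().split() if w in like
        )
    return max(scores, key=scores.get)


# Hypothetical statuses labelled for Extraversion ('y' = high, 'n' = low).
statuses = [
    "party with friends tonight",
    "love meeting new people",
    "staying home alone again",
    "quiet evening reading alone",
]
traits = ["y", "y", "n", "n"]
model = train_mnb(statuses, traits)
print(predict_mnb(model, "friends and party people"))
print(predict_mnb(model, "alone at home"))
```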

The approach proposed in [15] analyzes users' Twitter profiles. The experiments were performed on the 2,000 latest tweets of 279 users collected through a Twitter application. The features included not only the LIWC and MRC categories but also Twitter measurements such as number of followers and following, density of the social network, number of “@mentions”, number of replies, number of hashtags, number of links and words per tweet. Regression experiments were performed on the data to assess the users' Big Five personality scores. Two regression algorithms were used, Gaussian Process and ZeroR, and both had similar performance over the personality features. The authors' analysis showed that the Twitter data yielded similar results for Openness and Agreeableness but less impressive results for the other traits.
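
ZeroR is a baseline that ignores all features; for a numeric target it always predicts the mean of the training values. The sketch below pairs it with Mean Absolute Error, the usual error measure for such regression experiments. The trait scores are hypothetical and are not results from [15].

```python
def zero_r_fit(train_targets):
    """ZeroR for regression: the prediction is the training mean."""
    return sum(train_targets) / len(train_targets)


def mean_absolute_error(gold, predicted):
    """Average absolute difference between gold and predicted scores."""
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical Openness scores on a 1-5 scale.
train_scores = [3.0, 4.0, 5.0]
test_scores = [3.5, 4.5, 5.0]

baseline = zero_r_fit(train_scores)
predictions = [baseline] * len(test_scores)
mae = mean_absolute_error(test_scores, predictions)
print(baseline, mae)
```

A learned model is only informative to the extent that its error falls below this baseline, so a Gaussian Process that merely matches ZeroR has extracted little signal from the features.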

The results and details of all the experiments are summarized in Table-1. It reports, from left to right, the dataset, the number of subjects involved in the experiments, the features, the type of task and the performance over the different traits. Performance is presented in terms of Accuracy, F-Measure and Mean Absolute Error. From Table-1 we infer that the Openness trait yields the best results. We can also say that in [4] the manually collected dataset gives better results than the myPersonality dataset. In [3] the authors used deep learning algorithms with Mairesse and n-gram features, and adding the Mairesse features proved beneficial. Due to insufficient training data, the CNN alone, without the document-level features, underperformed the Mairesse baseline. In the experiments of [5], semantic features were not used; including these features may provide more information for recognizing personality traits. The study in [16] incorporated common-sense knowledge with psycholinguistic features, which led to effective results.

Cite this page

Automatic Personality Recognition From Hindi Text. (2022, Jan 27). Retrieved from
