Dimensionality Reduction In Sentiment Analysis Using Colony Support Vector Machine

Abstract

Sentiment analysis is an emerging field whose growth has been accelerated by the Internet of Things. Sentiment analysis, or opinion mining, is the analysis of sentiments, views, judgments, behaviour and feelings expressed in written text. A huge amount of such data is available on the internet, for example in blogs, assessment websites and feedback forums, and the Internet of Things is an incentive to these developments. Much of the information available on the internet is unstructured and is spread across websites, evaluation sites and review forums.

Presently, many people prefer online shopping because products can be bought from several sources, which makes it less time-consuming and more cost-effective. In addition, the sentiments, views and feelings that customers express in reviews and comments can be categorized as positive, negative or neutral, which helps new customers to judge the quality of a product and of the company.

In this paper, a novel approach is built around a specific subject by relying on the reviews posted on social sites.

The proposed approach uses a list of words to build a knowledge-based training set of positive and negative keywords. A performance analysis compares the proposed model with the existing work. In addition, the feature vector method is implemented in two stages after pre-processing.

In this research, data is first collected from sites such as Amazon; unique features are then extracted from the gathered information and added to the feature vector and value set.

This work gives a step-by-step description of review processing and sentiment analysis. The paper also presents a comparative analysis of machine learning algorithms in terms of accuracy rate, class precision and recall.

Introduction

With the growth of blogging, review sites and social networking sites, a huge amount of data has been shared among individuals over the last few decades. Many individuals describe their opinions in terms of a set of features. The main use of these opinions is to give companies feedback about their products and customers. Sentiment analysis is the process of selecting relevant information from this large amount of data [1]. Decisions about products, and their classification, are made on the basis of recommendations or of positive or negative feedback. The feedback regarding a product can be categorized as positive, negative or neutral [2]. Sentiment analysis is the computation of people's decisions, emotional views, characteristics and feedback [3].

Opinion mining, also known as sentiment analysis, is the process in which opinions play the main role in making choices and in recognising other people's choices. Traditionally, no computational approach was required for such choices: for decades, people asked friends and family for their opinions, while organisations conducted surveys to make decisions about products and services. The growth of content on social sites has transformed this. Today, customers express their views on blogs and social networking sites, and analysing these views is sentiment analysis through websites [4][5]. At present, reviews are posted on the web, and industries and companies search them and perform sentiment analysis of customer reviews.

Surveys are conducted to gather customers' choices about products. In other cases, sentiment is hidden in blog posts. In the era of internet technology, internet communication has advanced widely. Web data is not organised in a structured format, so it is not easy to process automatically [6]. Sentiment analysis is the automatic detection of human behaviour (feelings, judgments, sentiments and beliefs). In addition, classifying this information into positive, negative and other emotion patterns reveals the sentiment of the data [2]. Because large amounts of data are collected for different purposes, whoever obtains the data may receive unreliable data. Sentiment analysis is the method used to compute such emotional behaviour [7].

Text sentiment analysis is the recognition of the attitude of an individual who writes or speaks about a particular topic [10]. Although audio and video are increasingly used in social media analysis, this method analyses the text of the sentiment data [11]. The automatic recognition of a person's emotion from text is called sentiment analysis, also known as opinion mining [12]. At present, thoughts and opinions are recognised and distributed in the form of text. With the growth of multimedia and the increase in content, there are different methods of determining views and thoughts over the internet [8]. Sentiment analysis identifies and extracts subjective information using natural language processing and text analysis [9]. It evaluates the sentiment of the author within one domain or across multiple domains. The main applications of sentiment analysis are:

  1. Business and financial domain.
  2. Senti-WordNet as a lexical resource.
  3. Socialites.
  4. Voting advice applications.
  5. Monitoring of the opinion of the people.

The existing work performs an automatic analysis of such information in the field of sentiment analysis (SA). The data is subjected to several pre-processing approaches, after which the opinions in the comments are identified and classified. An open-source data analysis and simulation tool, referred to as RM, is used to evaluate performance and to give a step-by-step description of review processing. The base paper also presents a comparative study of methods such as Naïve Bayes (NB) and Support Vector Machine (SVM).

In the proposed approach, kernel principal component analysis (KPCA) extracts the features in vector form to reduce the dimensionality of the reviews. The novel colony support vector machine (C-SVM) is then applied to select among the extracted features, and the classification is produced in binary form (0, 1).

This paper contains five sections that describe sentiment analysis and machine learning methods. Section I introduces sentiment analysis, its applications, the existing work and the proposed work. Section II surveys the literature in the field of sentiment analysis and opinion mining. Sections III and IV describe the proposed methodology and the evaluation of the research work. Section V presents the conclusion and the future scope of sentiment analysis.

Related Work

Ren, R. and Wu, D. D. et al., 2018 [13] analysed sentiment using a machine learning approach based on the support vector machine. The study takes the day of the week into account along with investor sentiment, which yields more realistic and valid sentiment indexes. The support vector machine is integrated through a realistic window. The experimental results describe a feasible approach to modelling the features of the stock market and forecasting the behaviour of the data. They also imply that sentiment carries very important information about the intrinsic value of a resource. The values found by the method are regarded as indicators of the stock market data.

Hoogendoorn, M. and Berger, T. et al., 2017 [14] studied predictive models built from the writing of patients who suffer from social anxiety. Features related to the patients' health are extracted from the text they write; their writing style and the basic sentiment of their messages are also taken into account. The predictive models are produced by a machine learning algorithm from the data sets of 69 patients. The outcome of the therapy is assessed using the area under the curve (AUC), and the reported value is about 0.78 over the entire treatment period.

The study predicts the patients' emotions and builds the predictive models on the basis of anxiety symptoms, which can be analysed on a weekly basis. The data sets of the 69 patients are assessed weekly. The outcomes are based on the therapist's measurement of social phobia in three stages. First, demographic information is collected; then the therapist provides two months of treatment. Finally, the models make predictions from the emails received over the given period.

Tai, C. H., Tan, Z. H. and Chang, Y. S. et al., 2016 [15] studied the target and the strength of a person's emotion as expressed through views on a social network. People's positive emotions were recognised through online approaches, but detection was difficult because of data overload. The authors therefore built an emotion recognition system based on the latent allocation method and a sentiment word recognition method, which detects both the emotion and its target. In the experiments, an individual's posts on his or her social network site are collected, and the feeling distinguisher system indicates the individual's feelings and their intensity. Emotions are classified as strong positive, strong negative, positive and negative.

Scharl, A., Herring, D. and Rafelsberger, W. et al., 2017 [16] investigated semantic approaches to views of emotions. Context data is extracted using the cognitive therapy method. The visualisation methods include philological, interpersonal and spatial context approaches. The research describes visual analysis of real-time data through multiple coordinated views that expose context along semantic dimensions. Proper recognition of the context data characterises the extracted features. The contextual data captures the relations between objects such as locations and organisations, and covers the social media platforms Twitter, Facebook, Google+ and YouTube.

Table 1:- Literature survey: methods studied and analysed.

Methods                        Merits                                   Demerits
SST algorithm (Shingle) [17]   Studied emotions through SA              Slow speed
SVM, NB and NN [18]            Support vector method best for dataset   Analysis limited
POM [19]                       Emotion detected successfully            Application interface doesn't give message back
FCM [20]                       Optimizes the feature sets               Clustering / grouping issues in business

Table 1 lists the papers studied, the techniques they use, and the merits and demerits of those techniques for sentiment analysis of textual data.

Proposed Research Work

Sentiment analysis uses the C-SVM classification tool to classify sentiments as negative, positive or neutral. The following research methodology was implemented.

Standard Dataset

The dataset is gathered from online sites such as Amazon, Snapdeal and Flipkart, which are open-source datasets. The information is collected from reviews entered by users, which may contain nouns, pronouns, adjectives and so on. Extraction is used to segment individual sentences, and this information is used to create our own dataset of positive and negative keywords. The comments on various products, for example an electronic product, and their corresponding ratings are also recorded in the different instances.

The main problem in sentiment analysis is unstructured data, which is classified using unsupervised machine learning methods with the help of various data pre-processing methods. The following pre-processing methods were applied to the comments and reviews. These are some examples from our dataset, held in the knowledge base of this research, which contains positive and negative keywords.

Data pre-processing

In this phase, the input data is pre-processed before the features are extracted. This includes removing tags, eliminating missing instances, and replacing missing attribute values with other values in order to avoid errors. The dataset is stored as an Excel or CSV file, and the comments are written in English.

The main pre-processing steps are:

  1. Tokenization
  2. Stemming

Tokenization is the division of a string into segments such as words, keywords, phrases and symbols, known as tokens. Tokens can be individual words, phrases or complete sentences. During tokenization, some characters, such as punctuation marks, are discarded.

Tokenization is widely used in computer science, where it forms a large part of lexical analysis (LA).

Example: He is a good Seller.

Token Output:

Stemming is the process of reducing a word to its word stem or root form, normally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if that stem is not itself a valid root.

Example: She is singing a song.

Stem Output: Sing.
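
The two pre-processing steps can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this work: the regular expression, the suffix list SUFFIXES and the function names tokenize and stem are assumptions made for the example, and a production stemmer (for example the Porter algorithm) applies many more rules.

```python
import re

def tokenize(text):
    """Split text into word tokens, discarding punctuation marks."""
    return re.findall(r"[A-Za-z0-9']+", text)

# Hypothetical suffix list for illustration; real stemmers use far more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower

tokens = tokenize("He is a good Seller.")
stems = [stem(t) for t in tokenize("She is singing a song.")]
print(tokens)  # ['He', 'is', 'a', 'good', 'Seller']
print(stems)   # ['she', 'is', 'sing', 'a', 'song']
```

The length check keeps short words such as "is" intact, so only "singing" is reduced, to "sing".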

Kernel PCA used for Feature Extraction

The feature extraction method fetches the unique properties in vector form, such as a matrix (r, c). The KPCA (Kernel Principal Component Analysis) method extracts the individual components in the form of eigenvalues and eigenvectors. It is an extension of the PCA algorithm that maps the data into a new feature space, which helps in dimensionality reduction. It captures the overall variance of the patterns and helps in removing noise.
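
To make the idea concrete, the sketch below computes the first kernel principal component of a small set of feature vectors in plain Python. It is an illustration under stated assumptions, not the MATLAB implementation used in this work: the Gaussian (RBF) kernel, the value of gamma, the use of power iteration and all function names are choices made for the example.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def centered_kernel_matrix(points, gamma=0.5):
    """Kernel matrix centred in feature space."""
    n = len(points)
    k = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    row_mean = [sum(row) / n for row in k]
    total_mean = sum(row_mean) / n
    return [[k[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def top_component(matrix, iterations=200):
    """Leading eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    n = len(matrix)
    vec = [1.0 / (i + 1) for i in range(n)]  # deterministic, non-uniform start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(n))
                for i in range(n))
    return value, vec

def kpca_first_component(points, gamma=0.5):
    """Project every training point onto the first kernel principal component."""
    value, vec = top_component(centered_kernel_matrix(points, gamma))
    # For training points the projection is sqrt(eigenvalue) * eigenvector entry.
    return [math.sqrt(value) * v for v in vec]

# Two well-separated groups of 2-D feature vectors (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
projection = kpca_first_component(points)
```

Each two-dimensional point is reduced to a single coordinate, and the two groups fall on opposite sides of zero, which is the kind of dimensionality reduction that the later classification stage relies on.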

Steps of Feature Extraction (KPCA) Method.

  1. Let k = 1.
  2. For each data point x_i: if y_i = k, let z_i = +1; otherwise let z_i = -1.
  3. Next, to assign labels, fit kernel regression on (x_j, j = 1, ..., n) using the new labels z_i, i = 1, ..., n, as input for the labelling method, and apply it to the testing set.
  4. If k < q, increase k by 1 and return to step 2; otherwise continue to the next step.
  5. For each data point x_i, iterate through the classes k = 1, ..., q. For the first k for which z_i = 1, let y_i = k and go to the next data point. If no such k exists, let y_i = q.
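
One way to read the steps above is as a one-versus-rest loop over q classes, in which each class in turn is relabelled +1 against -1 for all others and a kernel regressor decides whether a point belongs to it. The Python sketch below follows that reading. It is an interpretation, not the authors' code: the Nadaraya-Watson form of kernel regression, the RBF kernel, gamma and all function names are assumptions.

```python
import math

def kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def binary_labels(labels, k):
    """+1 for points of class k, -1 for every other point."""
    return [1 if y == k else -1 for y in labels]

def kernel_regression(train_x, z, x, gamma=1.0):
    """Nadaraya-Watson estimate of the +/-1 label at x (assumed regressor)."""
    weights = [kernel(x, xj, gamma) for xj in train_x]
    return sum(w * zj for w, zj in zip(weights, z)) / sum(weights)

def predict_class(train_x, labels, x, q, gamma=1.0):
    """Return the first class k whose one-vs-rest regressor votes +1, else q."""
    for k in range(1, q + 1):
        z = binary_labels(labels, k)
        if kernel_regression(train_x, z, x, gamma) > 0:
            return k
    return q

# Made-up training data with three classes.
train_x = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
labels = [1, 1, 2, 2, 3, 3]
predicted = [predict_class(train_x, labels, x, q=3)
             for x in [(0.2, 0.5), (5, 5.5), (10, 0.5)]]
```

Because the loop stops at the first class that votes +1, the order of the classes matters when two regressors both accept a point; the final class q acts as the fallback.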

Selection of the feature (C-SVM) Algorithm

The classification matrix of similarity values is created and passed to the machine learning algorithm, colony-SVM, for training. The colony-SVM classification method selects the kernel feature set and divides the data to calculate the minimum and maximum distances of the feature vectors. If equal distances are found in the division phase, the modification phase transfers the equal data. Colony-SVM is trained in MATLAB (2016a). Training and testing use a 70-30 split: 70% of the data is used for training and 30% for testing. The performance of each classifier is analysed, and the evaluation is based on the performance parameters true positive, true negative, accuracy rate, recall rate and precision.
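
The 70-30 split can be illustrated with a few lines of Python. This is a generic sketch rather than the MATLAB procedure used in this work; the function name train_test_split, the fixed seed and the placeholder reviews are assumptions for the example.

```python
import random

def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it at the training fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Placeholder (review text, label) pairs.
reviews = [("review %d" % i, i % 2) for i in range(10)]
train, test = train_test_split(reviews)
print(len(train), len(test))  # 7 3
```

Shuffling before the cut avoids a split that simply follows the order in which the reviews were collected, and the fixed seed makes the split repeatable.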

Proposed algorithm (C-SVM algorithm)

Start round
do while (Ending criteria not-correct) – cycle loop
    do until (each ant or comment completes a round) – round loop
        Local trail configure
    End do
    Study round
    Global round update
End do

The algorithm is used to solve minimum-cost problems. The graph normally has N comments (nodes) and undirected arcs. The comments, which play the role of ants over the nodes or reviews, have two working modes: backward and forward.

The comments' memory allows them to retrace the route they followed while searching for the sink node or review.

Before moving backward along their memorised route, they remove any loops from it. While moving backward, the comments deposit pheromone on the arcs they traversed.

At the start of the search procedure, a constant amount of pheromone is assigned to all comments. When located at comment i, ant k uses the pheromone trail to compute the probability of selecting j as the next node.

Using this rule, the probability increases that forthcoming reviews will use this word.

After word k has moved to the next phase, the pheromone evaporates at all the words according to the following equation.
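
The selection, deposit and evaporation rules are not spelled out here, so the sketch below uses the textbook ant colony forms: the probability of moving to a neighbour is proportional to the pheromone on the connecting arc, and evaporation multiplies every pheromone value by (1 - rho). These forms, the parameters alpha and rho, the directed-dictionary graph layout and the function names are all assumptions for illustration, not the equations of the proposed C-SVM.

```python
def transition_probabilities(pheromone, node, alpha=1.0):
    """Probability of moving from `node` to each neighbour j, proportional to
    the pheromone on arc (node, j) raised to the power alpha."""
    weights = {j: tau ** alpha for j, tau in pheromone[node].items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def deposit(pheromone, path, amount=1.0):
    """Add pheromone to each arc of a path, as done on the backward trip."""
    for i, j in zip(path, path[1:]):
        pheromone[i][j] += amount

def evaporate(pheromone, rho=0.1):
    """Reduce the pheromone on every arc by the evaporation rate rho."""
    for arcs in pheromone.values():
        for j in arcs:
            arcs[j] *= (1.0 - rho)

# Every arc starts with the same constant amount of pheromone.
pheromone = {"a": {"b": 1.0, "c": 1.0}, "b": {"c": 1.0}, "c": {}}
before = transition_probabilities(pheromone, "a")
deposit(pheromone, ["a", "b", "c"])
evaporate(pheromone, rho=0.5)
after = transition_probabilities(pheromone, "a")
```

After one ant reinforces the route a, b, c, the arc from a to b carries twice the pheromone of the arc from a to c, so later ants choose b more often. Evaporation scales all arcs equally and therefore only changes the probabilities over time, as older deposits fade relative to new ones.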

X,Y defined with training labelled data, 0, partial_train_SVM.

Evaluation

This section describes the results and the performance analysis of the sentiment analysis. The performance of the proposed model (C-SVM) is checked in order to verify the accuracy rate and class precision. The C-SVM method combines the unsupervised methods used in SVM and ACO.

Table 2:- Test reviews and sentiment analysis.

Test Review                                       Sentiment Analysis
He is a good Seller.                              Positive
She is very Beautiful.                            Positive
She is bad Girl.                                  Negative
He is good student but his handwriting is bad.    Neutral
It might be good.                                 Neutral

Table 2 lists the test reviews in the given knowledge base. In the testing phase, each review is analysed by comparing the distance between the training-set and testing-set features. If the distances are the same, the result is evaluated on the basis of precision and accuracy rate.
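
As a point of comparison for the test reviews, the sketch below labels a review by counting positive and negative keywords. It is a simple keyword baseline, not the distance-based C-SVM described in this paper; the keyword sets POSITIVE and NEGATIVE and the function name keyword_sentiment are hypothetical.

```python
import re

# Hypothetical keyword lists for illustration only.
POSITIVE = {"good", "beautiful"}
NEGATIVE = {"bad"}

def keyword_sentiment(review):
    """Label a review by comparing counts of positive and negative keywords."""
    words = re.findall(r"[a-z']+", review.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "Positive"
    if neg > pos:
        return "Negative"
    return "Neutral"

labels = [keyword_sentiment(r) for r in [
    "He is a good Seller.",
    "She is very Beautiful.",
    "She is bad Girl.",
    "He is good student but his handwriting is bad.",
]]
```

This baseline matches the first four test reviews, but it would label "It might be good." as Positive because it ignores the hedging word "might", which is one reason a learned classifier is preferred over plain keyword counts.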

Table 3:- Performance of the proposed C-SVM and the existing SVM.

Parameter Metrics    Proposed Work (C-SVM)    Existing Work (SVM)
Precision            91                       86
True Negative        0.08                     4
True Positive        72                       69.7
Recall               98                       96.1

Table 3 compares the performance of the proposed colony-SVM with that of the SVM. The results show that the precision, recall and true positive rate of the C-SVM are higher than those of the traditional SVM, which implies that the proposed model is more feasible.

In this section, the performance parameters of the two classifiers, SVM and C-SVM, were compared. The performance parameters of the proposed model, namely precision, true negative rate, true positive rate and recall rate, are compared with those of the traditional SVM in Figures 3, 4, 5 and 6 respectively.

In this paper, the classifiers are analysed using MATLAB 2016a simulation. Two classifiers (SVM and C-SVM) were used for sentiment classification. Both are built using MATLAB built-in methods, and the classifier is operated through a GUI (Graphical User Interface). The performance of these classifiers is shown in Figures 3, 4, 5 and 6. The classifiers have broadly similar performance metrics. The main issue is the tokenization process. Languages such as British and American English are treated as un-segmented, so tokenization of these languages needs additional lexical information. Errors in the stemming process, in which dissimilar words are reduced to the same root, affect the true negative and true positive rates.
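
For reference, the sketch below shows how such parameters are conventionally computed from the counts of a binary confusion matrix. The counts are made up for the example and are not the data behind Table 3, and the function name classification_metrics is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for 100 test reviews.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

Under these standard definitions, recall and the true positive rate are the same quantity. Table 3 reports them separately, so the definitions used there may differ from the ones in this sketch.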

Conclusion and Future Scope

This research concludes that sentiment analysis (SA) of the comments and reviews given by users can be performed using machine learning methods built on the C-SVM model. The novel approach classifies text into three categories, positive, negative and neutral, depending on distance similarity. SA has been shown to predict consumer feelings efficiently by studying social data and reviews. In this work, we implement a new approach that extracts consumer opinion about a specific subject by relying on the comments and reviews that users post from time to time on social media and shopping sites.

This paper combines the advantages of KPCA and C-SVM. First, data is collected in the form of reviews that individuals post on various social media sites. The collected data is then subjected to pre-processing, followed by feature extraction with KPCA and feature selection with C-SVM, which also serves as the final text classifier.

Future work will implement sentiment analysis using methods such as Decision Tree and the K-Nearest Neighbour algorithm. To improve data pre-processing, word embeddings and DNNs (Deep Neural Networks) can also be used in addition to the C-SVM.

References

  1. Poria, S., Peng, H., Hussain, A., Howard, N., & Cambria, E. (2017). Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis. Neurocomputing, 261, 217-230.
  2. Tan, S., & Zhang, J. (2008). An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4), 2622-2629.
  3. Benamara, F., Cesarano, C., Picariello, A., Recupero, D. R., & Subrahmanian, V. S. (2007, March). Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In ICSM (pp. 1-7).
  4. Mäntylä, M. V., Graziotin, D., & Kuutila, M. (2018). The evolution of sentiment analysis—A review of research topics, venues, and top cited papers. Computer Science Review, 27, 16-32.
  5. Alessia, D., Ferri, F., Grifoni, P., & Guzzo, T. (2015). Approaches, tools and applications for sentiment analysis implementation. International Journal of Computer Applications, 125(3).
  6. Li, N., & Wu, D. D. (2010). Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decision Support Systems, 48(2), 354-368.
  7. Liu, B., & Zhang, L. (2012). A survey of opinion mining and sentiment analysis. In Mining Text Data (pp. 415-463). Springer, Boston, MA.
  8. Liu, B. (2010). Sentiment analysis: A multi-faceted problem. IEEE Intelligent Systems, 25(3), 76-80.
  9. Batrinca, B., & Treleaven, P. C. (2015). Social media analytics: a survey of techniques, tools and platforms. AI & Society, 30(1), 89-116.
  10. Fereday, J., & Muir-Cochrane, E. (2006). Demonstrating rigor using thematic analysis: A hybrid approach of inductive and deductive coding and theme development. International Journal of Qualitative Methods, 5(1), 80-92.
  11. Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), 267-307.
  12. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1-47.
Updated: Feb 23, 2024

Dimensionality Reduction In Sentiment Analysis Using Colony Support Vector Machine. (2024, Feb 13). Retrieved from https://studymoose.com/document/dimensionality-reduction-in-sentiment-analysis-using-colony-support-vector-machine
