Feature Based Classification of Data Using Support Vector Machine


Abstract

The emergence of social networking sites and product review blogs has made sentiment classification popular among researchers. To improve classification quality, we use the support vector machine (SVM), a strong learning model for classification and regression that stands out in noisy and complex domains. The key strengths of SVM are its generalization theory and kernel functions. Generalization theory provides a rigorous way to choose a hypothesis, while kernel functions introduce non-linearity into the hypothesis space without any explicit requirement for a non-linear algorithm.

This paper concentrates on generalization theory and also offers a few key points for practitioners of data mining who intend to use support vector machines.

Introduction

User-generated online reviews are of great use: they play an inexorable role in customers' decision-making when purchasing products, booking hotels, and so on, and they collectively establish a low-cost, effective feedback channel that helps business organizations track their reputation and enhance the quality of their products and services [1].


Product ratings, represented as stars, help customers in their purchase decisions by driving more qualified customers to shopping sites.

To the best of our knowledge, most existing probabilistic joint sentiment-topic models are unsupervised or only partially supervised. That is, they are designed primarily for user-generated text content and do not take overall ratings or labels of text documents into account. As a result, although they capture the hidden thematic structure of text data, these models cannot directly predict overall sentiments or text document ratings.


Instead, they rely on document-specific sentiment distributions to estimate the overall sentiments of documents [12].

Overall ratings usually accompany online reviews. For example, a one-to-ten star rating can naturally be regarded as the sentiment label of a text review under a bag-of-words representation.

Related Work

Classification of sentiments can be done by two popular methods:

  •  Lexicon Based Approach
  •  Machine Learning

The lexicon-based approach is divided into dictionary-based and corpus-based approaches. In the dictionary-based approach, every word is looked up in a dictionary. Its drawback is that it never looks for correlation among words, so it is difficult to find opinions with respect to context. Alternatively, the corpus-based approach uses context-based orientations to find opinions. From the learning perspective, machine learning models have made a strong impression.

Support Vector Machine has outperformed other machine learning classifiers because it is a binary classifier, and our sentiment classification problem is binary in nature, i.e., positive or negative [2, 13]. The Naive Bayes classifier also performs well due to its simplicity and probabilistic nature. Pang and Lee [3], Dave et al. [4], Kennedy and Inkpen [5], Granitzer [6], Tan and Zhang [8], and S. A. Kawathekar [7] used SVM and NB for sentiment classification on different datasets. Across these papers, we observed SVM to be the best classifier.

Methodology

Feature Extraction (Stage 1)

The first step of review sentiment classification is the extraction of a feature matrix. Reviews cannot be given directly to an optimization algorithm, as they are in the form of sentences, so we need to convert them into a feature matrix. First, extract the reviews from the data file and create a bag-of-words from them. Mapping each review onto this feature vector gives the feature matrix. The basic steps in data cleaning are removing redundant words, special characters, and symbols.
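The steps above can be sketched in pure Python; function and variable names here are illustrative assumptions, not taken from the paper, and a real pipeline would also apply stop-word removal and stemming.

```python
# Hypothetical sketch of Stage 1: cleaning reviews, building a vocabulary,
# and mapping each review to a binary bag-of-words feature vector.
import re

def build_feature_matrix(reviews):
    """Return (vocabulary, matrix) where matrix[i][j] = 1 iff review i
    contains vocabulary word j."""
    # Data cleaning: lowercase and strip special characters/symbols.
    cleaned = [re.sub(r"[^a-z\s]", " ", r.lower()) for r in reviews]
    tokenized = [r.split() for r in cleaned]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: j for j, word in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word in tokens:          # presence/absence; duplicates ignored
            row[index[word]] = 1
        matrix.append(row)
    return vocabulary, matrix

vocab, X = build_feature_matrix(["Great phone, great battery!", "Bad screen."])
# vocab -> ['bad', 'battery', 'great', 'phone', 'screen']
```

Each row of `X` is then one habitat's candidate feature vector for the selection stage that follows.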

BBO Based Feature Selection (Stage 2)

After feature extraction, the dimension of the feature set is large. To reduce it, irrelevant features are removed from the extracted features using Biogeography-Based Optimization (BBO).

1) Habitat Representation: Each habitat represents a possible feature subset for the sentiment classification problem. The features extracted in Stage 1 are the Suitability Index Variables (SIVs), each of which may or may not be part of a given subset. The habitat size is the same as the count of SIVs present in it.

2) Habitat Suitability Index (HSI): This corresponds to the fitness function of a habitat, which depends on the SIV count. A subset of features is selected such that it has a minimal SIV count as well as a minimal error rate.

HSI: f(SIV1, SIV2, ..., SIVn) = (Wa × Error_rate_i) + (Wb × Ln)

where Wa is the weight associated with the error rate of the habitat and Wb is the weight associated with the number of selected variables (SIVs). Ln equals the count of selected SIVs divided by the total count of SIVs. The error rate of each habitat is computed by the classifier.
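As a minimal sketch, the HSI fitness can be written as follows; the weight values and the way the error rate is obtained are assumptions (any classifier's validation error could be plugged in):

```python
# Hypothetical sketch of the HSI fitness: lower is better, rewarding both
# a low classifier error rate and a small selected-feature count.
def habitat_suitability(error_rate, selected_siv_count, total_siv_count,
                        wa=0.9, wb=0.1):
    """HSI = Wa * error_rate + Wb * Ln, with Ln = selected / total SIVs.
    wa and wb are illustrative weights, not values from the paper."""
    ln = selected_siv_count / total_siv_count
    return wa * error_rate + wb * ln

# A habitat using 20 of 100 features with a 15% error rate:
hsi = habitat_suitability(0.15, 20, 100)   # 0.9*0.15 + 0.1*0.2 = 0.155
```

In a full BBO run this score would be evaluated for every habitat each generation to decide which habitats emigrate features to which.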

3) Suitability Index Variables: These represent the values of the features used in the data set. Since our data set contains binary values, each SIV takes the value 1 or 0, indicating the presence or absence of a feature in the habitat. Due to mutation, a value may toggle between 0 and 1.

4) Emigration and Immigration Rates: The maximum emigration and immigration rates, represented by E and I, are typically set to 1 at the beginning of the algorithm. The immigration and emigration rates of each habitat are calculated with the following equations.

λi = I (1 − i/n) (1)

µi = E i/n (2)

where I is the maximum immigration rate in equation (1), E is the maximum emigration rate in equation (2), i is the species count of the ith habitat, and n is the maximum number of species.
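Equations (1) and (2) translate directly into code; this is a sketch with I = E = 1 as stated above, and the example values are illustrative:

```python
# Immigration rate falls and emigration rate rises as a habitat's species
# count i approaches the maximum n (equations (1) and (2) in the text).
def migration_rates(i, n, max_immigration=1.0, max_emigration=1.0):
    lam = max_immigration * (1 - i / n)   # λ_i = I (1 - i/n)
    mu = max_emigration * i / n           # µ_i = E i/n
    return lam, mu

lam, mu = migration_rates(3, 10)   # sparsely populated habitat: λ > µ
```

In BBO, habitats with high HSI tend to have high µ and share their SIVs, while low-HSI habitats have high λ and accept them.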

5) Mutation: Mutation is applied at random with a specified probability. In future work, we will use feature ranking based on Information Gain to implement mutation.

Implementation

Classification is a supervised learning task in data mining used to classify data instances based on their features, and the Support Vector Machine can classify data instances both linearly and non-linearly. Supervised learning is a branch of machine learning in which we have training data with appropriate class labels. Classification is a two-phase technique: the first phase is model construction and the second phase is model usage. Many existing algorithms are used for classification in data mining. The following are some classification techniques [14]:

  1.  Decision tree induction.
  2.  Rule based classifier.
  3.  Bayesian classifier.
  4.  Artificial neural network.
  5.  Nearest neighbor Classifier.
  6.  Support vector machine.
  7.  Ensemble classifier.

Among these classification techniques, Support Vector Machines were first applied to pattern classification and regression problems by Vapnik [10] and his colleagues. SVM is a suitable model for classification and regression problems [2]. SVMs are categorized as generalized linear classifiers. SVM has the unique property of minimizing the empirical classification error while maximizing the geometric margin, so it is called a maximum margin classifier and is based on the principle of Structural Risk Minimization. The linear discriminant function is

g(x) = ω·x + ω0

Consider SVM for linearly separable binary sets. The aim is to define a hyperplane that separates the entire training set into two classes.

To classify the elements of the two classes (the squares and the zeros in the illustration), the best hyperplane is the one that keeps the maximum margin from both classes: of the candidate hyperplanes, the margin of Z2 is larger than the margin of Z1, so Z2 is preferred.

The hyperplane equation is defined as

ω·x + ω0 = 0

We scale ω and ω0 so that g(x) = ω·x + ω0 delivers values greater than or equal to 1 for input vectors belonging to class 1 and values smaller than or equal to −1 for input vectors belonging to class 2. The distance from the hyperplane to the closest elements of either class is then 1/|ω|, since the modulus of g at those elements is 1. Hence, from geometry, the total margin between the two bounding hyperplanes is

2/|ω|

Maximizing this margin, i.e., minimizing |ω|²/2 subject to yi(ω·xi + ω0) ≥ 1, is a nonlinear optimization task, solved under the Karush-Kuhn-Tucker (KKT) conditions using Lagrange multipliers. The value of ω is the solution of the weighted summation over training vectors in the resulting dual problem. By solving these equations, we minimize |ω| and thereby maximize the margin, i.e., the separability between the two classes.
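The margin geometry above can be checked numerically; the hyperplane and points below are illustrative assumptions chosen so the constraints hold with equality:

```python
# For a hyperplane w·x + w0 = 0, support vectors satisfy y_i(w·x_i + w0) = 1,
# and the total margin between the bounding hyperplanes is 2/|w|.
import math

def decision(w, w0, x):
    """g(x) = w·x + w0"""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def margin_width(w):
    """Total margin 2/|w|."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))

w, w0 = [1.0, 1.0], -3.0                        # hyperplane x1 + x2 = 3
points = [([1.0, 1.0], -1), ([3.0, 1.0], 1)]    # (x_i, y_i) pairs
feasible = all(y * decision(w, w0, x) >= 1 for x, y in points)
width = margin_width(w)                         # 2/sqrt(2)
```

Shrinking |w| while keeping the constraints feasible widens the margin, which is exactly the optimization the KKT conditions solve.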

Results

By running tests on KNN, Naïve Bayes, and SVM, we identified the differences in accuracy and running time among the three. We computed the values using the RapidMiner tool: we took a CSV file as input, converted it, and clustered the Ripley set using RapidMiner. For SVM we chose SVM -> Apply Model -> Performance and used Tableau to visualize the results on the SONAR dataset. We observe that SVM has better accuracy and better recall than the other classification methods.

Here we considered a weighting dataset of about 500 tuples, and we can see that the accuracy, precision, and recall of SVM are far better than those of Naïve Bayes and KNN.

Here we applied a filter to the weighting dataset to obtain a smaller tuple count, and we can observe that accuracy, precision, and recall are again higher for SVM than for the other classification methods.

Considering three different datasets, we tested the different classification methods for accuracy, precision, and recall, and observed that SVM provides better classification metrics than the others.
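For reference, the metrics compared above can be computed from a confusion matrix as follows; the label vectors are made-up illustrations, not the paper's RapidMiner results:

```python
# Accuracy, precision, and recall for binary (0/1) labels, derived from
# true/false positives and negatives.
def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

acc, prec, rec = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Running each classifier's predictions through this function on the same held-out set makes the SVM vs. NB vs. KNN comparison reproducible outside RapidMiner.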

Conclusion

The Support Vector Machine is a powerful learning model for many data mining tasks, such as classification, regression, and outlier detection. Drawing on statistical learning theory, SVM tunes a regularized hypothesis to fit the data without overfitting, and it uses generalization theory to optimize without a separate validation set. However, tuning kernel parameters for a given amount of data remains complex.

References

  1. Zhen Hai, Gao Cong, Kuiyu Chang, Peng Cheng, and Chunyan Miao, "Analyzing Sentiments in One Go: A Supervised Joint Topic Modeling Approach", IEEE Transactions on Knowledge and Data Engineering, Vol. 29, No. 6, June 2017.
  2. Ramsha Shahid, Sobia Tariq Javed, Kashif Zafar, "Feature Selection Based Classification of Sentiment Analysis using Biogeography Optimization Algorithm", International Conference on Innovations in Electrical Engineering and Computational Technologies (ICIEECT), 5-7 April 2017 (IEEE).
  3. Pang B., Lee L., Vaithyanathan S., "Thumbs up? Sentiment Classification using Machine Learning Techniques", Conference on Empirical Methods in Natural Language Processing, 2002.
  4. Dave K., Lawrence S., Pennock D., "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews", Proceedings of the 12th International Conference on World Wide Web, 2003.
  5. Kennedy A., Inkpen D., "Sentiment classification of movie reviews using contextual valence shifters", Computational Intelligence, 2006.
  6. Granitzer M., Kroll M., Seifert C., Rath A. S., "Analysis of Machine Learning Techniques for Context Extraction", Digital Information Management, 2008.
  7. Kawathekar S. A., Kshirsagar M., "Sentiments analysis using Hybrid Approach involving Rule-Based and Support Vector Machines methods", Journal of Engineering, 2012.
  8. Tan S., Zhang J., "An empirical study of sentiment analysis for Chinese documents", Expert Systems with Applications, 2007.
  9. Lyman, Peter; Hal R. Varian, "How Much Information", 2003.
  10. Megha Gupta, Naveen Aggarwal, "Classification Techniques Analysis", NCCI 2010 - National Conference on Computational Instrumentation, CSIO Chandigarh, India, 19-20 March 2010.
  11. D. R. Kumar Raja & S. Pushpa, Multimedia Tools and Applications (2019). https://doi.org/10.1007/s11.42-019-7190- 7
  12. Raja K., Pushpa S., "Novelty driven recommendation by using integrated matrix factorization and temporal aware clustering optimization", Int. J. Commun. Syst., 2018; e3851. https://doi.org/10.1002/dac.3851.
  13. D. R. Kumar Raja and S. Pushpa, "Feature level review table generation for E-Commerce websites to produce qualitative rating of the products", Future Computing and Informatics Journal (2017), 118-124.
  14. D. R. Kumar Raja and S. Pushpa, "A Survey on Privacy Preserving Data Mining Techniques", International Journal of Applied Engineering Research (IJAER), Vol. 10(17), pp. 13142-13146, 2015, ICBDCC'15, St. Peter's University.