This chapter describes the key concepts and research methodologies used to extract service delivery sentiment from social media. It explains the overall process: the methods used to extract comments from social media, data pre-processing, classification of the data, training and testing of the model, and the method used to visualize the model. The chapter also addresses each of the research questions.
Methodology implies more than simply the methods you intend to use to collect data.
It also takes into consideration the concepts that underlie those methods. Data collection was guided by a literature review of different materials on how to extract data from social media, especially Instagram and Facebook. The data were collected from the Ikulu Mawasiliano Instagram account and from Zitto Kabwe's Instagram account and Facebook page.
A literature review is a research tool that enables the evaluator to make the best use of previous work in the field under investigation.
It helps the researcher learn from the experiences, findings and mistakes of previous related work (Goode 1952).
In this project, data were collected from the Instagram account of Ikulu Mawasiliano (Kurugenzi ya Mawasiliano ya Rais Ikulu), the official government account used to publish announcements and publications, and from the official Facebook page and Instagram account of Zitto Kabwe, a member of the Tanzanian parliament representing Kigoma. Different scraping tools were used to mine data from these pages, including a scraping extension added to the Chrome browser.
These scraping tools mine all the comments from a specific post.
Other tools used for mining data were Octoparse and ParseHub. The Ikulu Mawasiliano and Zitto Kabwe accounts were selected for two reasons. First, a preliminary investigation showed that these accounts post content directly related to social service delivery, that people comment on those posts to share their opinions and views, and that the accounts are active and post frequently. Second, the scope had to be limited, because there are many accounts and the research cannot extract data from all of them.
Data pre-processing is the stage where unrequired data are removed, for instance emoji, links and other unneeded characters that people include when they share their opinions. To prepare the collected data for machine learning tasks, text pre-processing includes stop word removal, tokenization, lemmatization, stemming and feature engineering. Instance selection also copes with the infeasibility of learning from large datasets (Kotsiantis, 2007), and it attempts to maintain the quality of mining with a minimum sample size.
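The cleaning step described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function name `clean_comment` and the two patterns are assumptions made for the example, not the tools used in the study.

```python
import re

# Assumed patterns for illustration; tune them to the real comment data.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a letter, digit, whitespace or apostrophe
# (this covers emoji and punctuation); underscores are dropped as well.
NON_TEXT_RE = re.compile(r"[^\w\s']|_")


def clean_comment(text):
    """Remove links, emoji and stray symbols, then normalize spacing and case."""
    text = URL_RE.sub(" ", text)        # drop links first, before punctuation
    text = NON_TEXT_RE.sub(" ", text)   # drop emoji and other symbols
    return re.sub(r"\s+", " ", text).strip().lower()


print(clean_comment("Huduma nzuri sana 👍 https://example.com!!"))
```

Links are removed before the other symbols so that a URL is deleted as a whole instead of being broken into word-like fragments.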
A non-English language such as Arabic is highly derivational: tens or even hundreds of words can be formed from a single stem. According to Ahmed A. Elbery, working with Arabic documents without stemming may result in an enormous number of words being input into the classification phase.
Tokenization is the process of splitting text into units called tokens. The text is read and split into tokens, or words, generally at blank spaces or at other delimiter characters.
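As a rough sketch of tokenization, the text can be split wherever a run of non-word characters occurs. The helper name `tokenize` and the choice of delimiter pattern are assumptions for this example; a dedicated tokenizer would handle language-specific cases better.

```python
import re


def tokenize(text):
    """Split text into lowercase tokens at whitespace and punctuation."""
    # Any run of characters that is neither a word character nor an
    # apostrophe acts as a delimiter; empty pieces are discarded.
    return [tok for tok in re.split(r"[^\w']+", text.lower()) if tok]


print(tokenize("Huduma za afya, ni nzuri!"))
```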
Another step performed in this research work is the removal of Arabic words that carry little meaning and occur frequently in the documents, such as "or", "whose", "on", "where", "in", "from", "beyond" and "all". Removing stop words makes processing more effective and improves the efficiency of the term indexing procedure.
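Stop word removal amounts to filtering tokens against a fixed list. The sketch below uses the English glosses quoted in the text as a stand-in list; the name `STOP_WORDS` and its contents are illustrative assumptions, and a real list would be written in the language of the comments.

```python
# Illustrative list built from the examples quoted in the text.
STOP_WORDS = {"or", "whose", "on", "where", "in", "from", "beyond", "all"}


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word list."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]


print(remove_stop_words(["services", "in", "all", "regions"]))
```

A set is used for the list so that each membership check takes constant time, which matters when many comments are processed.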
Feature engineering is the process of modifying the existing data features into new features that will be used to train a machine learning model. This process is important because the machine learning algorithm learns from the data it is given.
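One common way to turn tokenized text into features is a bag-of-words count vector. The text does not say which representation the study used, so the sketch below is an assumption chosen for illustration, and the helper names `build_vocabulary` and `bag_of_words` are hypothetical.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map every distinct token in the corpus to a column index."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(doc, vocab):
    """Return one count per vocabulary token, in vocabulary order."""
    counts = Counter(doc)
    return [counts.get(tok, 0) for tok in vocab]


docs = [["good", "service"], ["bad", "service", "bad"]]
vocab = build_vocabulary(docs)
print([bag_of_words(doc, vocab) for doc in docs])
```

Each comment becomes a fixed-length numeric vector, which is the form most classifiers expect as input.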
The model is evaluated using a confusion matrix, after data cleaning and pre-processing. A confusion matrix measures the performance of a machine learning classifier whose output can be two or more classes. It is a table that contains the combinations of actual and predicted values.
The confusion matrix is used to measure the accuracy of the model on a given dataset. Accuracy means the correctness of a classifier, that is, the proportion of predicted values that match the actual values.
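A confusion matrix and the accuracy derived from it can be computed as in the following sketch. The labels and predictions are made-up values for illustration only, not results from the study.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of predictions on the diagonal, i.e. predicted == actual."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical example data.
actual = ["pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg", "pos"]
matrix = confusion_matrix(actual, predicted, ["pos", "neg"])
print(matrix, accuracy(matrix))
```

The same functions work for more than two classes, because the matrix simply gains one row and one column per additional label.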
This research will use the following classification techniques:
Naïve Bayes is a supervised machine learning algorithm that applies Bayes' theorem with the assumption of independence between every pair of features.
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
But in real-life problems, there are multiple X variables, as shown below.
$$X = (x_1, x_2, x_3, \dots, x_n)$$
$$P(Y \mid x_1, x_2, x_3, \dots, x_n) = \frac{P(x_1, x_2, x_3, \dots, x_n \mid Y)\,P(Y)}{P(x_1, x_2, x_3, \dots, x_n)}$$
It requires the predictors to be independent. Because predictors are dependent in many real-life cases, this can limit the performance of the classifier.
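Under the independence assumption, the likelihood factorizes into a product of per-feature terms, so the class score becomes P(Y) multiplied by P(x_i | Y) for each feature. The sketch below shows this for word counts. It is a minimal illustration, not the study's implementation: the add-one (Laplace) smoothing, the function names and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {tok for tokens in docs for tok in tokens}
    return class_counts, word_counts, vocab


def predict(tokens, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log P(Y)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue                                  # ignore unseen words
            # log P(x_i | Y) with add-one smoothing (an assumption here)
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data.
docs = [["good", "service"], ["great", "service"],
        ["bad", "service"], ["poor", "delay"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict(["good"], model), predict(["poor", "delay"], model))
```

The denominator P(x_1, ..., x_n) is the same for every class, so it is left out when the classes are only being compared. Logarithms are summed instead of multiplying probabilities to avoid numerical underflow on long comments.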
Support vector machine (SVM) is a supervised machine learning algorithm that uses a hyperplane in an N-dimensional space to classify the data points.
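The role of the separating hyperplane, as in a linear support vector machine, can be sketched as follows. A point is classified by the side of the hyperplane w·x + b = 0 on which it falls. The weights `w` and bias `b` here are made-up values for illustration; in practice they are learned from training data, and that training step is not shown.

```python
def decision_value(x, w, b):
    """Signed score of point x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Return +1 for points on the positive side, -1 otherwise."""
    return 1 if decision_value(x, w, b) >= 0 else -1


# Hypothetical hyperplane in a two-dimensional feature space.
w, b = [1.0, -1.0], 0.0
print(classify([2.0, 1.0], w, b), classify([0.0, 3.0], w, b))
```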
Research Methodologies in Services Delivery on Social Media. (2019, Nov 22). Retrieved from https://studymoose.com/research-methodologies-in-services-delivery-on-social-media-essay