Utilizing Machine Learning for Heart Disease Prediction

Abstract

Healthcare is an essential part of human life. Cardiovascular disease is a general term for a range of conditions that affect the heart and blood vessels. Early methods for estimating cardiovascular risk helped in making decisions about the changes needed in high-risk patients, which reduced their risk. The healthcare industry holds large amounts of clinical data, so machine learning algorithms are needed to make decisions effectively when predicting heart disease.

Recent research has explored combining these techniques to produce hybrid machine learning algorithms. Our project proposes a prediction model that predicts whether a person has heart disease. It also estimates the person's level of risk of being affected by cardiovascular disease, so that precautions can be taken to reduce it.

Introduction

Cardiovascular diseases are one of the leading causes of death. The term usually refers to conditions that affect the function of the heart and blood vessels.

According to the World Health Organization (WHO), an estimated 17.9 million people die from cardiovascular diseases every year, which is about 31% of all deaths worldwide [1]. This number is expected to grow to more than 23.6 million by 2030. The WHO aims to reduce premature deaths from non-communicable diseases, of which CVDs make up the largest proportion.

This makes heart disease a major concern. However, it is difficult to identify heart disease because of several contributing risk factors such as blood pressure, cholesterol and many others.

Because of these constraints, researchers have turned to recent approaches such as data mining and machine learning to predict the disease.

Machine learning proves effective in helping to make predictions from the large quantity of data produced by the healthcare industry. Data mining provides many supervised classification algorithms, such as Logistic Regression, Decision Trees, SVM, KNN and Random Forests, whose performance can be improved by using class imbalance techniques.

Literature Survey

Data mining is widely used in the clinical field, for example in the prediction of heart disease, since it is a multidisciplinary field. Using data mining, researchers are developing various techniques to predict heart disease with the highest possible accuracy. A large amount of research has been done on clinical diagnosis for various diseases.

Jabbar MA [2] presented heart disease prediction using k-nearest neighbor (KNN) and particle swarm optimization (PSO). Feature subset selection is used to address the problem; it improved accuracy and reduced the running time. Before feature subset selection, the reported accuracy was 75%. The PSO search filters the features and selects those that contribute most to the classification. With KNN combined with PSO, the reported accuracy improved to 100%.

An adaptive weighted fuzzy rule-based system for the risk level assessment of heart disease was proposed by Animesh Kumar Paul, Pintu Chandra Shill, Md. Rafiqul Islam Rabin and Kazuyuki Murase [3]. In this work, a fuzzy system is developed for predicting heart disease risk levels using genetic algorithms (GAs), a modified DMSPSO and an ensemble technique. The model selects the critical attributes that can aid heart disease diagnosis. Effective attributes are selected through statistical methods such as the correlation coefficient, R-squared and weighted least squares (WLS), and weighted fuzzy rules are then formed on the basis of the selected attributes.

Jabbar et al. [4] predicted a risk score for heart disease using associative classification and hybrid feature subset selection. Feature selection is used as a pre-processing step to reduce dimensionality, remove irrelevant data, increase accuracy and improve comprehensibility. Associative classification is a recent and rewarding technique that applies association rule mining to classification and achieves high classification accuracy. Most associative classification algorithms adopt exhaustive search, as in Apriori, and generate a very large number of rules, from which a set of high-quality rules is chosen to build an efficient classifier.

An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction was proposed by Oluwarotimi Williams Samuel, Grace Mojisola Asogbon, Arun Kumar Sangaiah, Peng Fang and Guanglin Li [5]. The fuzzy analytic hierarchy process (Fuzzy_AHP) technique was used to compute global weights for the attributes based on their individual contributions. The system achieved an accuracy of 91.10%, which is 4.40% higher than that of the conventional ANN method.

Ali Adeli et al. presented a fuzzy expert system for the diagnosis of heart disease. The proposed system was evaluated on the V.A. Medical Center, Long Beach and Cleveland Clinic Foundation database ('UCI Machine Learning Repository: Heart Disease Data Set,' n.d.). The Mamdani inference method is used in this model [6] to design the fuzzy expert system (FES) through membership functions, a fuzzy rule base, fuzzification and defuzzification, with 13 inputs and 1 output.

Robert Detrano [7] achieved 77.00% accuracy on the Cleveland heart disease dataset using a logistic regression algorithm. Newton Cheung [8] applied the C4.5, Naive Bayes, BNND and BNNF algorithms and obtained 81.11%, 81.45%, 81.11% and 80.96% accuracy, respectively, on the Cleveland dataset. An accuracy of 83.60% was obtained with the Naive Bayes algorithm using WEKA and RA [9].

Dataset Description

Table 1: Dataset Description (Cleveland Dataset from UCI Machine Learning Repository)

Sl. No. | Attribute | Full Form | Description
1 | age | Age | Age of the individual.
2 | sex | Sex | Sex of the individual.
3 | cp | Chest Pain Type | Type of chest pain experienced: 1=typical angina, 2=atypical angina, 3=non-anginal pain, 4=asymptomatic.
4 | trestbps | Resting Blood Pressure | Resting blood pressure in mmHg.
5 | chol | Serum Cholesterol | Serum cholesterol in mg/dl.
6 | fbs | Fasting Blood Sugar | Indicates whether fasting blood sugar exceeds 120 mg/dl.
7 | restecg | Resting ECG | Resting electrocardiographic results.
8 | thalach | Max Heart Rate Achieved | Maximum heart rate achieved by the individual.
9 | exang | Exercise Induced Angina | Indicates whether exercise induced angina: 1=yes, 0=no.
10 | oldpeak | ST Depression | ST depression induced by exercise relative to rest.
11 | slope | Peak Exercise ST Segment | Slope of the peak exercise ST segment.
12 | ca | Number of Major Vessels | Number of major vessels colored by fluoroscopy.
13 | thal | Thalassemia | Thalassemia: 3=normal, 6=fixed defect, 7=reversible defect.
14 | num | Diagnosis of Heart Disease | Presence or absence of heart disease.
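
As a rough illustration of how records with these 14 attributes might be read, the sketch below parses comma-separated rows into dictionaries keyed by the attribute names in Table 1. It rests on several assumptions that the text above does not state: the file is comma-separated in the column order of Table 1, missing values are marked with "?", and any `num` value above 0 is treated as disease present. The two sample rows and the helper names `parse_rows` and `has_disease` are made up for illustration.

```python
import csv
import io

# Column order as listed in Table 1.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def parse_rows(text):
    """Parse comma-separated records into dicts of floats (None if missing)."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        # Assumption: a "?" cell marks a missing value.
        values = [None if cell.strip() == "?" else float(cell) for cell in row]
        records.append(dict(zip(COLUMNS, values)))
    return records


def has_disease(record):
    """Binary target (assumption): any num value above 0 counts as disease present."""
    return int(record["num"] > 0)


# Two made-up rows for illustration only; the second has a missing `ca` value.
SAMPLE = """63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
58,0,4,130,250,0,0,140,1,1.0,2,?,7,2
"""

records = parse_rows(SAMPLE)
labels = [has_disease(r) for r in records]
```

Rows with missing values would still need to be dropped or imputed before training; that choice is left open here.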

Table 2: Machine Learning Algorithms Used

Algorithm | Description | Application
Decision Trees | Utilizes simple decision rules derived from data features for classification and regression. | Heart disease prediction
Naive Bayes (NB) | A probabilistic classifier based on Bayes' theorem, effectively handling large datasets. | Heart disease prediction
Random Forest | Constructs multiple decision trees for improved prediction stability and accuracy. | Heart disease prediction
KNN | Classifies data based on the closest training examples in the feature space. | Heart disease prediction
Logistic Regression | Estimates probabilities using a logistic function, widely used for binary classification problems. | Heart disease prediction

Table 3: Algorithm Performance Comparison

Algorithm | Accuracy Before Techniques | Accuracy After Correlation | Accuracy After PCA | Accuracy After SMOTE
Decision Trees | N/A | N/A | N/A | 75.71%
Random Forests | N/A | N/A | N/A | 90%
Naive Bayes | N/A | Best without correlation | N/A | 90%
KNN | N/A | N/A | N/A | 82.86%
Logistic Regression | N/A | N/A | Best result | 94.29%
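
The accuracy values in Table 3 are taken here to mean the share of test cases classified correctly, which is the usual definition but is not spelled out in the text. A minimal sketch of that calculation follows; the labels are hypothetical and are not the data behind Table 3.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 10 test cases, 8 of them predicted correctly.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
print(f"{accuracy(y_true, y_pred):.2%}")  # prints 80.00%
```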

In our proposed model, we first identify the risk rate and the level of risk. If the level of risk is high, the person obtains test reports from a test center and enters the required results into the model. The model then predicts whether the person has the disease or not.

Research on data mining has led to the formulation of several data mining algorithms. These algorithms can be applied directly to a dataset to build models or to draw conclusions and inferences from it. Some popular data mining algorithms are decision trees, Naive Bayes, k-means, artificial neural networks and so on.

1. Decision Tree: Decision tree analysis is a general predictive modelling tool with applications in many areas. Decision trees are built by an algorithmic approach that identifies ways to split a dataset based on different conditions. It is one of the most widely used and practical methods for supervised learning. Decision trees are a non-parametric supervised learning method used for both classification and regression tasks. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. The decision rules generally take the form of if-else statements. The deeper the tree, the more complex the rules and the more closely the model fits the data.
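
To make the idea of learning a decision rule from the data concrete, here is a minimal sketch that finds a single if-else split on one numeric feature. It assumes Gini impurity as the splitting criterion, which the text does not specify, and the toy values are hypothetical. A full decision tree would apply the same search recursively to each side of the split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best `value <= threshold` split."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: maximum heart rate (thalach) and a 0/1 disease label.
thalach = [110, 120, 125, 150, 160, 172]
disease = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_threshold(thalach, disease)
# The learned rule reads: if thalach <= threshold, predict 1, else predict 0.
```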

2. Naive Bayes (NB): It is a simple technique for constructing classifiers. It is a probabilistic classifier based on Bayes' theorem. All Naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. Bayes' theorem is given as follows: P(C|X) = P(X|C) * P(C) / P(X), where X is the data tuple and C is the class, and P(X) is constant for all classes. Although it makes the unrealistic assumption that attribute values are conditionally independent, it performs surprisingly well on large datasets where this condition is assumed and holds.
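
The formula above can be sketched for categorical features as follows. P(X|C) is computed as a product of per-feature frequencies, which is the naive independence assumption, and P(X) is obtained by normalising over the classes. The add-one smoothing, the restriction to binary features and the toy data are assumptions made for this illustration only.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies for each class."""
    class_counts = Counter(labels)
    # feature_counts[c][i][v] = number of class-c rows whose i-th feature equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[c][i][value] += 1
    return class_counts, feature_counts


def posterior(row, class_counts, feature_counts):
    """Return P(C|X) for each class using Bayes' theorem with the naive assumption."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(C)
        for i, value in enumerate(row):
            # Add-one smoothing, assuming each feature has two possible values.
            score *= (feature_counts[c][i][value] + 1) / (n_c + 2)
        scores[c] = score  # proportional to P(X|C) * P(C)
    evidence = sum(scores.values())  # P(X), the same for every class
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical toy data: (exang, fbs) as binary features, label 1 = disease.
rows = [(1, 0), (1, 1), (0, 0), (0, 0), (1, 0), (0, 1)]
labels = [1, 1, 0, 0, 1, 0]
class_counts, feature_counts = train_naive_bayes(rows, labels)
probs = posterior((1, 0), class_counts, feature_counts)
```

For the query (1, 0) this toy model assigns a posterior of 0.8 to class 1 and 0.2 to class 0.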

3. Random Forest: Random forests are an ensemble learning method (also thought of as a form of nearest-neighbor predictor) for classification and regression. The method builds many decision trees and combines them in order to get more accurate and stable predictions. It constructs a number of decision trees at training time and outputs the class that is the mode of the classes output by the individual trees. It also tries to reduce the problems of high variance and high bias by averaging, to find a natural balance between the two extremes. Both R and Python have robust packages that implement this algorithm.
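
The "mode of the individual trees" idea can be sketched in a few lines. This is a deliberately simplified stand-in, not a full random forest: each "tree" is a one-split stump trained on a bootstrap sample using one randomly chosen feature, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature; returns a predict function."""
    majority = Counter(labels).most_common(1)[0][0]
    best = None  # (errors, threshold, left_label, right_label)
    for threshold in sorted({row[feature] for row in rows})[:-1]:
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # feature is constant in this sample: fall back to majority class
        return lambda row: majority
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Return the mode of the individual tree predictions."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (age, thalach) with a 0/1 disease label.
rows = [(45, 170), (50, 165), (52, 160), (60, 120), (64, 115), (68, 110)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
```

An odd number of trees avoids tied votes in the two-class case.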

4. KNN: The KNN algorithm is one of the simplest classification algorithms and one of the most widely used learning algorithms. KNN is a non-parametric, lazy learning algorithm. Its purpose is to use a dataset in which the data points are separated into several classes to predict the classification of a new sample point. A KNN algorithm classifies new data points based on a similarity measure (for example, a distance function). Classification is done by a majority vote of the neighbors: the data point is assigned to the class that is most common among its nearest neighbors. As the number of nearest neighbors, k, is increased, accuracy may increase.

When we say a technique is non-parametric, it means that it does not make any assumptions about the underlying data distribution. In other words, the model structure is determined from the data. This is quite useful, because in the real world most data does not obey the typical theoretical assumptions (as in linear regression models, for example). Therefore, KNN could and probably should be one of the first choices for a classification study when there is little or no prior knowledge about the data distribution.
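
A minimal sketch of the majority-vote procedure described above, assuming Euclidean distance as the similarity measure. The function name and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two already-scaled features and a 0/1 label.
train_rows = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
train_labels = [0, 0, 0, 1, 1, 1]
near_first_group = knn_predict(train_rows, train_labels, (1.0, 1.0))
near_second_group = knn_predict(train_rows, train_labels, (3.0, 3.0))
```

Because the vote depends on raw distances, features measured on very different scales (for example, cholesterol in mg/dl versus a 0/1 flag) would normally be rescaled first.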

5. Logistic Regression: Logistic regression is one of the most popular machine learning algorithms and falls under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables. The outcome must therefore be a categorical or discrete value. It can be either Yes or No, 0 or 1, True or False, and so on, but instead of giving the exact value as 0 or 1, it gives probabilistic values that lie between 0 and 1. Logistic regression is much like linear regression except in how the two are used. Linear regression is used for solving regression problems, while logistic regression is used for solving classification problems.
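
The step from a linear score to a probability between 0 and 1, and then to a class label, can be sketched as below. The weights and bias are hypothetical values chosen for illustration; in practice they would be learned from the training data, and the 0.5 cut-off is an assumed default.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # this form avoids overflow for large negative z
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Estimated probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return int(predict_proba(weights, bias, features) >= threshold)


# Hypothetical parameters for two features.
weights, bias = [0.8, -0.5], -0.2
p = predict_proba(weights, bias, (2.0, 1.0))
label = predict(weights, bias, (2.0, 1.0))
```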

SMOTE Technique: SMOTE (synthetic minority oversampling technique) is one of the most commonly used oversampling methods for solving the class imbalance problem.

It aims to balance the class distribution by increasing the number of minority class examples. Rather than simply replicating existing examples, SMOTE synthesises new minority instances between existing minority instances. It generates these virtual training records by linear interpolation within the minority class: for each selected minority example, one or more of its k nearest minority neighbors are chosen at random, and a new record is created on the line segment between the example and the chosen neighbor. After the oversampling process, the dataset is reconstructed and several classification models can be applied to the processed data.
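
The interpolation step can be sketched as follows. This is a simplified illustration under stated assumptions rather than a reference implementation: base points are drawn at random instead of being processed one by one, all features are treated as numeric, and the function name and toy points are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating towards neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the point itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points with two numeric features.
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
new_points = smote(minority, n_new=5)
```

Every synthetic point lies on a segment between two real minority points, so it stays inside the region already occupied by the minority class.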

Conclusion

We used five algorithms, Decision Trees, Random Forests, Naive Bayes, KNN and Logistic Regression, to predict the presence or absence of heart disease using the SMOTE technique. The accuracy varies across algorithms: 75.71% for Decision Trees, 90% for Random Forests, 90% for Naive Bayes and 82.86% for KNN. The highest accuracy, 94.29%, was obtained with Logistic Regression using the SMOTE technique.

References

  1. https://www.who.int/en/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
  2. Jabbar MA, “Prediction of heart disease using k nearest neighbour and particle swarm optimization”, in Biomed Res-India, 2017, 28(9), pp: 4154–4158.
  3. Paul AK, Shill PC, Rabin MRI, Murase K, “Adaptive weighted fuzzy rule-based system for the risk level assessment of heart disease”, in Applied Intelligence, 2017, 48, DOI: 10.1007/s10489-017-1037-6.
  4. Jabbar MA, Deekshatulu BL, Priti C, “Prediction of risk score for heart disease using associative classification and hybrid feature selection”, in IEEE ISDA, 2012, pp: 628–634.
  5. Samuel OW, Asogbon GM, Sangaiah AK, Fang P, Li G, “An Integrated Decision Support System Based on ANN and Fuzzy_AHP for Heart Failure Risk Prediction”, in Expert Systems With Applications, 2016, DOI: 10.1016/j.eswa.2016.10.020.
  6. Adeli A, Neshat M, “A fuzzy expert system for heart disease diagnosis”, in Proceedings of the International Multi-Conference of Engineers and Computer Scientists, vol I, 2010, pp: 1–6.
  7. Detrano R, Janosi A, Steinbrunn W, Pfisterer M, Schmid JJ, Sandhu S, Guppy KH, Lee S, Froelicher V, “International application of a new probability algorithm for the diagnosis of coronary artery disease”, in Am J Cardio, 1989, 64(5), pp: 304–310, DOI: 10.1016/0002-9149(89)90524-9.
  8. Cheung N, “Machine learning techniques for medical analysis”, BSc Thesis, School of Information Technology and Electrical Engineering, University of Queensland.
  9. Das R, Turkoglu I, Sengur A, “Effective diagnosis of heart disease through neural networks ensembles”, in Expert Syst Appl, 2009, 36(4), pp: 7675–7680, DOI: 10.1016/j.eswa.2008.09.013.
Updated: Feb 18, 2024
Cite this page

Utilizing Machine Learning for Heart Disease Prediction. (2024, Feb 18). Retrieved from https://studymoose.com/document/utilizing-machine-learning-for-heart-disease-prediction
