Using Bayes Theorem for Classification in Machine Learning


Introduction

Bayesian learning methods are relevant to the study of machine learning for two different reasons. First, Bayesian learning algorithms that explicitly calculate probabilities for hypotheses, such as the naive Bayes classifier, are among the most practical approaches to certain kinds of learning problems. Second, Bayesian reasoning provides a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.

Definition of Bayes Rule

Bayesian statistics is a set of tools used in a distinct form of statistical inference, one that applies to the analysis of experimental data in many practical situations in science and engineering.

Bayes' rule is one of the most essential rules in probability theory.

Concept of Bayes Rule

The classical techniques of estimation studied in this text are based solely on the information supplied by the random sample. These techniques essentially interpret probabilities as relative frequencies. For example, in arriving at a 95% confidence interval for μ, we interpret the statement to mean that in repeated experiments Z falls between −1.96 and 1.96 about 95% of the time. For a normal sample with known variance, the probability statement here means that 95% of the random intervals ( X̄ − 1.96σ/√n , X̄ + 1.96σ/√n ) contain the true mean μ.
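
As a quick illustration of this frequentist reading, the sketch below (a minimal simulation, assuming a normal population with known σ; the numbers are placeholders, not from the source) repeatedly draws samples and counts how often the interval X̄ ± 1.96σ/√n covers the true mean. The empirical coverage should come out near 95%.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 30, 10_000  # assumed illustrative values

covered = 0
for _ in range(trials):
    sample = rng.normal(mu, sigma, size=n)
    x_bar = sample.mean()
    half_width = 1.96 * sigma / np.sqrt(n)
    # Does this random interval contain the true mean?
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

print(f"Empirical coverage: {covered / trials:.3f}")  # should be close to 0.95
```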


Another approach to statistical estimation is Bayesian methodology. The primary concept of the method comes from Bayes' rule, described below. The key difference between the Bayesian method and the classical, or frequentist, method is that in the Bayesian framework the parameters are treated as random variables.

Mathematical Formulation of Bayes Theorem

P( A | B ) = P( B | A ) * P( A ) / P( B )

Where:

P ( A | B ) is the probability of event A occurring, given that event B has occurred.

P ( B | A ) is the probability of event B occurring, given that event A has occurred.

P ( A ) is the probability of event A.

P ( B ) is the probability of event B.

Note that events A and B need not be independent; in fact, Bayes' theorem is only informative when they are dependent, since if A and B were independent we would simply have P( A | B ) = P( A ). There is a special case of Bayes' theorem when event A is a binary variable, expressed in the following way:

P( A+ | B ) = P( B | A+ ) * P( A+ ) / ( P( B | A+ ) * P( A+ ) + P( B | A– ) * P( A– ) )

Where:

P ( B | A– ) is the probability of event B occurring given that event A– has occurred.

P ( B | A+ ) is the probability of event B occurring given that event A+ has occurred.

Note that in this special case, events A– and A+ are mutually exclusive outcomes of event A.
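
As a sketch of how this binary special case works in code (the probabilities below are arbitrary placeholder values, not taken from the article):

```python
def bayes_binary(p_b_given_a_pos, p_b_given_a_neg, p_a_pos):
    """Posterior P(A+ | B) when A has two mutually exclusive outcomes A+ and A-."""
    p_a_neg = 1.0 - p_a_pos
    # The denominator is P(B), expanded by the law of total probability.
    p_b = p_b_given_a_pos * p_a_pos + p_b_given_a_neg * p_a_neg
    return p_b_given_a_pos * p_a_pos / p_b

# Illustrative numbers only (assumed for the example):
print(bayes_binary(p_b_given_a_pos=0.9, p_b_given_a_neg=0.2, p_a_pos=0.3))
```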

Example

In a particular pain clinic, 10% of patients are prescribed narcotic pain killers. Overall, five percent of the clinic’s patients are addicted to narcotics ( including pain killers and illegal substances ). Out of all the people prescribed pain pills, 8% are addicts. If a patient is an addict, what is the probability that they will be prescribed pain pills?

Solution

Step 1: Identify event “A” from the question. The event that happens first ( A ) is being prescribed pain pills. That’s given as 10%.

Step 2: Identify event “B” from the question. Event B is being an addict. That’s given as 5%.

Step 3: Figure out the probability of event B ( Step 2 ) given event A ( Step 1 ). In other words, find P( B | A ). We want to know “Given that people are prescribed pain pills, what’s the probability they are addicts?” That is given in the question as 8%, or 0.08.

Step 4: Insert your answers from Steps 1, 2 and 3 into the formula and solve.

P( A | B ) = P( B | A ) * P( A ) / P( B ) = ( 0.08 * 0.1 ) / 0.05 = 0.16

The probability of an addict being prescribed pain pills is 0.16 ( 16% ).
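
As a quick check of this arithmetic in code (a minimal sketch using the numbers stated in the example):

```python
p_a = 0.10          # P(prescribed pain pills)
p_b = 0.05          # P(addict)
p_b_given_a = 0.08  # P(addict | prescribed pain pills)

# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.16
```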

Bayes Theorem for Modeling Hypotheses

Bayes Theorem is a useful tool for applied machine learning. It provides a way of thinking about the relationship between data and a model. A machine learning algorithm or model is a particular way of thinking about the structured relationships in the data. In this way, a model can be thought of as a hypothesis about the relationships in the data, such as the relationship between input (X) and output (y). The practice of applied machine learning is the testing and analysis of different hypotheses (models) on a given dataset.

Bayes Theorem provides a probabilistic model that describes the relationship between data (D) and a hypothesis (h); for example:

P(h|D) = P(D|h) * P(h) / P(D)

Breaking this down, it says that the probability of a given hypothesis holding or being true given some observed data can be calculated as the probability of observing the data given the hypothesis, multiplied by the probability of the hypothesis being true regardless of the data, divided by the probability of observing the data regardless of the hypothesis.

Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probability of observing various data given the hypothesis, and the observed data itself.

Under this framework, every piece of the calculation has a particular name; for example:

  • P ( h | D ) : Posterior probability of the hypothesis (the quantity we want to calculate).
  • P ( h ) : Prior probability of the hypothesis.
  • P ( D | h ) : Likelihood of the data given the hypothesis.
  • P ( D ) : Probability of the data regardless of the hypothesis (the evidence).

This offers a useful framework for thinking about and modelling a machine learning problem.

If we have some prior domain knowledge about the hypothesis, it is captured in the prior probability. If we don’t, then all hypotheses may be given the same prior probability.

If the probability of observing the data P( D ) increases, then the probability of the hypothesis holding given the data P( h | D ) decreases. Conversely, if the probability of the hypothesis P( h ) or the probability of observing the data given the hypothesis P( D | h ) increases, the probability of the hypothesis holding given the data P( h | D ) increases.

The idea of testing different models on a dataset in applied machine learning can be thought of as estimating the probability of each hypothesis ( h1, h2, h3, … in H ) being true given the observed data. The optimization or search for the hypothesis with the maximum posterior probability is known as maximum a posteriori, or MAP for short. Any such maximally probable hypothesis is called a maximum a posteriori ( MAP ) hypothesis. We can determine the MAP hypotheses by using Bayes theorem to calculate the posterior probability of each candidate hypothesis.

Under this framework, the probability of the data ( D ) is constant, as it is used in the evaluation of every hypothesis. Therefore, it can be dropped from the calculation to give the simplified, unnormalized estimate as follows:

max h in H P( h | D ) = P( D | h ) * P( h )

If we do not have any prior information about the hypotheses being tested, they can be assigned a uniform prior probability; this term then also becomes a constant and can be dropped from the calculation to give the following:

max h in H P( h | D ) = P( D | h )

That is, the aim is to locate the hypothesis that best explains the observed data. Fitting models like linear regression for predicting a numerical value and logistic regression for binary classification can be framed and solved under the MAP probabilistic framework. This provides an alternative to the more common maximum likelihood estimation ( MLE ) framework.
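
As a minimal sketch of MAP hypothesis selection (the candidate hypotheses, priors, and likelihoods below are invented purely for illustration):

```python
# Candidate hypotheses with assumed priors P(h) and likelihoods P(D | h).
hypotheses = {
    "h1": {"prior": 0.5, "likelihood": 0.10},
    "h2": {"prior": 0.3, "likelihood": 0.40},
    "h3": {"prior": 0.2, "likelihood": 0.30},
}

# Unnormalized posterior score P(D | h) * P(h); P(D) is constant and dropped.
scores = {h: v["likelihood"] * v["prior"] for h, v in hypotheses.items()}

h_map = max(scores, key=scores.get)
print(h_map, scores[h_map])  # h2 0.12
```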

Bayes Theorem for Classification

Classification is a predictive modelling problem that entails assigning a label to a given input data sample. The problem of classification predictive modelling can be framed as calculating the conditional probability of a class label given a data sample, for example:

P( class | data ) = ( P( data | class ) * P( class ) ) / P( data )

Where P( class | data ) is the probability of the class given the supplied data.

This calculation can be carried out for each class in the problem, and the class that is assigned the largest probability can be selected and assigned to the input data.
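
In code, this amounts to scoring each class and picking the argmax. The sketch below uses two made-up classes ("spam" and "ham") and placeholder probabilities for a single sample, purely for illustration:

```python
# Assumed class priors P(class) and likelihoods P(data | class) for one sample.
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {"spam": 0.05, "ham": 0.01}

# P(class | data) is proportional to P(data | class) * P(class);
# P(data) is the same for every class, so it does not affect the argmax.
scores = {c: likelihoods[c] * priors[c] for c in priors}
predicted = max(scores, key=scores.get)
print(predicted)  # "spam" (0.015 vs 0.007)
```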

In practice, it is very difficult to calculate the full Bayes Theorem for classification. The priors for the class and the data are easy to estimate from a training dataset, provided the dataset is suitably representative of the broader problem.

The conditional probability of the observation given the class, P( data | class ), is not feasible to estimate unless the number of examples is extraordinarily large, e.g. large enough to effectively estimate the probability distribution over all possible combinations of values. This is almost never the case; we will not have adequate coverage of the domain.

As such, the direct application of Bayes Theorem becomes intractable, particularly as the number of variables or features ( n ) increases.

Naive Bayes Classifier

The solution to using Bayes Theorem for a conditional probability classification model is to simplify the calculation. Bayes Theorem assumes that every input variable is dependent upon all other variables, which is a cause of the complexity in the calculation. We can remove this assumption and consider every input variable as being independent of each other. This changes the model from a dependent conditional probability model to an independent conditional probability model and dramatically simplifies the calculation. It means that we calculate P( data | class ) for each input variable separately and multiply the results together, for example:

P( class | X1, X2, …, Xn ) = P( X1 | class ) * P( X2 | class ) * … * P( Xn | class ) * P( class ) / P( data )

We can additionally drop the probability of observing the data, as it is a constant across all classes, for example:

P( class | X1, X2, …, Xn ) = P( X1 | class ) * P( X2 | class ) * … * P( Xn | class ) * P( class )

This simplification of Bayes Theorem is common and widely used for classification predictive modelling problems and is generally referred to as Naive Bayes.
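
Below is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier, assuming continuous features modelled with per-class normal distributions; the tiny dataset, function names, and values are invented for illustration, not taken from the article:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and variances from training data."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),
            "mean": Xc.mean(axis=0),
            "var": Xc.var(axis=0) + 1e-9,  # small constant avoids division by zero
        }
    return params

def predict_gaussian_nb(params, x):
    """Score each class with log P(class) + sum_i log P(x_i | class); return the argmax."""
    scores = {}
    for c, p in params.items():
        log_likelihood = -0.5 * np.sum(
            np.log(2 * np.pi * p["var"]) + (x - p["mean"]) ** 2 / p["var"]
        )
        scores[c] = np.log(p["prior"]) + log_likelihood
    return max(scores, key=scores.get)

# Tiny invented dataset: two features, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, np.array([1.1, 2.0])))  # expected: 0
print(predict_gaussian_nb(model, np.array([4.0, 4.0])))  # expected: 1
```

Working in log space, as above, is the usual design choice, since multiplying many small per-feature probabilities directly would quickly underflow.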

Conclusion

Bayesian methods, grounded in Bayes' Rule and its adaptations like the Naive Bayes Classifier, offer robust frameworks for addressing complex learning problems in machine learning. By effectively handling uncertainty and incorporating prior knowledge into model development, these methods enhance the ability to make informed predictions, thereby contributing significantly to the advancement of machine learning and its applications in various fields.
