The Naive Bayes algorithm is a classification algorithm based on Bayes rule and a set of conditional independence assumptions. It predicts a class value given a set of attributes.
Bayes theorem provides a way of calculating the posterior probability, P(c|x), from the class prior probability P(c), the predictor prior probability P(x), and the likelihood P(x|c). The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors. This assumption is called class conditional independence.
Naive Bayes can be applied in text classification problems such as spam detection, sentiment analysis and topic categorization.
✓ The example of conditional probabilities and Bayes theorem:
35 emails out of a total of 74 are spam messages.
50 emails out of those 74 contain the word “xxx”.
25 emails containing the word “xxx” have been marked as spam.
What is the probability that the latest received email is a spam message, given that it contains the word “xxx”?
P(spam|xxx) = P(xxx|spam)*P(spam)/P(xxx)
P(spam|xxx) = ((25/35)*(35/74))/(50/74) = 0.50
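The calculation above can be sketched in a few lines of Python, using the counts from the example (74 emails total, 35 spam, 50 containing "xxx", 25 both spam and containing "xxx"):

```python
# Bayes theorem on the spam example: P(spam|xxx) = P(xxx|spam) * P(spam) / P(xxx)
total = 74
spam = 35
with_xxx = 50
spam_with_xxx = 25

p_spam = spam / total                      # class prior P(spam)
p_xxx = with_xxx / total                   # predictor prior P(xxx)
p_xxx_given_spam = spam_with_xxx / spam    # likelihood P(xxx|spam)

p_spam_given_xxx = p_xxx_given_spam * p_spam / p_xxx
print(round(p_spam_given_xxx, 2))  # 0.5
```

Note how the counts of 35 (spam) cancel between the likelihood and the prior, so the posterior reduces to 25/50 = 0.50.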
✓ The example of naive Bayes approach:
28 emails out of the total contain the word “viagra”.
26 emails out of those have been marked as spam.
What is the probability that an email is spam, given that it contains both “viagra” and “xxx”?
P(spam|xxx,viagra) = (P(xxx|spam)*P(viagra|spam)*P(spam))/(P(xxx)*P(viagra))
P(spam|xxx,viagra) = ((25/35)*(26/35)*(35/74))/((50/74)*(28/74)) = 0.98
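The naive Bayes calculation above, where the likelihoods for "xxx" and "viagra" are multiplied under the class conditional independence assumption, can be reproduced directly:

```python
# Naive Bayes with two predictors:
# P(spam|xxx,viagra) = P(xxx|spam) * P(viagra|spam) * P(spam) / (P(xxx) * P(viagra))
total = 74
spam = 35

p_spam = spam / total
p_xxx = 50 / total
p_viagra = 28 / total
p_xxx_given_spam = 25 / spam
p_viagra_given_spam = 26 / spam

posterior = (p_xxx_given_spam * p_viagra_given_spam * p_spam) / (p_xxx * p_viagra)
print(round(posterior, 2))  # 0.98
```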
✓ Naive Bayes for continuous variables
When a predictor is continuous, the class-conditional probability P(x|c) is modeled with the normal (Gaussian) distribution, whose mean and standard deviation are estimated per class from the training data.
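A minimal sketch of the Gaussian likelihood: the per-class mean and standard deviation below are hypothetical values, not taken from the email example, and stand in for statistics that would be estimated from training data.

```python
import math

def gaussian_pdf(x, mean, std):
    """Normal density, used as the class-conditional likelihood P(x|c)."""
    coeff = 1.0 / (std * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

# Hypothetical per-class statistics for a continuous feature
# (e.g. email length in words), estimated from training data.
spam_mean, spam_std = 120.0, 40.0
ham_mean, ham_std = 300.0, 80.0

x = 150.0  # observed feature value for a new email
print(gaussian_pdf(x, spam_mean, spam_std) > gaussian_pdf(x, ham_mean, ham_std))
```

Each class likelihood would then be multiplied by the class prior, exactly as in the discrete examples above, and the class with the largest posterior wins.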