Bayesian classifier
The simplest solutions are often the most powerful, and Naive Bayes is a good example. Despite the great advances in machine learning in recent years, it has proven to be not only simple but also fast, accurate, and reliable. It has been used successfully for many purposes, but it works particularly well with natural language processing problems.
Naive Bayes is a family of probabilistic algorithms that use probability theory and Bayes' theorem to predict the category of a sample. They are probabilistic, which means they calculate the probability of each category for a given sample and then output the category with the highest probability. They obtain these probabilities using Bayes' theorem, which describes the probability of a feature based on prior knowledge of conditions that might be related to that feature.
We will work with an algorithm called Multinomial Naive Bayes. By the end, we will know not only how this method works but also why it works. We will then look at some advanced techniques that can make Naive Bayes competitive with more complex machine learning algorithms, such as SVMs and neural networks.
Example:

| Text | Category |
| --- | --- |
| “Barack Obama is the president of the U.S” | Politics |
| “Football is my favorite game” | Sports |
| “Federer is the best tennis player at Wimbledon” | Sports |
| “I need to contact my bank to get a loan” | Financial world |
| “Trump is the most stupid president” | Politics |
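As a preview, here is a quick sketch of how this kind of classifier can be trained on the table above with scikit-learn's MultinomialNB (assuming scikit-learn is installed; the variable names are my own):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "Barack Obama is the president of the U.S",
    "Football is my favorite game",
    "Federer is the best tennis player at Wimbledon",
    "I need to contact my bank to get a loan",
    "Trump is the most stupid president",
]
labels = ["Politics", "Sports", "Sports", "Financial world", "Politics"]

# Turn each text into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Fit Multinomial Naive Bayes on the word-count vectors.
classifier = MultinomialNB()
classifier.fit(X, labels)

print(classifier.predict(vectorizer.transform(["Obama was the president"])))
# expected: ['Politics']
```

The rest of this article builds the same idea by hand, so we can see what `fit` and `predict` are actually doing under the hood.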
Now we need to transform the probability we want to calculate into something that can be computed from word frequencies. To do that, we will use some basic properties of probability and Bayes' theorem. To work with Bayes' theorem, you first need to understand conditional probability.
Conditional probability is the probability of one event occurring given that one or more other events have occurred. For example:
- Event A is that it is snowing outside, and it has a 0.3 (30%) chance of snowing today.
- Event B is that you will need to go outside, and that has a probability of 0.5 (50%).
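The conditional probability P(B | A) would then be the probability that you need to go outside given that it is snowing. Bayes' theorem lets us reverse conditional probabilities like this one:

P(A | B) = P(B | A) * P(A) / P(B)

Applied to our classification problem, it gives:

P(Politics | Barack Obama is the president of the U.S) = P(Barack Obama is the president of the U.S | Politics) * P(Politics) / P(Barack Obama is the president of the U.S)

Since the denominator is the same for every category, we only need to compare the numerators across categories. The question is how to estimate P(Barack Obama is the president of the U.S | Politics).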
To do that, we could simply count how many times the sentence “Barack Obama is the president of the U.S” appears in the Politics category, divide it by the total number of texts in that category, and obtain P(Barack Obama is the president of the U.S | Politics).
But this approach has a problem: “Barack Obama is the president of the U.S” might not appear in our training set at all, so its probability would be zero. Unless every sentence we want to classify appears in our training set, the model is not useful.
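To make the zero-probability problem concrete, here is a minimal sketch of the exact-match counting estimate described above (the function and variable names are my own):

```python
def sentence_prob(sentence, category, training_data):
    """Estimate P(sentence | category) by exact-match counting."""
    texts_in_category = [text for text, cat in training_data if cat == category]
    matches = sum(1 for text in texts_in_category if text == sentence)
    return matches / len(texts_in_category)

training_data = [
    ("Barack Obama is the president of the U.S", "Politics"),
    ("Trump is the most stupid president", "Politics"),
]

# This sentence never appears verbatim in the training set,
# so its estimated probability is 0.0: the zero-frequency problem.
print(sentence_prob("Obama is a former president", "Politics", training_data))
```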
Naive Bayes
This is where Naive Bayes comes in. Being naive is exactly what lets us build a usable model: we assume that every word in a sentence is independent of the others. That means we no longer look for the probability of the entire sentence, but for the probabilities of its individual words.
So “Barack Obama is the president of the U.S” is the same as “the president of the U.S is Barack Obama”, as well as “Obama Barack is the president of the U.S”.
Therefore, P(Barack Obama is the president of the U.S) = P(Barack) * P(Obama) * P(is) * P(the) * P(president) * P(of) * P(the) * P(U.S)
This assumption is very strong but extremely useful. It is what makes this model work well with small datasets or with data that may be mislabeled. The next step is exactly what we had before:
P(Barack Obama is the president of the U.S | Politics) = P(Barack | Politics) * P(Obama | Politics) * P(is | Politics) * P(the | Politics) * P(president | Politics) * P(of | Politics) * P(the | Politics) * P(U.S | Politics)
Unlike full sentences, these individual words do show up several times in our training set, so we can calculate their probabilities!
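For example, using our training table: the Politics category contains two texts with 14 words in total, and the word “president” appears twice among them. So:

P(president | Politics) = 2 / 14 ≈ 0.14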
The final step is to compute this product of word probabilities, multiplied by the prior P(category), for each category, and output the category with the largest result.
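Putting it all together, here is a minimal from-scratch sketch of the whole procedure (all names are my own; I also use logarithms and add-one Laplace smoothing, two standard tricks the text has not introduced yet, to keep the products from underflowing or collapsing to zero):

```python
import math
from collections import Counter, defaultdict

# The training table from above.
training_data = [
    ("Barack Obama is the president of the U.S", "Politics"),
    ("Football is my favorite game", "Sports"),
    ("Federer is the best tennis player at Wimbledon", "Sports"),
    ("I need to contact my bank to get a loan", "Financial world"),
    ("Trump is the most stupid president", "Politics"),
]

def tokenize(text):
    return text.lower().split()

# Count how often each word appears in each category.
word_counts = defaultdict(Counter)
category_counts = Counter()
for text, category in training_data:
    category_counts[category] += 1
    word_counts[category].update(tokenize(text))

vocabulary = {word for counter in word_counts.values() for word in counter}

def predict(sentence):
    scores = {}
    for category in category_counts:
        # Start from the prior P(category); log probabilities avoid
        # numerical underflow when multiplying many small numbers.
        score = math.log(category_counts[category] / len(training_data))
        total_words = sum(word_counts[category].values())
        for word in tokenize(sentence):
            # Add-one (Laplace) smoothing so a word missing from a
            # category never drives the whole product to zero.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[category] = score
    # The category with the highest score wins.
    return max(scores, key=scores.get)

print(predict("Obama was the president"))           # Politics
print(predict("Wimbledon is a tennis tournament"))  # Sports
```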