Simple Bayes classifier
The naïve Bayes classifier is a simple approach to classifying samples into multiple classes.
Assume we have a vector of classes $$Y$$ and a vector of features $$X$$; we can then
use Bayes' formula to find the conditional probabilities as follows:
$$P\left(y_j\middle|X\right)=\frac{P\left(y_j\right)}{P(x_1)\cdots P(x_n)}\prod_{i=1}^{n}P\left(x_i\middle|y_j\right)$$
Calculating $$P(y_j)$$ and $$P(x_i|y_j)$$ is straightforward:
the first probability can be estimated as the fraction of samples
in the dataset that belong to class $$y_j$$, and the second
as the fraction of samples of class $$y_j$$ in which feature $$x_i$$
appears. The formula above assumes that the features occur independently of one another given the class, hence the name of the
classifier: naïve. Classification is therefore done as follows (we can ignore the denominator, since it does
not depend on the class and is effectively a constant, i.e., it scales all results in the same way):
$$C=\operatorname{argmax}_{j\in\{1,\ldots,k\}}P\left(y_j\right)\prod_{i=1}^{n}P\left(x_i\middle|y_j\right)$$
where $$k$$ is the number of classes.
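As an illustration, the following minimal Python sketch estimates the priors and conditional probabilities by counting and then applies the argmax rule above. It assumes categorical features represented as tuples; the function names and data layout are illustrative choices, not part of the formulation itself.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Estimate the priors P(y_j) and conditionals P(x_i|y_j) by counting."""
    n = len(labels)
    class_counts = Counter(labels)
    # P(y_j): fraction of samples in the dataset that belong to class y_j
    priors = {y: c / n for y, c in class_counts.items()}
    # P(x_i|y_j): fraction of class-y_j samples with value x_i in position i
    counts = defaultdict(Counter)
    for x, y in zip(samples, labels):
        for i, xi in enumerate(x):
            counts[y][(i, xi)] += 1
    cond = {y: {key: c / class_counts[y] for key, c in cnt.items()}
            for y, cnt in counts.items()}
    return priors, cond

def classify(x, priors, cond):
    """Pick the class maximizing P(y_j) * prod_i P(x_i|y_j)."""
    def score(y):
        p = priors[y]
        for i, xi in enumerate(x):
            p *= cond[y].get((i, xi), 0.0)  # unseen value -> zero probability
        return p
    return max(priors, key=score)
```

In practice one would add smoothing (e.g., Laplace smoothing) so that a feature value unseen in the training data does not zero out the whole product.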
To ease the computation, we can take the logarithm, which turns the product into a sum. Moreover,
for continuous data, the conditional probabilities can be assumed to come from a normal distribution.
In this case, the parameters $$\mu$$ and $$\sigma$$ are
estimated from the empirical data, and the conditional probabilities are evaluated using the Gaussian density
with the estimated parameters.
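The log-sum form and the Gaussian assumption can be combined into a short sketch. The priors would come from counting as before; the helper names below are illustrative, and the sketch assumes each feature actually varies within each class (so that $$\sigma > 0$$).

```python
import math
from statistics import mean, pstdev

def fit_gaussian_params(samples, labels):
    """For each class, estimate (mu, sigma) per feature from the data."""
    params = {}
    for y in set(labels):
        rows = [x for x, lab in zip(samples, labels) if lab == y]
        columns = zip(*rows)  # group values feature by feature
        # assumes sigma > 0 for every feature within every class
        params[y] = [(mean(col), pstdev(col)) for col in columns]
    return params

def log_gaussian(x, mu, sigma):
    """Logarithm of the normal density N(x; mu, sigma)."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def classify_gaussian(x, priors, params):
    """Pick the class maximizing log P(y_j) + sum_i log N(x_i; mu, sigma)."""
    def log_score(y):
        return math.log(priors[y]) + sum(
            log_gaussian(xi, mu, sigma)
            for xi, (mu, sigma) in zip(x, params[y]))
    return max(priors, key=log_score)
```

Working in log space avoids numerical underflow: a product of many small probabilities quickly rounds to zero in floating point, while the corresponding sum of logarithms stays well-behaved.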