Simple Bayes classifier
The naïve Bayes classifier is a simple approach to classifying samples into multiple classes.
Assume we have a vector of classes $$Y$$ and a vector of features $$X$$; we can then
use Bayes' formula to find the conditional probabilities as follows:
$$P\left(y_j\middle|X\right)=\frac{P\left(y_j\right)}{P(x_1)\cdots P(x_n)}\prod_{i=1}^{n}P\left(x_i\middle|y_j\right)$$
Calculating $$P(y_j)$$ and $$P(x_i|y_j)$$ is straightforward:
the first probability can be estimated as the fraction of samples
in the dataset that belong to class $$y_j$$, and the second
as the fraction of samples of class $$y_j$$ in which feature $$x_i$$
appears. The formula above assumes that the features occur independently of one another given the class, hence the name of the
classifier: naïve. Classification is therefore done as follows (we can ignore the denominator, since it does
not depend on the class and is effectively a constant, i.e., it scales all results in the same way):
$$C=\operatorname{argmax}_{j\in\{1,\ldots,k\}}P\left(y_j\right)\prod_{i=1}^{n}P\left(x_i\middle|y_j\right)$$
where $$k$$ is the number of classes.
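As an illustration, the following minimal Python sketch estimates the priors and conditional probabilities by counting and then applies the argmax rule above. It assumes categorical features represented as tuples; the function names and data layout are illustrative choices, not part of the formulation itself.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Estimate the priors P(y_j) and conditionals P(x_i|y_j) by counting."""
    n = len(labels)
    class_counts = Counter(labels)
    # P(y_j): fraction of samples in the dataset that belong to class y_j
    priors = {y: c / n for y, c in class_counts.items()}
    # P(x_i|y_j): fraction of class-y_j samples with value x_i in position i
    counts = defaultdict(Counter)
    for x, y in zip(samples, labels):
        for i, xi in enumerate(x):
            counts[y][(i, xi)] += 1
    cond = {y: {key: c / class_counts[y] for key, c in cnt.items()}
            for y, cnt in counts.items()}
    return priors, cond

def classify(x, priors, cond):
    """Pick the class maximizing P(y_j) * prod_i P(x_i|y_j)."""
    def score(y):
        p = priors[y]
        for i, xi in enumerate(x):
            p *= cond[y].get((i, xi), 0.0)  # unseen value -> zero probability
        return p
    return max(priors, key=score)
```

In practice one would add smoothing (e.g., Laplace smoothing) so that a feature value unseen in the training data does not zero out the whole product.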
To ease the computation, we can take the logarithm, which turns the product into a sum. Moreover,
for continuous data, the conditional probabilities can be assumed to come from a normal distribution.
In this case, the parameters $$\mu$$ and $$\sigma$$ are
estimated from the empirical data, and the conditional probabilities are evaluated using the Gaussian density
with the estimated parameters.
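The log-sum form and the Gaussian assumption can be combined into a short sketch. The priors would come from counting as before; the helper names below are illustrative, and the sketch assumes each feature actually varies within each class (so that $$\sigma > 0$$).

```python
import math
from statistics import mean, pstdev

def fit_gaussian_params(samples, labels):
    """For each class, estimate (mu, sigma) per feature from the data."""
    params = {}
    for y in set(labels):
        rows = [x for x, lab in zip(samples, labels) if lab == y]
        columns = zip(*rows)  # group values feature by feature
        # assumes sigma > 0 for every feature within every class
        params[y] = [(mean(col), pstdev(col)) for col in columns]
    return params

def log_gaussian(x, mu, sigma):
    """Logarithm of the normal density N(x; mu, sigma)."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def classify_gaussian(x, priors, params):
    """Pick the class maximizing log P(y_j) + sum_i log N(x_i; mu, sigma)."""
    def log_score(y):
        return math.log(priors[y]) + sum(
            log_gaussian(xi, mu, sigma)
            for xi, (mu, sigma) in zip(x, params[y]))
    return max(priors, key=log_score)
```

Working in log space avoids numerical underflow: a product of many small probabilities quickly rounds to zero in floating point, while the corresponding sum of logarithms stays well-behaved.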