Bayes classifier
In statistical classification, the Bayes classifier is the classifier having the smallest probability of misclassification of all classifiers using the same set of features.[1]
Definition
Suppose a pair $(X, Y)$ takes values in $\mathbb{R}^d \times \{1, 2, \dots, K\}$, where $Y$ is the class label of an element whose features are given by $X$. Assume that the conditional distribution of $X$, given that the label $Y$ takes the value $r$, is given by
$$(X \mid Y = r) \sim P_r \quad \text{for} \quad r = 1, 2, \dots, K,$$
where "$\sim$" means "is distributed as", and where $P_r$ denotes a probability distribution.
A classifier is a rule that assigns to an observation $X = x$ a guess or estimate of the unobserved label $Y$. In theoretical terms, a classifier is a measurable function $C : \mathbb{R}^d \to \{1, 2, \dots, K\}$, with the interpretation that $C$ classifies the point $x$ to the class $C(x)$. The probability of misclassification, or risk, of a classifier $C$ is defined as
$$\mathcal{R}(C) = \operatorname{P}\{C(X) \neq Y\}.$$
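The Bayes classifier is
$$C^{\text{Bayes}}(x) = \underset{r \in \{1, 2, \dots, K\}}{\operatorname{argmax}} \operatorname{P}(Y = r \mid X = x).$$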
In practice, as in most of statistics, the difficulties and subtleties are associated with modeling the probability distributions effectively, in this case $\operatorname{P}(Y = r \mid X = x)$. The Bayes classifier is a useful benchmark in statistical classification.
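As a minimal sketch of the definition, the following Python snippet computes the posterior by Bayes' theorem and takes the argmax. It assumes a discrete feature and fully known distributions; the priors, the class-conditional tables and the function names are illustrative assumptions, not part of the article.

```python
# Hypothetical toy problem: class priors P(Y=r) and class-conditional
# distributions P_r over a single discrete feature x in {0, 1, 2}.
priors = {1: 0.5, 2: 0.3, 3: 0.2}
likelihoods = {
    1: {0: 0.7, 1: 0.2, 2: 0.1},
    2: {0: 0.1, 1: 0.6, 2: 0.3},
    3: {0: 0.1, 1: 0.2, 2: 0.7},
}


def posterior(x):
    """Return P(Y=r | X=x) for every class r via Bayes' theorem."""
    joint = {r: priors[r] * likelihoods[r][x] for r in priors}
    evidence = sum(joint.values())  # P(X=x)
    return {r: p / evidence for r, p in joint.items()}


def bayes_classifier(x):
    """Assign x to the class with the largest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)


predictions = {x: bayes_classifier(x) for x in (0, 1, 2)}
```

With these assumed numbers each feature value is assigned to a different class. In a real problem the distributions are unknown and must be estimated, which is where the modeling difficulty lies.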
The excess risk of a general classifier $C$ (possibly depending on some training data) is defined as $\mathcal{R}(C) - \mathcal{R}(C^{\text{Bayes}})$. This non-negative quantity is important for assessing the performance of different classification techniques. A classifier is said to be consistent if the excess risk converges to zero as the size of the training data set tends to infinity.[2]
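The risk and excess risk can be computed exactly when the joint distribution is known and finite. The sketch below assumes a hypothetical joint table over three feature values and two classes, and compares the Bayes classifier with a constant classifier; all numbers and names are illustrative.

```python
# Hypothetical joint distribution P(X=x, Y=r) with x in {0, 1, 2}, r in {1, 2}.
joint = {
    (0, 1): 0.30, (0, 2): 0.10,
    (1, 1): 0.15, (1, 2): 0.15,
    (2, 1): 0.05, (2, 2): 0.25,
}
labels = (1, 2)


def risk(classifier):
    """Probability of misclassification P(C(X) != Y)."""
    return sum(p for (x, r), p in joint.items() if classifier(x) != r)


def bayes_classifier(x):
    # argmax_r P(Y=r | X=x) equals argmax_r P(X=x, Y=r), because the
    # evidence P(X=x) does not depend on r. Ties go to the first label.
    return max(labels, key=lambda r: joint[(x, r)])


def always_one(x):
    """A deliberately poor classifier that ignores the feature."""
    return 1


bayes_risk = risk(bayes_classifier)
excess_risk = risk(always_one) - bayes_risk
```

Under these assumptions the Bayes risk is 0.30 and the constant classifier has risk 0.50, so its excess risk is 0.20.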
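Considering the components $x_i$ of $x$ to be mutually independent given the class label, we get the naive Bayes classifier, where
$$C^{\text{Bayes}}(x) = \underset{r \in \{1, 2, \dots, K\}}{\operatorname{argmax}} \operatorname{P}(Y = r) \prod_{i=1}^{d} P_r(x_i).$$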
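The naive Bayes rule can be sketched for binary features as follows. The priors and the per-feature probabilities are made-up values, and modeling each component as a Bernoulli variable is an assumption of this example, not something the article prescribes.

```python
import math

# Hypothetical class priors P(Y=r) for two classes.
priors = {1: 0.4, 2: 0.6}
# Hypothetical P(x_i = 1 | Y = r) for each of d = 3 binary features.
feature_probs = {
    1: [0.8, 0.6, 0.1],
    2: [0.2, 0.3, 0.5],
}


def naive_bayes_score(r, x):
    """P(Y=r) * prod_i P_r(x_i), using the independence assumption."""
    terms = [p if xi == 1 else 1.0 - p for p, xi in zip(feature_probs[r], x)]
    return priors[r] * math.prod(terms)


def naive_bayes_classifier(x):
    """Return the class r that maximises the naive Bayes score."""
    return max(priors, key=lambda r: naive_bayes_score(r, x))


label_a = naive_bayes_classifier((1, 1, 0))
label_b = naive_bayes_classifier((0, 0, 1))
```

The score omits the evidence term, since it is the same for every class and does not change the argmax.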
Properties
The proof that the Bayes classifier is optimal and that the Bayes error rate is minimal proceeds as follows, for the case of two classes.
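Define the following: the risk $R(h)$ of a classifier $h$, the Bayes risk $R^*$, and the set of all possible classes to which the points can be classified, $\mathcal{Y} = \{0, 1\}$. Let the posterior probability of a point belonging to class 1 be $\eta(x) = \Pr(Y = 1 \mid X = x)$. Define the classifier $h^*$ as
$$h^*(x) = \begin{cases} 1 & \text{if } \eta(x) \geqslant 0.5, \\ 0 & \text{otherwise.} \end{cases}$$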
Then we have the following results:
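- (a) $R(h^*) = R^*$, i.e. $h^*$ is a Bayes classifier,
- (b) For any classifier $h$, the excess risk satisfies $R(h) - R^* = 2\,\mathbb{E}_X\left[\,|\eta(X) - 0.5| \cdot \mathbb{I}_{\{h(X) \neq h^*(X)\}}\right]$,
- (c) $R^* = \mathbb{E}_X\left[\min(\eta(X), 1 - \eta(X))\right]$,
- (d) $R^* = \frac{1}{2} - \frac{1}{2}\,\mathbb{E}_X\left[\,|2\eta(X) - 1|\,\right]$.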
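Proof of (a): For any classifier $h$, we have
$$\begin{aligned} R(h) &= \mathbb{E}_{XY}\left[\mathbb{I}_{\{h(X) \neq Y\}}\right] \\ &= \mathbb{E}_X \mathbb{E}_{Y \mid X}\left[\mathbb{I}_{\{h(X) \neq Y\}}\right] \\ &= \mathbb{E}_X\left[\eta(X)\,\mathbb{I}_{\{h(X) = 0\}} + (1 - \eta(X))\,\mathbb{I}_{\{h(X) = 1\}}\right], \end{aligned}$$
where the second line was derived through Fubini's theorem.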
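Notice that $R(h)$ is minimised by taking, for every $x$,
$$h(x) = \begin{cases} 1 & \text{if } \eta(x) \geqslant 1 - \eta(x), \\ 0 & \text{otherwise,} \end{cases}$$
which coincides with $h^*$, since $\eta(x) \geqslant 1 - \eta(x)$ exactly when $\eta(x) \geqslant 0.5$.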
Therefore the minimum possible risk is the Bayes risk, $R^* = R(h^*)$.
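Proof of (b):
$$\begin{aligned} R(h) - R^* &= R(h) - R(h^*) \\ &= \mathbb{E}_X\left[\eta(X)\,\mathbb{I}_{\{h(X) = 0\}} + (1 - \eta(X))\,\mathbb{I}_{\{h(X) = 1\}} - \eta(X)\,\mathbb{I}_{\{h^*(X) = 0\}} - (1 - \eta(X))\,\mathbb{I}_{\{h^*(X) = 1\}}\right] \\ &= \mathbb{E}_X\left[\,|2\eta(X) - 1|\,\mathbb{I}_{\{h(X) \neq h^*(X)\}}\right] \\ &= 2\,\mathbb{E}_X\left[\,|\eta(X) - 0.5|\,\mathbb{I}_{\{h(X) \neq h^*(X)\}}\right]. \end{aligned}$$
The third line holds because the integrand vanishes where $h(X) = h^*(X)$, and elsewhere equals $2\eta(X) - 1$ if $h^*(X) = 1$ and $1 - 2\eta(X)$ if $h^*(X) = 0$, both of which are non-negative.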
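Proof of (c):
$$\begin{aligned} R(h^*) &= \mathbb{E}_X\left[\eta(X)\,\mathbb{I}_{\{h^*(X) = 0\}} + (1 - \eta(X))\,\mathbb{I}_{\{h^*(X) = 1\}}\right] \\ &= \mathbb{E}_X\left[\min(\eta(X), 1 - \eta(X))\right]. \end{aligned}$$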
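Proof of (d):
$$\begin{aligned} R(h^*) &= \mathbb{E}_X\left[\min(\eta(X), 1 - \eta(X))\right] \\ &= \frac{1}{2} - \mathbb{E}_X\left[\max\left(\eta(X) - \tfrac{1}{2},\ \tfrac{1}{2} - \eta(X)\right)\right] \\ &= \frac{1}{2} - \frac{1}{2}\,\mathbb{E}_X\left[\,|2\eta(X) - 1|\,\right]. \end{aligned}$$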
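Results (a) to (d) can be checked numerically on a small discrete example. The sketch below assumes a feature taking three values, with a made-up marginal distribution and posterior $\eta$; it enumerates every binary classifier, which is only feasible because the feature space is tiny.

```python
from itertools import product

# Hypothetical marginal P(X=x) and posterior eta(x) = P(Y=1 | X=x).
p_x = {0: 0.2, 1: 0.5, 2: 0.3}
eta = {0: 0.9, 1: 0.4, 2: 0.5}
xs = sorted(p_x)


def risk(h):
    """R(h) = E_X[eta(X) 1{h(X)=0} + (1 - eta(X)) 1{h(X)=1}]."""
    return sum(p_x[x] * (eta[x] if h[x] == 0 else 1 - eta[x]) for x in xs)


# The classifier h* thresholds the posterior at 0.5.
h_star = {x: 1 if eta[x] >= 0.5 else 0 for x in xs}
bayes_risk = risk(h_star)

# (a) No classifier among all 2^3 candidates has smaller risk than h*.
all_classifiers = [dict(zip(xs, labels)) for labels in product((0, 1), repeat=len(xs))]
min_risk = min(risk(h) for h in all_classifiers)


# (b) Right-hand side of the excess-risk identity.
def excess_formula(h):
    return 2 * sum(p_x[x] * abs(eta[x] - 0.5) for x in xs if h[x] != h_star[x])


# (c) and (d) Two closed forms for the Bayes risk.
risk_c = sum(p_x[x] * min(eta[x], 1 - eta[x]) for x in xs)
risk_d = 0.5 - 0.5 * sum(p_x[x] * abs(2 * eta[x] - 1) for x in xs)
```

For these assumed values all three expressions for the Bayes risk evaluate to 0.37, and `excess_formula(h)` agrees with `risk(h) - bayes_risk` for each enumerated classifier up to floating-point error.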
General case
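In the general case, where each element can belong to any one of $n$ categories, the proof that the Bayes classifier minimises classification error proceeds by the tower property of expectation as follows. Writing $\hat{y} = h(x)$ for the predicted class,
$$\begin{aligned} \mathbb{E}_{XY}\left[\mathbb{I}_{\{Y \neq \hat{y}\}}\right] &= \mathbb{E}_X \mathbb{E}_{Y \mid X}\left[\mathbb{I}_{\{Y \neq \hat{y}\}} \mid X = x\right] \\ &= \mathbb{E}_X\left[\Pr(Y = 1 \mid X = x)\,\mathbb{I}_{\{\hat{y} \neq 1\}} + \Pr(Y = 2 \mid X = x)\,\mathbb{I}_{\{\hat{y} \neq 2\}} + \dots + \Pr(Y = n \mid X = x)\,\mathbb{I}_{\{\hat{y} \neq n\}}\right]. \end{aligned}$$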
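This is minimised by minimising the term inside the expectation for each observation $x$ separately. The conditional probability of misclassifying $x$ is $1 - \Pr(Y = h(x) \mid X = x)$, so the optimal choice is the classifier
$$h(x) = \underset{k}{\operatorname{argmax}} \Pr(Y = k \mid X = x).$$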
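The multi-class argument can be illustrated by brute force on a small example. The sketch assumes two feature values and $n = 3$ classes with made-up probabilities, enumerates all nine classifiers, and compares the best one with the argmax rule.

```python
from itertools import product

# Hypothetical P(X=x) and P(Y=k | X=x) for n = 3 classes.
p_x = {0: 0.6, 1: 0.4}
posterior = {
    0: {1: 0.50, 2: 0.30, 3: 0.20},
    1: {1: 0.10, 2: 0.25, 3: 0.65},
}
classes = (1, 2, 3)
xs = sorted(p_x)


def risk(h):
    """E_X[sum_k P(Y=k | X) 1{h(X) != k}] = E_X[1 - P(Y=h(X) | X)]."""
    return sum(
        p_x[x] * sum(posterior[x][k] for k in classes if h[x] != k)
        for x in xs
    )


# Argmax rule: pick the most probable class for each observation x.
argmax_rule = {x: max(classes, key=lambda k: posterior[x][k]) for x in xs}

# Risk of every possible classifier, keyed by its labels for x = 0 and x = 1.
risks = {
    labels: risk(dict(zip(xs, labels)))
    for labels in product(classes, repeat=len(xs))
}
best_labels = min(risks, key=risks.get)
```

Here `best_labels` matches the labels chosen by `argmax_rule`, with risk 0.44 under the assumed probabilities. Enumeration grows exponentially with the number of feature values, so this is a check of the argument rather than a practical method.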