Nearest centroid classifier


In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation. When applied to text classification using word vectors containing tf*idf weights to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]

An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.[2]

Algorithm


Training


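Given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i$, where $C_\ell$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.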
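The training step can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `class_centroids` and the toy data are assumptions made for the example, and each sample is assumed to be an equal-length list of floats.

```python
from collections import defaultdict


def class_centroids(samples, labels):
    """Return {label: centroid}; each centroid is the mean of that class's samples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        # Accumulate the component-wise sum of the samples in class y.
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    # Divide each per-class sum by the class size to obtain the mean.
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


# Toy data (hypothetical): two classes in two dimensions.
X = [[1.0, 1.0], [3.0, 1.0], [8.0, 9.0], [10.0, 9.0]]
y = ["a", "a", "b", "b"]
centroids = class_centroids(X, y)
print(centroids)  # {'a': [2.0, 1.0], 'b': [9.0, 9.0]}
```

Training makes a single pass over the data and stores only one vector per class, so the fitted model's size does not depend on the number of training samples.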

Prediction


The class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \|\vec{\mu}_\ell - \vec{x}\|$.
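The prediction rule can be sketched in a few lines of Python. The sketch assumes the norm is the Euclidean distance (the rule itself does not fix a particular norm), and the function name `predict` and the centroid values are hypothetical, chosen only for illustration.

```python
import math


def predict(centroids, x):
    """Return the label whose centroid has the smallest Euclidean distance to x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Hypothetical per-class centroids, as the training step would produce.
centroids = {"a": [2.0, 1.0], "b": [9.0, 9.0]}

print(predict(centroids, [3.0, 2.0]))  # a
print(predict(centroids, [7.0, 8.0]))  # b
```

In this sketch, an observation equidistant from two centroids gets whichever label comes first in the dictionary, because `min` returns the first minimal element. A real implementation may prefer an explicit tie-breaking rule.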

See also


References

  1. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  2. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).