Hinge loss

From Wikipedia, the free encyclopedia
Figure (Hinge loss vs zero one loss.svg): The vertical axis represents the value of the hinge loss (in blue) and the zero-one loss (in green) for fixed t = 1, while the horizontal axis represents the value of the prediction y. The plot shows that the hinge loss penalizes predictions y < 1, corresponding to the notion of a margin in a support vector machine.

In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).[1]

For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as

$$\ell(y) = \max(0,\, 1 - t \cdot y)$$

Note that y should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, y = 𝐰 ⋅ 𝐱 + b, where (𝐰, b) are the parameters of the hyperplane and 𝐱 is the input variable(s).

When t and y have the same sign (meaning y predicts the right class) and |y| ≥ 1, the hinge loss ℓ(y) = 0. When they have opposite signs, ℓ(y) increases linearly with |y|. The loss is also positive when |y| < 1, even if t and y have the same sign (a correct prediction, but not by enough margin).
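
The cases described above can be checked numerically. The following is a minimal sketch in plain Python; the function name `hinge_loss` and the sample scores are illustrative choices, not part of any particular library.

```python
def hinge_loss(t, y):
    """Hinge loss max(0, 1 - t*y) for a label t in {-1, +1} and a raw score y."""
    if t not in (-1, 1):
        raise ValueError("t must be +1 or -1")
    return max(0.0, 1.0 - t * y)


# Raw decision-function scores for a positive example (t = +1).
for y in (2.0, 1.0, 0.5, 0.0, -1.0):
    print(f"y = {y:+.1f}  loss = {hinge_loss(1, y):.1f}")
```

Scores of at least 1 incur no loss, a correct but low-margin score such as y = 0.5 incurs a small loss, and wrongly signed scores are penalized in proportion to their magnitude.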

The hinge loss is not a proper scoring rule.

Extensions


While binary SVMs are commonly extended to multiclass classification in a one-vs.-all or one-vs.-one fashion,[2] it is also possible to extend the hinge loss itself for this purpose. Several different variations of multiclass hinge loss have been proposed.[3] For example, Crammer and Singer[4] defined it for a linear classifier as[5]

$$\ell(y) = \max\left(0,\, 1 + \max_{y \ne t} \mathbf{w}_y \mathbf{x} - \mathbf{w}_t \mathbf{x}\right),$$

where t is the target label, and 𝐰_t and 𝐰_y are the model parameters.

Weston and Watkins provided a similar definition, but with a sum rather than a max:[6][3]

$$\ell(y) = \sum_{y \ne t} \max\left(0,\, 1 + \mathbf{w}_y \mathbf{x} - \mathbf{w}_t \mathbf{x}\right).$$
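
The two multiclass definitions differ only in how the per-class margin violations are combined. The sketch below, in plain Python, computes both for a linear classifier with one weight vector per class; the function names and the toy weights are hypothetical and chosen only for illustration.

```python
def class_scores(W, x):
    """Score w_y . x for every class y; W holds one weight vector per class."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]


def crammer_singer_loss(W, x, t):
    """max(0, 1 + max_{y != t} w_y.x - w_t.x): only the worst competitor counts."""
    scores = class_scores(W, x)
    best_other = max(s for y, s in enumerate(scores) if y != t)
    return max(0.0, 1.0 + best_other - scores[t])


def weston_watkins_loss(W, x, t):
    """sum_{y != t} max(0, 1 + w_y.x - w_t.x): every violating competitor counts."""
    scores = class_scores(W, x)
    return sum(max(0.0, 1.0 + s - scores[t])
               for y, s in enumerate(scores) if y != t)


# Toy three-class problem (assumed values); the class scores are [2.0, 1.0, 1.5].
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
x = [2.0, 1.0]
for t in range(3):
    print(t, crammer_singer_loss(W, x, t), weston_watkins_loss(W, x, t))
```

With target class 1, two competitors violate the margin, so the summed loss (3.5) exceeds the max-based loss (2.0); with target class 0 only one competitor violates it and both losses equal 0.5.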

In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where 𝐰 denotes the SVM's parameters, 𝐲 the SVM's predictions, φ the joint feature function, and Δ the Hamming loss:

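$$\begin{aligned}
\ell(\mathbf{y}) &= \max\left(0,\, \Delta(\mathbf{y}, \mathbf{t}) + \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{y}) \rangle - \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{t}) \rangle\right) \\
&= \max\left(0,\, \max_{y \in \mathcal{Y}} \left(\Delta(\mathbf{y}, \mathbf{t}) + \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{y}) \rangle\right) - \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{t}) \rangle\right).
\end{aligned}$$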
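
As an illustration of the second form, the sketch below evaluates the margin-rescaled loss by brute-force enumeration of a small output space 𝒴 (all label sequences of a fixed length). The joint feature map `joint_features` is a hypothetical choice made for this example, and exhaustive enumeration is only practical for tiny problems; real structured SVMs rely on problem-specific loss-augmented inference.

```python
from itertools import product


def hamming(y, t):
    """Hamming loss: number of positions where y and t disagree."""
    return sum(1 for a, b in zip(y, t) if a != b)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def joint_features(x, y, n_labels):
    """Hypothetical phi(x, y): add each position's features into its label's block."""
    dim = len(x[0])
    phi = [0.0] * (n_labels * dim)
    for x_i, y_i in zip(x, y):
        for j, v in enumerate(x_i):
            phi[y_i * dim + j] += v
    return phi


def structured_hinge(w, x, t, n_labels):
    """max(0, max_y (Delta(y, t) + <w, phi(x, y)>) - <w, phi(x, t)>)."""
    target_score = dot(w, joint_features(x, t, n_labels))
    augmented = max(
        hamming(y, t) + dot(w, joint_features(x, y, n_labels))
        for y in product(range(n_labels), repeat=len(x))
    )
    return max(0.0, augmented - target_score)


# Two positions with one feature each, two labels, target sequence (1, 0).
x = [[1.0], [-1.0]]
t = (1, 0)
print(structured_hinge([0.0, 0.0], x, t, n_labels=2))
print(structured_hinge([-2.0, 2.0], x, t, n_labels=2))
```

Because the enumeration includes 𝐲 = 𝐭, for which Δ is zero, the inner maximum is never smaller than the target score.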

Optimization


The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it. It is not differentiable at t ⋅ y = 1, but it has a subgradient with respect to the model parameters 𝐰 of a linear SVM with score function y = 𝐰 ⋅ 𝐱, given by

$$\frac{\partial \ell}{\partial w_i} = \begin{cases} -t \cdot x_i & \text{if } t \cdot y < 1, \\ 0 & \text{otherwise.} \end{cases}$$
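
The subgradient above is enough to run a simple subgradient descent. The following sketch in plain Python assumes a linear score y = 𝐰 ⋅ 𝐱 without a bias term and without regularization; the learning rate, the number of epochs, and the toy data set are arbitrary illustrative choices, so this is not a complete SVM trainer.

```python
def hinge_subgradient(w, x, t):
    """One subgradient of max(0, 1 - t * (w . x)) with respect to w."""
    y = sum(w_i * x_i for w_i, x_i in zip(w, x))
    if t * y < 1:
        return [-t * x_i for x_i in x]
    return [0.0] * len(w)


def train(data, dim, lr=0.1, epochs=100):
    """Per-example subgradient descent on the (unregularized) hinge loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, t in data:
            g = hinge_subgradient(w, x, t)
            w = [w_i - lr * g_i for w_i, g_i in zip(w, g)]
    return w


# Toy data, linearly separable through the origin (assumed values).
data = [([2.0, 1.0], 1), ([1.5, 2.0], 1),
        ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
w = train(data, dim=2)
margins = [t * sum(w_i * x_i for w_i, x_i in zip(w, x)) for x, t in data]
print(w, margins)
```

Once every example satisfies t ⋅ y ≥ 1, all subgradients are zero and the weights stop changing.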

Figure (Hinge loss variants.svg): Plot of three variants of the hinge loss as a function of z = ty: the "ordinary" variant (blue), its square (green), and the piecewise smooth version by Rennie and Srebro (red). The vertical axis is the loss ℓ(y), and the horizontal axis is z = ty.

However, since the derivative of the hinge loss at ty=1 is undefined, smoothed versions may be preferred for optimization, such as Rennie and Srebro's[7]

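$$\ell(y) = \begin{cases} \frac{1}{2} - ty & \text{if } ty \le 0, \\ \frac{1}{2}(1 - ty)^2 & \text{if } 0 < ty < 1, \\ 0 & \text{if } 1 \le ty \end{cases}$$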

or the quadratically smoothed

$$\ell_\gamma(y) = \begin{cases} \frac{1}{2\gamma} \max(0, 1 - ty)^2 & \text{if } ty \ge 1 - \gamma, \\ 1 - \frac{\gamma}{2} - ty & \text{otherwise} \end{cases}$$

suggested by Zhang.[8] The modified Huber loss L is a special case of this loss function with γ = 2, specifically L(t, y) = 4ℓ_2(y).
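
Both smoothed variants, and the stated relation to the modified Huber loss, can be checked with a short sketch in plain Python. The function names are illustrative, and the piecewise form used for `modified_huber` is the commonly quoted definition, included here as an assumption because the text above does not spell it out.

```python
def smooth_hinge(t, y):
    """Rennie and Srebro's piecewise smooth hinge loss."""
    z = t * y
    if z <= 0:
        return 0.5 - z
    if z < 1:
        return 0.5 * (1.0 - z) ** 2
    return 0.0


def quad_smoothed_hinge(t, y, gamma):
    """Quadratically smoothed hinge loss with smoothing parameter gamma > 0."""
    z = t * y
    if z >= 1.0 - gamma:
        return max(0.0, 1.0 - z) ** 2 / (2.0 * gamma)
    return 1.0 - gamma / 2.0 - z


def modified_huber(t, y):
    """Modified Huber loss in its usual piecewise form (assumed definition)."""
    z = t * y
    if z >= -1:
        return max(0.0, 1.0 - z) ** 2
    return -4.0 * z


# Check L(t, y) = 4 * l_2(y) on a few scores for t = +1.
for y in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    assert abs(modified_huber(1, y) - 4.0 * quad_smoothed_hinge(1, y, 2.0)) < 1e-12
```

At the boundary ty = 1 − γ both branches of the quadratically smoothed loss give γ/2, so the function is continuous there.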

References

  1. ^ Citation text not recoverable from the source.
  2. ^ Citation text not recoverable from the source.
  3. ^ a b Citation text not recoverable from the source.
  4. ^ Crammer and Singer. Full citation not recoverable from the source.
  5. ^ Citation text not recoverable from the source.
  6. ^ Weston and Watkins. Full citation not recoverable from the source.
  7. ^ Rennie and Srebro. Full citation not recoverable from the source.
  8. ^ Zhang. Full citation not recoverable from the source.