Huber loss

From Wikipedia, the free encyclopedia

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in the data than the squared error loss. A variant for classification is also sometimes used.

Definition

[Figure: Huber loss (green, δ = 1) and squared error loss (blue) as a function of y − f(x).]

The Huber loss function describes the penalty incurred by an estimation procedure f. Huber (1964) defines the loss function piecewise by[1]

$$L_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{for } |a| \le \delta, \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$

This function is quadratic for small values of a, and linear for large values, with equal values and slopes of the different sections at the two points where |a| = δ. The variable a often refers to the residuals, that is, to the difference between the observed and predicted values a = y − f(x), so the former can be expanded to[2]

$$L_\delta(y, f(x)) = \begin{cases} \frac{1}{2}\left(y - f(x)\right)^2 & \text{for } |y - f(x)| \le \delta, \\ \delta\left(|y - f(x)| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$
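The piecewise definition above translates directly into code. A minimal Python sketch (the function name and default δ are illustrative, not taken from any library):

```python
def huber_loss(a, delta=1.0):
    """Huber loss L_delta(a): quadratic for |a| <= delta, linear beyond."""
    if abs(a) <= delta:
        return 0.5 * a * a
    return delta * (abs(a) - 0.5 * delta)
```

At |a| = δ both branches give δ²/2, so the two sections meet with equal values, and their slopes (a and δ·sign(a)) agree there as well.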

The Huber loss is the convolution of the absolute value function with the rectangular function, scaled and translated. Thus it "smooths out" the former's corner at the origin.
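This convolution identity can be checked numerically. Assuming the rectangular function is taken as the uniform density on [−δ, δ], one consistent choice of scaling and translation is L_δ(a) = δ·(|·| ∗ rect)(a) − δ²/2; the sketch below (all names illustrative) verifies it with a midpoint-rule integral:

```python
def huber_loss(a, delta=1.0):
    return 0.5 * a * a if abs(a) <= delta else delta * (abs(a) - 0.5 * delta)

def huber_via_convolution(a, delta=1.0, n=20000):
    # Midpoint-rule approximation of the convolution of |.| with the
    # uniform (rectangular) density on [-delta, delta], evaluated at a.
    h = 2.0 * delta / n
    conv = sum(abs(a - (-delta + (i + 0.5) * h)) for i in range(n)) * h / (2.0 * delta)
    # One consistent scaling/translation: Huber(a) = delta * conv - delta^2 / 2.
    return delta * conv - 0.5 * delta ** 2
```

The two agree to numerical-integration precision both inside the quadratic zone and out in the linear tails.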

[Figure: Comparison of the Huber loss with other loss functions used for robust regression.]

Motivation


Two very commonly used loss functions are the squared loss, L(a) = a², and the absolute loss, L(a) = |a|. The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of a's (as in $\sum_{i=1}^{n} L(a_i)$), the sample mean is influenced too much by a few particularly large a-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.

As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum a = 0; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at the points a = −δ and a = δ. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) with the robustness of the median-unbiased estimator (using the absolute-value function).
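This robustness can be seen in a small experiment. The minimizer of ∑ᵢ L_δ(yᵢ − a) over a can be found by iteratively reweighted averaging (weight 1 inside the quadratic zone, δ/|r| outside); the sketch below, with illustrative data and names, compares it with the sample mean on data containing one gross outlier:

```python
def huber_location(ys, delta=1.0, iters=50):
    """Minimize sum_i L_delta(y_i - a) over a by iteratively reweighted averaging."""
    a = sum(ys) / len(ys)                      # start from the sample mean
    for _ in range(iters):
        # Huber weights: 1 in the quadratic zone, delta/|r| in the linear zone.
        ws = [1.0 if abs(y - a) <= delta else delta / abs(y - a) for y in ys]
        a = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return a

data = [1.0, 1.2, 0.8, 1.1, 0.9, 50.0]        # five inliers and one gross outlier
mean = sum(data) / len(data)                  # about 9.17, dragged up by the outlier
est = huber_location(data)                    # about 1.2, close to the inliers
```

The outlier's influence on the Huber estimate is capped at slope δ, so the estimate stays near the inliers, while the sample mean is pulled far away.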

Pseudo-Huber loss function


The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function. It combines the best properties of L2 squared loss and L1 absolute loss by being strongly convex close to the target/minimum and less steep for extreme values. The δ value controls both the scale at which the Pseudo-Huber loss transitions from L2-like behaviour near the minimum to L1-like behaviour for extreme values, and the steepness at extreme values. The Pseudo-Huber loss function ensures that derivatives of all orders are continuous. It is defined as[3][4]

$$L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right).$$

As such, this function approximates a²/2 for small values of a, and approximates a straight line with slope δ for large values of a.

While the above is the most common form, other smooth approximations of the Huber loss function also exist.[5]
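A minimal Python sketch of the Pseudo-Huber loss (names are illustrative); the two limiting behaviours described above are easy to confirm numerically:

```python
import math

def pseudo_huber(a, delta=1.0):
    """Pseudo-Huber loss: smooth everywhere, ~a^2/2 near 0, slope ~delta far out."""
    return delta ** 2 * (math.sqrt(1.0 + (a / delta) ** 2) - 1.0)
```

For small a the square root expands to 1 + a²/(2δ²) + …, giving approximately a²/2; for large |a| the loss grows like δ|a| − δ².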

Variant for classification


For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction f(x) (a real-valued classifier score) and a true binary class label y ∈ {+1, −1}, the modified Huber loss is defined as[6]

$$L(y, f(x)) = \begin{cases} \max\bigl(0,\, 1 - y\,f(x)\bigr)^2 & \text{for } y\,f(x) \ge -1, \\ -4\,y\,f(x) & \text{otherwise.} \end{cases}$$

The term max(0, 1 − y f(x)) is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of L.[6]
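A minimal Python sketch of the modified Huber loss (function name illustrative), written in terms of the margin z = y·f(x):

```python
def modified_huber(y, score):
    """Modified Huber loss for y in {+1, -1} and a real-valued classifier score."""
    z = y * score                      # the margin
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2  # quadratically smoothed hinge
    return -4.0 * z                    # linear tail for badly misclassified points
```

At z = −1 both branches equal 4 and both have slope −4, so the loss is continuously differentiable across the join.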

Applications


The Huber loss function is used in robust statistics, M-estimation and additive modelling.[7]


References

  1. ^ Huber, Peter J. (1964). "Robust Estimation of a Location Parameter". The Annals of Mathematical Statistics. 35 (1): 73–101.
  2. ^ Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning (2nd ed.). Springer. Compared to Hastie et al., the loss here is scaled by a factor of 1/2, to be consistent with Huber's original definition given earlier.
  3. ^ [citation not recoverable from the source]
  4. ^ [citation not recoverable from the source]
  5. ^ [citation not recoverable from the source]
  6. ^ a b Zhang, Tong (2004). "Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms". Proceedings of the 21st International Conference on Machine Learning (ICML).
  7. ^ Friedman, Jerome H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine". The Annals of Statistics. 29 (5): 1189–1232.