Huber loss

From Wikipedia, the free encyclopedia

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in the data than the squared error loss. A variant for classification is also sometimes used.

Definition

[Figure: Huber loss (green, δ = 1) and squared error loss (blue) as a function of y − f(x).]

The Huber loss function describes the penalty incurred by an estimation procedure f. Huber (1964) defines the loss function piecewise by[1]

$$L_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{for } |a| \le \delta, \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$

This function is quadratic for small values of a, and linear for large values, with equal values and slopes of the different sections at the two points where |a| = δ. The variable a often refers to the residuals, that is, to the difference between the observed and predicted values a = y − f(x), so the former can be expanded to[2]

$$L_\delta(y, f(x)) = \begin{cases} \frac{1}{2}\left(y - f(x)\right)^2 & \text{for } |y - f(x)| \le \delta, \\ \delta\left(|y - f(x)| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$
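The piecewise definition above translates directly into code. A minimal Python sketch (the function name and default δ are illustrative, not taken from any library):

```python
def huber_loss(a, delta=1.0):
    """Huber loss L_delta(a): quadratic for |a| <= delta, linear beyond."""
    if abs(a) <= delta:
        return 0.5 * a * a
    return delta * (abs(a) - 0.5 * delta)
```

At |a| = δ both branches give δ²/2, so the two sections meet with equal values, and their slopes (a and δ·sign(a)) agree there as well.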

The Huber loss is the convolution of the absolute value function with the rectangular function, scaled and translated. Thus it "smooths out" the former's corner at the origin.
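This convolution identity can be checked numerically. Assuming the rectangular function is taken as the uniform density on [−δ, δ], one consistent choice of scaling and translation is L_δ(a) = δ·(|·| ∗ rect)(a) − δ²/2; the sketch below (all names illustrative) verifies it with a midpoint-rule integral:

```python
def huber_loss(a, delta=1.0):
    return 0.5 * a * a if abs(a) <= delta else delta * (abs(a) - 0.5 * delta)

def huber_via_convolution(a, delta=1.0, n=20000):
    # Midpoint-rule approximation of the convolution of |.| with the
    # uniform (rectangular) density on [-delta, delta], evaluated at a.
    h = 2.0 * delta / n
    conv = sum(abs(a - (-delta + (i + 0.5) * h)) for i in range(n)) * h / (2.0 * delta)
    # One consistent scaling/translation: Huber(a) = delta * conv - delta^2 / 2.
    return delta * conv - 0.5 * delta ** 2
```

The two agree to numerical-integration precision both inside the quadratic zone and out in the linear tails.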

[Figure: Comparison of the Huber loss with other loss functions used for robust regression.]

Motivation


Two very commonly used loss functions are the squared loss, L(a) = a², and the absolute loss, L(a) = |a|. The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of a's (as in $\sum_{i=1}^{n} L(a_i)$), the sample mean is influenced too much by a few particularly large a-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.

As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum a = 0; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at the points a = −δ and a = δ. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) with the robustness of the median-unbiased estimator (using the absolute-value function).
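This robustness can be seen in a small experiment. The minimizer of ∑ᵢ L_δ(yᵢ − a) over a can be found by iteratively reweighted averaging (weight 1 inside the quadratic zone, δ/|r| outside); the sketch below, with illustrative data and names, compares it with the sample mean on data containing one gross outlier:

```python
def huber_location(ys, delta=1.0, iters=50):
    """Minimize sum_i L_delta(y_i - a) over a by iteratively reweighted averaging."""
    a = sum(ys) / len(ys)                      # start from the sample mean
    for _ in range(iters):
        # Huber weights: 1 in the quadratic zone, delta/|r| in the linear zone.
        ws = [1.0 if abs(y - a) <= delta else delta / abs(y - a) for y in ys]
        a = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return a

data = [1.0, 1.2, 0.8, 1.1, 0.9, 50.0]        # five inliers and one gross outlier
mean = sum(data) / len(data)                  # about 9.17, dragged up by the outlier
est = huber_location(data)                    # about 1.2, close to the inliers
```

The outlier's influence on the Huber estimate is capped at slope δ, so the estimate stays near the inliers, while the sample mean is pulled far away.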

Pseudo-Huber loss function


The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function. It combines the best properties of L2 squared loss and L1 absolute loss by being strongly convex close to the target/minimum and less steep for extreme values. The δ value controls both the scale at which the Pseudo-Huber loss transitions from L2-like behaviour near the minimum to L1-like behaviour for extreme values, and the steepness at extreme values. The Pseudo-Huber loss function ensures that derivatives of all orders are continuous. It is defined as[3][4]

$$L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right).$$

As such, this function approximates a²/2 for small values of a, and approximates a straight line with slope δ for large values of a.

While the above is the most common form, other smooth approximations of the Huber loss function also exist.[5]
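A minimal Python sketch of the Pseudo-Huber loss (names are illustrative); the two limiting behaviours described above are easy to confirm numerically:

```python
import math

def pseudo_huber(a, delta=1.0):
    """Pseudo-Huber loss: smooth everywhere, ~a^2/2 near 0, slope ~delta far out."""
    return delta ** 2 * (math.sqrt(1.0 + (a / delta) ** 2) - 1.0)
```

For small a the square root expands to 1 + a²/(2δ²) + …, giving approximately a²/2; for large |a| the loss grows like δ|a| − δ².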

Variant for classification


For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction f(x) (a real-valued classifier score) and a true binary class label y ∈ {+1, −1}, the modified Huber loss is defined as[6]

$$L(y, f(x)) = \begin{cases} \max\bigl(0,\, 1 - y\,f(x)\bigr)^2 & \text{for } y\,f(x) \ge -1, \\ -4\,y\,f(x) & \text{otherwise.} \end{cases}$$

The term max(0, 1 − y f(x)) is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of L.[6]
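A minimal Python sketch of the modified Huber loss (function name illustrative), written in terms of the margin z = y·f(x):

```python
def modified_huber(y, score):
    """Modified Huber loss for y in {+1, -1} and a real-valued classifier score."""
    z = y * score                      # the margin
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2  # quadratically smoothed hinge
    return -4.0 * z                    # linear tail for badly misclassified points
```

At z = −1 both branches equal 4 and both have slope −4, so the loss is continuously differentiable across the join.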

Applications


The Huber loss function is used in robust statistics, M-estimation and additive modelling.[7]


References

  1. ^ Huber, Peter J. (1964). "Robust Estimation of a Location Parameter". The Annals of Mathematical Statistics. 35 (1): 73–101.
  2. ^ Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning (2nd ed.). Springer. Compared to Hastie et al., the loss here is scaled by a factor of 1/2, to be consistent with Huber's original definition given earlier.
  3. ^ [citation not recoverable from the source]
  4. ^ [citation not recoverable from the source]
  5. ^ [citation not recoverable from the source]
  6. ^ a b Zhang, Tong (2004). "Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms". Proceedings of the 21st International Conference on Machine Learning (ICML).
  7. ^ Friedman, Jerome H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine". The Annals of Statistics. 29 (5): 1189–1232.