Smooth maximum

From Wikipedia, the free encyclopedia

In mathematics, a smooth maximum of an indexed family x1, ..., xn of numbers is a smooth approximation to the maximum function max(x1, ..., xn), meaning a parametric family of functions mα(x1, ..., xn) such that for every α the function mα is smooth, and the family converges to the maximum function, mα → max, as α → ∞. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, mα → max as α → ∞ and mα → min as α → −∞. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.

Examples


Boltzmann operator

[Figure: smoothmax of (−x, x) versus x for various parameter values; very smooth for α = 0.5, sharper for α = 8.]

For large positive values of the parameter α>0, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

\[
\mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}
\]

𝒮α has the following properties:

  1. 𝒮α → max as α → ∞
  2. 𝒮0 is the arithmetic mean of its inputs
  3. 𝒮α → min as α → −∞
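These properties can be checked numerically. Below is a minimal Python sketch (the helper name `boltzmann_op` is illustrative, not from any particular library); it subtracts the largest exponent before exponentiating, a standard trick to avoid overflow:

```python
import math

def boltzmann_op(xs, alpha):
    """Boltzmann operator S_alpha: softmax-weighted average of the inputs."""
    z = [alpha * x for x in xs]
    m = max(z)  # subtract the largest exponent so exp() cannot overflow
    weights = [math.exp(v - m) for v in z]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

xs = [1.0, 2.0, 3.0]
print(boltzmann_op(xs, 0.0))    # arithmetic mean: 2.0
print(boltzmann_op(xs, 50.0))   # ≈ 3.0, close to the maximum
print(boltzmann_op(xs, -50.0))  # ≈ 1.0, close to the minimum
```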

The gradient of 𝒮α is closely related to softmax and is given by

\[
\nabla_{x_i} \mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} \bigl[ 1 + \alpha \bigl( x_i - \mathcal{S}_\alpha(x_1, \ldots, x_n) \bigr) \bigr].
\]

This makes the softmax function useful for optimization techniques that use gradient descent.

This operator is sometimes called the Boltzmann operator,[1] after the Boltzmann distribution.

LogSumExp


Another smooth maximum is LogSumExp:

\[
\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^n \exp(\alpha x_i)
\]
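Computed naively, the sum of exponentials overflows for moderately large αxᵢ; the usual remedy is to factor the largest exponent out of the logarithm. A Python sketch (the name `logsumexp_max` is illustrative):

```python
import math

def logsumexp_max(xs, alpha):
    """LSE_alpha(x) = (1/alpha) * log(sum_i exp(alpha * x_i)), alpha > 0."""
    z = [alpha * x for x in xs]
    m = max(z)  # factor out the largest exponent to avoid overflow
    return (m + math.log(sum(math.exp(v - m) for v in z))) / alpha

xs = [1.0, 2.0, 3.0]
print(logsumexp_max(xs, 10.0))  # slightly above the true max 3.0
```

LSE always overshoots the true maximum, by at most (log n)/α.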

This can also be normalized if the xi are all non-negative, yielding a function with domain [0, ∞)^n and range [0, ∞):

\[
g(x_1, \ldots, x_n) = \log \left( \sum_{i=1}^n \exp(x_i) - (n - 1) \right)
\]

The (n − 1) term corrects for the fact that exp(0) = 1: it cancels all but one of the zero exponentials, so that g(0, ..., 0) = log 1 = 0 when all xi are zero.
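A quick numeric check of this cancellation (sketch; `g` defined as above):

```python
import math

def g(xs):
    """Normalized smooth max: log(sum_i exp(x_i) - (n - 1)), all x_i >= 0."""
    return math.log(sum(math.exp(x) for x in xs) - (len(xs) - 1))

print(g([0.0, 0.0, 0.0]))  # 0.0: the (n-1) term cancels the zero exponentials
print(g([0.0, 0.0, 5.0]))  # ≈ 5.0: the two exp(0) terms cancel exactly
```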

Mellowmax


The mellowmax operator[1] is defined as follows:

\[
\mathrm{mm}_\alpha(x) = \frac{1}{\alpha} \log \left( \frac{1}{n} \sum_{i=1}^n \exp(\alpha x_i) \right)
\]

It is a non-expansive operator. As α → ∞, it acts like a maximum; as α → 0, it acts like an arithmetic mean; and as α → −∞, it acts like a minimum. This operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information theoretical principles as a way of regularizing policies with a cost function defined by KL divergence. The operator has previously been utilized in other areas, such as power engineering.[2]
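The three limiting behaviors can be observed directly. A Python sketch (the name `mellowmax` follows the notation above), using the same stability shift as LogSumExp:

```python
import math

def mellowmax(xs, alpha):
    """mm_alpha(x) = (1/alpha) * log((1/n) * sum_i exp(alpha * x_i))."""
    z = [alpha * x for x in xs]
    m = max(z)  # stability shift, as for LogSumExp
    return (m + math.log(sum(math.exp(v - m) for v in z) / len(xs))) / alpha

xs = [1.0, 2.0, 3.0]
print(mellowmax(xs, 100.0))   # ≈ 2.989, within log(3)/100 of the max
print(mellowmax(xs, 1e-6))    # ≈ 2.0, the arithmetic mean
print(mellowmax(xs, -100.0))  # ≈ 1.011, within log(3)/100 of the min
```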

Connection between LogSumExp and Mellowmax


LogSumExp and mellowmax are the same function up to the constant (log n)/α: LSEα(x) = mmα(x) + (log n)/α. LogSumExp is always at least the true maximum: it exceeds it by at most (log n)/α, with the worst case when all n arguments are equal, and it equals the true maximum exactly when all but one argument is −∞. Similarly, mellowmax is always at most the true maximum: it falls short by at most (log n)/α, with the worst case when all but one argument is −∞, and it equals the true maximum exactly when all n arguments are equal.
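This relationship can be verified numerically at the worst case for LogSumExp, where all arguments are equal (helper names illustrative):

```python
import math

def lse(xs, alpha):
    z = [alpha * x for x in xs]
    m = max(z)
    return (m + math.log(sum(math.exp(v - m) for v in z))) / alpha

def mellowmax(xs, alpha):
    # Identical to LSE up to the constant log(n)/alpha.
    return lse(xs, alpha) - math.log(len(xs)) / alpha

xs = [2.0, 2.0, 2.0]  # all arguments equal: worst case for LSE
alpha = 4.0
print(lse(xs, alpha) - 2.0)        # log(3)/4 ≈ 0.2747, the maximum overshoot
print(mellowmax(xs, alpha) - 2.0)  # ≈ 0.0: mellowmax equals the true max here
```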

p-Norm


Another smooth maximum is the p-norm:

\[
\|(x_1, \ldots, x_n)\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}
\]

which converges to ‖(x1, ..., xn)‖∞ = max_{1 ≤ i ≤ n} |xi| as p → ∞.

An advantage of the p-norm is that it is a norm. As such it is scale invariant (homogeneous), ‖(λx1, ..., λxn)‖p = |λ| · ‖(x1, ..., xn)‖p, and it satisfies the triangle inequality.
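The convergence to the maximum absolute value can be seen by increasing p. A Python sketch (the name `p_norm` is illustrative); factoring out the largest magnitude keeps the powers from overflowing:

```python
import math

def p_norm(xs, p):
    """(sum_i |x_i|^p)^(1/p); converges to max_i |x_i| as p grows."""
    m = max(abs(x) for x in xs)
    if m == 0.0:
        return 0.0
    # Factor out the largest magnitude so the powers cannot overflow.
    return m * sum((abs(x) / m) ** p for x in xs) ** (1.0 / p)

xs = [1.0, -2.0, 3.0]
for p in (2, 8, 64):
    print(p, p_norm(xs, p))  # decreases toward max|x_i| = 3.0
```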

Smooth maximum unit


The following binary operator is called the Smooth Maximum Unit (SMU):[3]

\[
\max_\varepsilon(a, b) = \frac{a + b + |a - b|_\varepsilon}{2} = \frac{a + b + \sqrt{(a - b)^2 + \varepsilon}}{2}
\]

where ε ≥ 0 is a parameter and |x|ε = √(x² + ε) is a smooth approximation to the absolute value. As ε → 0, |·|ε → |·| and thus maxε → max.
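A minimal Python sketch of the operator (the name `smooth_max` is illustrative):

```python
import math

def smooth_max(a, b, eps):
    """SMU smooth maximum: (a + b + sqrt((a - b)**2 + eps)) / 2, eps >= 0."""
    return (a + b + math.sqrt((a - b) ** 2 + eps)) / 2.0

print(smooth_max(1.0, 3.0, 0.0))  # 3.0: eps = 0 recovers the exact max
print(smooth_max(1.0, 3.0, 0.5))  # ≈ 3.06: smoothed, slightly above the max
```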


References

  1. ^ a b Asadi, Kavosh; Littman, Michael L. (2017). "An Alternative Softmax Operator for Reinforcement Learning". Proceedings of the 34th International Conference on Machine Learning (ICML).
  2. ^ Safak, Aysel (February 1993). "Statistical analysis of the power sum of multiple correlated log-normal components". IEEE Transactions on Vehicular Technology. 42 (1): 58–61.
  3. ^ Biswas, Koushik; Kumar, Sandeep; Banerjee, Shilpak; Pandey, Ashish Kumar (2021). "SMU: smooth activation function for deep networks using smoothing maximum technique". arXiv:2111.04682.

https://www.johndcook.com/soft_maximum.pdf

M. Lange, D. Zühlke, O. Holz, and T. Villmann, "Applications of lp-norms and their smooth approximations for gradient based learning vector quantization," in Proc. ESANN, Apr. 2014, pp. 271-276. (https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2014-153.pdf)