Smooth maximum

From Wikipedia, the free encyclopedia

In mathematics, a smooth maximum of an indexed family x1, ..., xn of numbers is a smooth approximation to the maximum function max(x1, ..., xn), meaning a parametric family of functions mα(x1, ..., xn) such that for every α the function mα is smooth, and the family converges to the maximum function, mα → max, as α → ∞. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, mα → max as α → ∞ and mα → min as α → −∞. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.

Examples


Boltzmann operator

[Figure: smoothmax of (−x, x) versus x for various parameter values; very smooth for α = 0.5, sharper for α = 8.]

For large positive values of the parameter α>0, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

\[
\mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}
\]

𝒮α has the following properties:

  1. 𝒮α → max as α → ∞
  2. 𝒮0 is the arithmetic mean of its inputs
  3. 𝒮α → min as α → −∞
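These properties can be checked numerically. Below is a minimal Python sketch (the helper name `boltzmann_op` is illustrative, not from any particular library); it subtracts the largest exponent before exponentiating, a standard trick to avoid overflow:

```python
import math

def boltzmann_op(xs, alpha):
    """Boltzmann operator S_alpha: softmax-weighted average of the inputs."""
    z = [alpha * x for x in xs]
    m = max(z)  # subtract the largest exponent so exp() cannot overflow
    weights = [math.exp(v - m) for v in z]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

xs = [1.0, 2.0, 3.0]
print(boltzmann_op(xs, 0.0))    # arithmetic mean: 2.0
print(boltzmann_op(xs, 50.0))   # ≈ 3.0, close to the maximum
print(boltzmann_op(xs, -50.0))  # ≈ 1.0, close to the minimum
```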

The gradient of 𝒮α is closely related to softmax and is given by

\[
\nabla_{x_i} \mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} \bigl[ 1 + \alpha \bigl( x_i - \mathcal{S}_\alpha(x_1, \ldots, x_n) \bigr) \bigr].
\]

This makes the softmax function useful for optimization techniques that use gradient descent.

This operator is sometimes called the Boltzmann operator,[1] after the Boltzmann distribution.

LogSumExp


Another smooth maximum is LogSumExp:

\[
\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^n \exp(\alpha x_i)
\]
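Computed naively, the sum of exponentials overflows for moderately large αxᵢ; the usual remedy is to factor the largest exponent out of the logarithm. A Python sketch (the name `logsumexp_max` is illustrative):

```python
import math

def logsumexp_max(xs, alpha):
    """LSE_alpha(x) = (1/alpha) * log(sum_i exp(alpha * x_i)), alpha > 0."""
    z = [alpha * x for x in xs]
    m = max(z)  # factor out the largest exponent to avoid overflow
    return (m + math.log(sum(math.exp(v - m) for v in z))) / alpha

xs = [1.0, 2.0, 3.0]
print(logsumexp_max(xs, 10.0))  # slightly above the true max 3.0
```

LSE always overshoots the true maximum, by at most (log n)/α.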

This can also be normalized if the xi are all non-negative, yielding a function with domain [0, ∞)^n and range [0, ∞):

\[
g(x_1, \ldots, x_n) = \log \left( \sum_{i=1}^n \exp(x_i) - (n - 1) \right)
\]

The (n − 1) term corrects for the fact that exp(0) = 1: it cancels all but one of the zero exponentials, so that g(0, ..., 0) = log 1 = 0 when all xi are zero.
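A quick numeric check of this cancellation (sketch; `g` defined as above):

```python
import math

def g(xs):
    """Normalized smooth max: log(sum_i exp(x_i) - (n - 1)), all x_i >= 0."""
    return math.log(sum(math.exp(x) for x in xs) - (len(xs) - 1))

print(g([0.0, 0.0, 0.0]))  # 0.0: the (n-1) term cancels the zero exponentials
print(g([0.0, 0.0, 5.0]))  # ≈ 5.0: the two exp(0) terms cancel exactly
```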

Mellowmax


The mellowmax operator[1] is defined as follows:

\[
\mathrm{mm}_\alpha(x) = \frac{1}{\alpha} \log \left( \frac{1}{n} \sum_{i=1}^n \exp(\alpha x_i) \right)
\]

It is a non-expansive operator. As α → ∞, it acts like a maximum; as α → 0, it acts like an arithmetic mean; and as α → −∞, it acts like a minimum. This operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information theoretical principles as a way of regularizing policies with a cost function defined by KL divergence. The operator has previously been utilized in other areas, such as power engineering.[2]
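The three limiting behaviors can be observed directly. A Python sketch (the name `mellowmax` follows the notation above), using the same stability shift as LogSumExp:

```python
import math

def mellowmax(xs, alpha):
    """mm_alpha(x) = (1/alpha) * log((1/n) * sum_i exp(alpha * x_i))."""
    z = [alpha * x for x in xs]
    m = max(z)  # stability shift, as for LogSumExp
    return (m + math.log(sum(math.exp(v - m) for v in z) / len(xs))) / alpha

xs = [1.0, 2.0, 3.0]
print(mellowmax(xs, 100.0))   # ≈ 2.989, within log(3)/100 of the max
print(mellowmax(xs, 1e-6))    # ≈ 2.0, the arithmetic mean
print(mellowmax(xs, -100.0))  # ≈ 1.011, within log(3)/100 of the min
```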

Connection between LogSumExp and Mellowmax


LogSumExp and mellowmax are the same function up to the constant (log n)/α: LSEα(x) = mmα(x) + (log n)/α. LogSumExp is always at least the true maximum: it exceeds it by at most (log n)/α, with the worst case when all n arguments are equal, and it equals the true maximum exactly when all but one argument is −∞. Similarly, mellowmax is always at most the true maximum: it falls short by at most (log n)/α, with the worst case when all but one argument is −∞, and it equals the true maximum exactly when all n arguments are equal.
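This relationship can be verified numerically at the worst case for LogSumExp, where all arguments are equal (helper names illustrative):

```python
import math

def lse(xs, alpha):
    z = [alpha * x for x in xs]
    m = max(z)
    return (m + math.log(sum(math.exp(v - m) for v in z))) / alpha

def mellowmax(xs, alpha):
    # Identical to LSE up to the constant log(n)/alpha.
    return lse(xs, alpha) - math.log(len(xs)) / alpha

xs = [2.0, 2.0, 2.0]  # all arguments equal: worst case for LSE
alpha = 4.0
print(lse(xs, alpha) - 2.0)        # log(3)/4 ≈ 0.2747, the maximum overshoot
print(mellowmax(xs, alpha) - 2.0)  # ≈ 0.0: mellowmax equals the true max here
```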

p-Norm


Another smooth maximum is the p-norm:

\[
\|(x_1, \ldots, x_n)\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}
\]

which converges to ‖(x1, ..., xn)‖∞ = max_{1 ≤ i ≤ n} |xi| as p → ∞.

An advantage of the p-norm is that it is a norm. As such it is scale invariant (homogeneous), ‖(λx1, ..., λxn)‖p = |λ| · ‖(x1, ..., xn)‖p, and it satisfies the triangle inequality.
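The convergence to the maximum absolute value can be seen by increasing p. A Python sketch (the name `p_norm` is illustrative); factoring out the largest magnitude keeps the powers from overflowing:

```python
import math

def p_norm(xs, p):
    """(sum_i |x_i|^p)^(1/p); converges to max_i |x_i| as p grows."""
    m = max(abs(x) for x in xs)
    if m == 0.0:
        return 0.0
    # Factor out the largest magnitude so the powers cannot overflow.
    return m * sum((abs(x) / m) ** p for x in xs) ** (1.0 / p)

xs = [1.0, -2.0, 3.0]
for p in (2, 8, 64):
    print(p, p_norm(xs, p))  # decreases toward max|x_i| = 3.0
```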

Smooth maximum unit


The following binary operator is called the Smooth Maximum Unit (SMU):[3]

\[
\max_\varepsilon(a, b) = \frac{a + b + |a - b|_\varepsilon}{2} = \frac{a + b + \sqrt{(a - b)^2 + \varepsilon}}{2}
\]

where ε ≥ 0 is a parameter and |x|ε = √(x² + ε) is a smooth approximation to the absolute value. As ε → 0, |·|ε → |·| and thus maxε → max.
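A minimal Python sketch of the operator (the name `smooth_max` is illustrative):

```python
import math

def smooth_max(a, b, eps):
    """SMU smooth maximum: (a + b + sqrt((a - b)**2 + eps)) / 2, eps >= 0."""
    return (a + b + math.sqrt((a - b) ** 2 + eps)) / 2.0

print(smooth_max(1.0, 3.0, 0.0))  # 3.0: eps = 0 recovers the exact max
print(smooth_max(1.0, 3.0, 0.5))  # ≈ 3.06: smoothed, slightly above the max
```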


References

  1. ^ a b Asadi, Kavosh; Littman, Michael L. (2017). "An Alternative Softmax Operator for Reinforcement Learning". Proceedings of the 34th International Conference on Machine Learning (ICML).
  2. ^ Safak, Aysel (February 1993). "Statistical analysis of the power sum of multiple correlated log-normal components". IEEE Transactions on Vehicular Technology. 42 (1): 58–61.
  3. ^ Biswas, Koushik; Kumar, Sandeep; Banerjee, Shilpak; Pandey, Ashish Kumar (2021). "SMU: smooth activation function for deep networks using smoothing maximum technique". arXiv:2111.04682.

https://www.johndcook.com/soft_maximum.pdf

M. Lange, D. Zühlke, O. Holz, and T. Villmann, "Applications of lp-norms and their smooth approximations for gradient based learning vector quantization," in Proc. ESANN, Apr. 2014, pp. 271-276. (https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2014-153.pdf)