Random coordinate descent

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

Randomized (Block) Coordinate Descent Method is an optimization algorithm popularized by Nesterov (2010) and Richtárik and Takáč (2011). The first analysis of this method, when applied to the problem of minimizing a smooth convex function, was performed by Nesterov (2010).[1] In Nesterov's analysis the method needs to be applied to a quadratic perturbation of the original function with an unknown scaling factor. Richtárik and Takáč (2011) provide iteration complexity bounds that do not require this assumption, meaning the method is applied directly to the objective function. Additionally, they generalize the framework to the problem of minimizing a composite function, specifically the sum of a smooth convex function and a (possibly nonsmooth) convex block-separable function.

F(x)=f(x)+Ψ(x),

where Ψ(x)=i=1nΨi(x(i)), xRN is decomposed into n blocks of variables/coordinates: x=(x(1),,x(n)) and Ψ1,,Ψn are (simple) convex functions.

Example (block decomposition): If x=(x1,x2,,x5)R5 and n=3, one may choose x(1)=(x1,x3),x(2)=(x2,x5) and x(3)=x4.

Example (block-separable regularizers):

  1. n=N;Ψ(x)=x1=i=1n|xi|
  2. N=N1+N2++Nn;Ψ(x)=i=1nx(i)2, where x(i)RNi and 2 is the standard Euclidean norm.

Algorithm

[edit | edit source]

Consider the optimization problem

minxRnf(x),

where f is a convex and smooth function.

Smoothness: By smoothness we mean the following: we assume the gradient of f is coordinate-wise Lipschitz continuous with constants L1,L2,,Ln. That is, we assume that

|if(x+hei)if(x)|Li|h|,

for all xRn and hR, where i denotes the partial derivative with respect to variable x(i).

Nesterov, and Richtarik and Takac showed that the following algorithm converges to the optimal point:

Algorithm Random Coordinate Descent Method
    Input: x0Rn //starting point
    Output: x

    set x := x_0

    for k := 1, ... do
        choose coordinate i{1,2,,n}, uniformly at random
        update x(i)=x(i)1Liif(x) 
    end for
  • "←" denotes assignment. For instance, "largestitem" means that the value of largest changes to the value of item.
  • "return" terminates the algorithm and outputs the following value.

Convergence rate

[edit | edit source]

Since the iterates of this algorithm are random vectors, a complexity result would give a bound on the number of iterations needed for the method to output an approximate solution with high probability. It was shown in [2] that if k2nRL(x0)ϵlog(f(x0)f*ϵρ), where RL(x)=maxymaxx*X*{yx*L:f(y)f(x)}, f* is an optimal solution (f*=minxRn{f(x)}), ρ(0,1) is a confidence level and ϵ>0 is target accuracy, then Prob(f(xk)f*>ϵ)ρ.

Example on particular function

[edit | edit source]

The following Figure shows how xk develops during iterations, in principle. The problem is

f(x)=12xT(10.50.51)x(1.51.5)x,x0=(00)T

Extension to block coordinate setting

[edit | edit source]
Blocking coordinate directions into Block coordinate directions

One can naturally extend this algorithm not only just to coordinates, but to blocks of coordinates. Assume that we have space R5. This space has 5 coordinate directions, concretely e1=(1,0,0,0,0)T,e2=(0,1,0,0,0)T,e3=(0,0,1,0,0)T,e4=(0,0,0,1,0)T,e5=(0,0,0,0,1)T in which Random Coordinate Descent Method can move. However, one can group some coordinate directions into blocks and we can have instead of those 5 coordinate directions 3 block coordinate directions (see image).

See also

[edit | edit source]

References

[edit | edit source]
  1. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  2. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).