Transportation theory (mathematics)

From Wikipedia, the free encyclopedia
(Redirected from Transportation problem)
Jump to navigation Jump to search

In mathematics and economics, transportation theory or transport theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781.[1]

In the 1920s A. N. Tolstoi was one of the first to study the transportation problem mathematically. In 1930, in the collection Transportation Planning Volume I for the National Commissariat of Transportation of the Soviet Union, he published a paper "Methods of Finding the Minimal Kilometrage in Cargo-transportation in space".[2][3]

Major advances were made in the field during World War II by the Soviet mathematician and economist Leonid Kantorovich.[4] Consequently, the problem as it is stated is sometimes known as the Monge–Kantorovich transportation problem.[5] The linear programming formulation of the transportation problem is also known as the HitchcockKoopmans transportation problem.[6]

Motivation

[edit | edit source]

Mines and factories

[edit | edit source]
Two one-dimensional distributions μ and ν, plotted on the x and y. The two distributions can be pictured as two piles of dirt, one before moving, and one after moving. The heatmap in the center is a transport plan and denotes where each atom of dirt would be moved to.

Suppose that we have a collection of m mines mining iron ore, and a collection of n factories which use the iron ore that the mines produce. Suppose for the sake of argument that these mines and factories form two disjoint subsets M and F of the Euclidean plane 2. Suppose also that we have a cost function c:2×2[0,), so that c(x,y) is the cost of transporting one shipment of iron from x to y. For simplicity, we ignore the time taken to do the transporting. We also assume that each mine can supply only one factory (no splitting of shipments) and that each factory requires precisely one shipment to be in operation (factories cannot work at half- or double-capacity). Having made the above assumptions, a transport plan is a bijection T:MF. In other words, each mine mM supplies precisely one target factory T(m)F and each factory is supplied by precisely one mine. We wish to find the optimal transport plan, the plan T whose total cost

c(T):=mMc(m,T(m))

is the least of all possible transport plans from M to F. This motivating special case of the transportation problem is an instance of the assignment problem. More specifically, it is equivalent to finding a minimum weight matching in a bipartite graph.

This can be generalized to the continuous case, where there are infinitely many mines and factories distributed on the real line, or generally in any metric space. This case is usually pictured as "changing the shape of a pile of dirt", and thus called the earth mover's problem.

Moving books: the importance of the cost function

[edit | edit source]

The following simple example illustrates the importance of the cost function in determining the optimal transport plan. Suppose that we have n books of equal width on a shelf (the real line), arranged in a single contiguous block. We wish to rearrange them into another contiguous block, but shifted one book-width to the right. Two obvious candidates for the optimal transport plan present themselves:

  1. move all n books one book-width to the right ("many small moves");
  2. move the left-most book n book-widths to the right and leave all other books fixed ("one big move").

If the cost function is proportional to Euclidean distance (c(x,y)=αxy for some α>0) then these two candidates are both optimal. If, on the other hand, we choose the strictly convex cost function proportional to the square of Euclidean distance (c(x,y)=αxy2 for some α>0), then the "many small moves" option becomes the unique minimizer.

Note that the above cost functions consider only the horizontal distance traveled by the books, not the horizontal distance traveled by a device used to pick each book up and move the book into position. If the latter is considered instead, then, of the two transport plans, the second is always optimal for the Euclidean distance, while, provided there are at least 3 books, the first transport plan is optimal for the squared Euclidean distance.

Hitchcock problem

[edit | edit source]

The following transportation problem formulation is credited to F. L. Hitchcock:[7]

Suppose there are m sources x1,,xm for a commodity, with a(xi) units of supply at xi and n sinks y1,,yn for the commodity, with the demand b(yj) at yj. If c(xi, yj) is the unit cost of shipment from xi to yj, find a flow that satisfies demand from supplies and minimizes the flow cost. This challenge in logistics was taken up by D. R. Fulkerson[8] and in the book Flows in Networks (1962) written with L. R. Ford Jr.[9]

Tjalling Koopmans is also credited with formulations of transport economics and allocation of resources.

Abstract formulation of the problem

[edit | edit source]

Monge and Kantorovich formulations

[edit | edit source]

The transportation problem as it is stated in modern or more technical literature looks somewhat different because of the development of Riemannian geometry and measure theory. The mines-factories example, simple as it is, is a useful reference point when thinking of the abstract case. In this setting, we allow the possibility that we may not wish to keep all mines and factories open for business, and allow mines to supply more than one factory, and factories to accept iron from more than one mine.

Let X and Y be two separable metric spaces such that any probability measure on X (or Y) is a Radon measure (i.e. they are Radon spaces). Let c:X×Y[0,) be a Borel-measurable function. Given probability measures μ on X and ν on Y, Monge's formulation of the optimal transportation problem is to find a transport map T:XY that realizes the infimum

inf{Xc(x,T(x))dμ(x)|T*(μ)=ν},

where T*(μ) denotes the push forward of μ by T. A map T that attains this infimum (i.e. makes it a minimum instead of an infimum) is called an "optimal transport map".

Monge's formulation of the optimal transportation problem can be ill-posed, because sometimes there is no T satisfying T*(μ)=ν: this happens, for example, when μ is a Dirac measure but ν is not.

We can improve on this by adopting Kantorovich's formulation of the optimal transportation problem, which is to find a probability measure γ on X×Y that attains the infimum

inf{X×Yc(x,y)dγ(x,y)|γΓ(μ,ν)},

where Γ(μ,ν) denotes the collection of all probability measures on X×Y with marginals μ on X and ν on Y.

Cost duality

[edit | edit source]
Example c-duality transform, where c(x, y) = 2(cos(3x) + 1)|y − x|² + (4 − 2(cos(3x) + 1))|y − x|⁴ and ψ(x)=ex2.

Given a cost function

c(x,y)

, it produces a duality transformation

ψψc

defined by

ψc(y):=infx(c(x,y)ψ(x))

This generalizes Legendre transformation, which is the case where

c(x,y)=xy

with a sign flip.

The c-convexification of a curve in the case where c(x,y)=|xy|.

1.

We say that a function ψ is c-convex if ψ=φc for some φ. Note that because φccc=φc, we can always assume that φ is c-convex. The c-convexification of a function ψ is ψcc. Equivalently, it is the smallest c-convex function ψ such that ψψ pointwise.[10]: Prop. 5.8  Like in the case of convex transformation, ψ is c-convex iff ψ=ψcc.

If ψ=φc:X is c-convex, then the set of c-subdifferential of ψ at xX is the set of yY such that ψ(x)=c(x,y)φ(y). Similarly for Y.

When X=Y, the graph (y,ψc(y)) can be constructed as follows: Take the graph of ψ, and flip it upside down. At each point (x,ψ(x)), construct a graph of yc(x,y) apexed at (x,ψ(x)). That is, it is the graph of yc(x,y)ψ(x). We obtain a whole set of such graphs. Their lower-edge envelope is the graph of ψc.

In the same image, we can see what it means for a function φ(y) to be c-convex. It is c-convex iff its entire graph can be "touched" by a "tipped tool" that is moving and shape-shifting. When the tipped tool is at x, it has a shape of yc(x,y) and is raised to a height of ψ(x). The graph of the c-convexification φcc(y) is constructed by running the tipped tool so that it is lowered as much as possible, while still touching graph of φ(y) on the upper side. The lower envelope swept out by the tipped tool is the graph of φcc(y).[10]: Fig. 5.2 

For example, if X=Y=n is a metric space and c(x,y)=xy, then φ:X is c-convex iff it is 1-Lipschitz. This is used in the definition of 1-Wasserstein distance. If c(x,y)=xy2, then φ is c-convex iff its graph could be touched from above by a tipped tool with the shape of a paraboloid.

Existence and uniqueness

[edit | edit source]

Under fairly permissive assumptions, optimal transport plan exists.

If

  • (X,μX),(Y,μY) are Polish probability spaces,
  • c:X×Y{} is lower semicontinuous,
  • and there exists some upper semicontinuous functions aL1(μX),bL1(μY) of type a:X{},b:Y{} such that c(x,y)a(x)+b(y),

then an optimal transport plan exists. That is, exists

γ*Γ(μX,μY)

such that it reaches the infimum.[10]: Thm. 4.1 

Note that the infimum could be infinite if all transport plans turn out to be infinite. For example, if

X=Y=,c(x,y)=|xy|,μX

is the Cauchy distribution, and

μY=δ0

.

If

  • (X,μX),(Y,μY) are Polish probability spaces,
  • c:X×Y is lower semicontinuous,
  • there exists some upper semicontinuous functions aL1(μX),bL1(μY) of type a:X,b:Y such that c(x,y)a(x)+b(y),
  • there exists a finite-cost transport plan,
  • and for any c-convex function ψ:X{}, for μX-almost all xX, ψ has a unique c-subdifferential at x

then an optimal transport map exists.[10]: Thm. 5.30 

A restriction of an optimal transport plan is still optimal. That is, suppose

γΓ(μ,ν)

is optimal, and

0<γ<γ

, and define the normalized transport plan

γ¯:=γ/γ(X×Y)

, then

γ¯

is an optimal transport plan between its own marginals.[10]: Thm. 4.6  If

γ¯

isn't optimal, then there exists an improvement of it, which then translates back to an improvement of the original

γ

.

Kantorovich duality

[edit | edit source]

The Kantorovich duality states that:[10]: Thm. 5.10 

If

(X,μX),(Y,μY)

are Polish probability spaces,

c:X×Y{}

is lower semicontinuous, and there exists some upper semicontinuous functions

aL1(μX),bL1(μY)

of type

a:X,b:Y

such that

c(x,y)a(x)+b(y)

, then

infγΓ(μ,ν)(X×Yc(x,y)dγ(x,y))=supφ is c-convex(Xφ(x)dμ(x)+Yφc(y)dν(y))

If furthermore,

c

only takes real values, there exists a transport plan with finite cost, and there exists some functions

aL1(μX),bL1(μY)

such that

c(x,y)a(x)+b(y)

, then

minγΓ(μ,ν)(X×Yc(x,y)dγ(x,y))=maxφ is c-convex(Xφ(x)dμ(x)+Yφc(y)dν(y))

Consider the second case, where we can actually arrive at an exactly optimal plan, instead of merely getting closer and closer. In this case, an optimal transport plan

γΓ(μ,ν)

, constrains the form of an optimal pricing pair

(φ,ψ)

, and vice versa.

Given such an optimal pricing pair (φ,ψ),[10]: Remark 5.13 

  • given an arbitrary transport plan γΓ(μ,ν), if all (x,y)supp(γ) satisfies the exact equality c(x,y)=φ(x)+ψ(y), then γ is an optimal plan;
  • given an optimal transport plan γΓ(μ,ν), any (x,y)supp(γ) must satisfy the exact equality c(x,y)=φ(x)+ψ(y).

More succinctly, a transport plan is optimal iff it is supported on the set of c-subdifferential pairs of (φ,ψ).

Stability

[edit | edit source]

The optimal transportation is stable in the following sense:[10]: Thm. 5.20 

Assume that

(X,μ),(Y,ν)

are Polish probability spaces,

c:X×Y

is continuous, and

infc

is finite. Given a sequence of continuous functions

c:X×Y

converging uniformly to

c

over

X×Y

, a sequence

μkμ

weakly, a sequence

νkν

weakly, and a sequence of optimal transport plans

γkΓ(μk,νk)

. If the transport costs

ckdπk

satisfy

ckdπk<+,k

and

lim infkckdπk<+

, then

γk

converges weakly to some

γ

, and

γ

is an optimal transport plan from

μ

to

ν

.

Similarly, the optimal transport map is also stable.[10]: Cor. 5.23 

Assume that

(X,μ),(Y,ν)

are Polish probability spaces,

X

is locally compact,

c:X×Y

is lower semicontinuous, and

infc

is finite. Given a sequence of lower semicontinuous functions

c:X×Y

converging uniformly to

c

over

X×Y

, a sequence

νkν

weakly,

Economic interpretation

[edit | edit source]

The optimal transport problem has an economic interpretation.[11] Cédric Villani recounts the following interpretation from Luis Caffarelli:[12]

Suppose you want to ship some coal from mines, distributed as

μ

, to factories, distributed as

ν

. The cost function of transport is

c

. Now a shipper comes and offers to do the transport for you. You would pay him

f(x)

per coal for loading the coal at

x

, and pay him

g(y)

per coal for unloading the coal at

y

. For you to accept the deal, the price schedule must satisfy

f(x)+g(y)c(x,y)

. The Kantorovich duality states that the shipper can make a price schedule that makes you pay almost as much as you would ship yourself.

In the interpretation, the duality transformation transforms a loading cost function

φ(x)

into the optimal (for the shipper) unloading cost function

ψ(y)=φc(y)

. If the unloading cost function

ψ(y)

were any higher at any point, then there would be some route

xy

on which

c(x,y)<φ(x)+ψ(y)

, meaning that there is some route on which you would rather ship yourself. But if the unloading cost function were any lower at any point, then the shipper could have earned more money by raising the price there. Therefore, the shipper should always choose

ψ=φc

. The same argument applied again then states that the shipper should always choose

φ=ψc

, and therefore we obtain the lower bound half of the duality formula:

infγΓ(μ,ν)(X×Yc(x,y)dγ(x,y))supφ is c-convex(Xφ(x)dμ(x)+Yφc(y)dν(y)),

The Kantorovich duality states that it is in fact an equality, i.e. the shipper can make you pay as much as you would pay yourself, though the shipper might never exactly reach the bound (thus the use of infimum and supremum, instead of minimum and maximum).

Assume that the shipper in fact must pay the same cost function and us, and can exactly reach the maximum revenue using (φ,ψ) as their pricing chart. Then the shipper must use an optimal plan, at which point the shipper just breaks even with no profit. Conversely, any shipping plan that allows the shipper to exactly break even must be optimal.

Solution of the problem

[edit | edit source]

Optimal transportation on the real line

[edit | edit source]
Optimal transportation matrix
Continuous optimal transport

For 1p<, let 𝒫p() denote the collection of probability measures on that have finite p-th moment. Let μ,ν𝒫p() and let c(x,y)=h(xy), where h:[0,) is a convex function.

  1. If μ has no atom, i.e., if the cumulative distribution function Fμ:[0,1] of μ is a continuous function, then Fν1Fμ: is an optimal transport map. It is the unique optimal transport map if h is strictly convex.
  2. We have
minγΓ(μ,ν)2c(x,y)dγ(x,y)=01c(Fμ1(s),Fν1(s))ds.

The proof of this solution appears in Rachev & Rüschendorf (1998).[13]

Discrete version and linear programming formulation

[edit | edit source]

In the case where the margins μ and ν are discrete, let μx and νy be the probability masses respectively assigned to x𝐗 and y𝐘, and let γxy be the probability of an xy assignment. The objective function in the primal Kantorovich problem is then

x𝐗,y𝐘γxycxy

and the constraint γΓ(μ,ν) expresses as

y𝐘γxy=μx,x𝐗

and

x𝐗γxy=νy,y𝐘.

In order to input this in a linear programming problem, we need to vectorize the matrix γxy by either stacking its columns or its rows, we call vec this operation. In the column-major order, the constraints above rewrite as

(11×|𝐘|I|𝐗|)vec(γ)=μ and (I|𝐘11×|𝐗|)vec(γ)=ν

where is the Kronecker product, 1n×m is a matrix of size n×m with all entries of ones, and In is the identity matrix of size n. As a result, setting z=vec(γ), the linear programming formulation of the problem is

Minimize vec(c)zsubject to:z0,(11×|𝐘|I|𝐗|I|𝐘|11×|𝐗|)z=(μν)

which can be readily inputted in a large-scale linear programming solver (see chapter 3.4 of Galichon (2016)[11]).

Semi-discrete case

[edit | edit source]

In the semi-discrete case, X=Y=d and μ is a continuous distribution over d, while ν=j=1Jνjδyi is a discrete distribution which assigns probability mass νj to site yjd. In this case, we can see[14] that the primal and dual Kantorovich problems respectively boil down to:

inf{Xj=1Jc(x,yj)dγj(x),γΓ(μ,ν)}

for the primal, where γΓ(μ,ν) means that Xdγj(x)=νj and jdγj(x)=dμ(x), and:

sup{Xφ(x)dμ(x)+j=1Jψjνj:ψj+φ(x)c(x,yj)}

for the dual, which can be rewritten as:

supψJ{Xinfj{c(x,yj)ψj}dμ(x)+j=1Jψjνj}

which is a finite-dimensional convex optimization problem that can be solved by standard techniques, such as gradient descent.

In the case when c(x,y)=|xy|2/2, one can show that the set of x𝐗 assigned to a particular site j is a convex polyhedron. The resulting configuration is called a power diagram.[15]

Quadratic normal case

[edit | edit source]

Assume the particular case μ=𝒩(0,ΣX), ν=𝒩(0,ΣY), and c(x,y)=|yAx|2/2 where A is invertible. One then has

φ(x)=xΣX1/2(ΣX1/2AΣYAΣX1/2)1/2ΣX1/2x/2
ψ(y)=yAΣX1/2(ΣX1/2AΣYAΣX1/2)1/2ΣX1/2Ay/2
T(x)=(A)1ΣX1/2(ΣX1/2AΣYAΣX1/2)1/2ΣX1/2x

The proof of this solution appears in Galichon (2016).[11]

Separable Hilbert spaces

[edit | edit source]

Let X be a separable Hilbert space. Let 𝒫p(X) denote the collection of probability measures on X that have finite p-th moment; let 𝒫pr(X) denote those elements μ𝒫p(X) that are Gaussian regular: if g is any strictly positive Gaussian measure on X and g(N)=0, then μ(N)=0 also.

Let μ𝒫pr(X), ν𝒫p(X), c(x,y)=|xy|p/p for p(1,),p1+q1=1. Then the Kantorovich problem has a unique solution κ, and this solution is induced by an optimal transport map: i.e., there exists a Borel map rLp(X,μ;X) such that

κ=(idX×r)*(μ)Γ(μ,ν).

Moreover, if ν has bounded support, then

r(x)=x|φ(x)|q2φ(x)

for μ-almost all xX for some locally Lipschitz, c-concave and maximal Kantorovich potential φ. (Here φ denotes the Gateaux derivative of φ.)

By minimizing flows

[edit | edit source]

A gradient descent formulation for the solution of the Monge–Kantorovich problem was given by Sigurd Angenent, Steven Haker, and Allen Tannenbaum.[16]

Entropic regularization

[edit | edit source]

Consider a variant of the discrete problem above, where we have added an entropic regularization term to the objective function of the primal problem

Minimize x𝐗,y𝐘γxycxy+εγxylnγxysubject to: γ0y𝐘γxy=μx,x𝐗x𝐗γxy=νy,y𝐘

One can show that the dual regularized problem is

maxφ,ψx𝐗φxμx+y𝐘ψyvyεx𝐗,y𝐘exp(φx+ψycxyε)

where, compared with the unregularized version, the "hard" constraint in the former dual (φx+ψycxy0) has been replaced by a "soft" penalization of that constraint (the sum of the εexp((φx+ψycxy)/ε) terms). The optimality conditions in the dual problem can be expressed as

Eq. 5.1: μx=y𝐘exp(φx+ψycxyε)x𝐗
Eq. 5.2: νy=x𝐗exp(φx+ψycxyε)y𝐘

Denoting A as the |𝐗|×|𝐘| matrix of term Axy=exp(cxy/ε), solving the dual is therefore equivalent to looking for two diagonal positive matrices D1 and D2 of respective sizes |𝐗| and |𝐘|, such that D1AD21|𝐘|=μ and (D1AD2)1|𝐗|=ν. The existence of such matrices generalizes Sinkhorn's theorem and the matrices can be computed using the Sinkhorn–Knopp algorithm,[17] which simply consists of iteratively looking for φx to solve Equation 5.1, and ψy to solve Equation 5.2. Sinkhorn–Knopp's algorithm is therefore a coordinate descent algorithm on the dual regularized problem.

Applications

[edit | edit source]

The Monge–Kantorovich optimal transport has found applications in wide range in different fields. Among them are:

See also

[edit | edit source]

References

[edit | edit source]
  1. ^ G. Monge. Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris, avec les Mémoires de Mathématique et de Physique pour la même année, pages 666–704, 1781.
  2. ^ Schrijver, Alexander, Combinatorial Optimization, Berlin; New York : Springer, 2003. Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).. Cf. p. 362
  3. ^ Ivor Grattan-Guinness, Ivor, Companion encyclopedia of the history and philosophy of the mathematical sciences, Volume 1, JHU Press, 2003. Cf. p.831
  4. ^ L. Kantorovich. On the translocation of masses. C.R. (Doklady) Acad. Sci. URSS (N.S.), 37:199–201, 1942.
  5. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  6. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  7. ^ Frank L. Hitchcock (1941) "The distribution of a product from several sources to numerous localities", MIT Journal of Mathematics and Physics 20:224–230 Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value)..
  8. ^ D. R. Fulkerson (1956) Hitchcock Transportation Problem, RAND corporation.
  9. ^ L. R. Ford Jr. & D. R. Fulkerson (1962) § 3.1 in Flows in Networks, page 95, Princeton University Press
  10. ^ a b c d e f g h i Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  11. ^ a b c Galichon, Alfred. Optimal Transport Methods in Economics. Princeton University Press, 2016.
  12. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  13. ^ Rachev, Svetlozar T., and Ludger Rüschendorf. Mass Transportation Problems: Volume I: Theory. Vol. 1. Springer, 1998.
  14. ^ Santambrogio, Filippo. Optimal Transport for Applied Mathematicians. Birkhäuser Basel, 2016. In particular chapter 6, section 4.2.
  15. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value)..
  16. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  17. ^ Peyré, Gabriel and Marco Cuturi (2019), "Computational Optimal Transport: With Applications to Data Science", Foundations and Trends in Machine Learning: Vol. 11: No. 5-6, pp 355–607. DOI: 10.1561/2200000073.
  18. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  19. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  20. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
  21. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).

Further reading

[edit | edit source]
  • Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).

Lua error in Module:Authority_control at line 153: attempt to index field 'wikibase' (a nil value).