Theoretical grounds of lambda optimization

Revision en27, by adamant, 2021-12-28 22:09:05

Hi everyone!

This time I'd like to write about what's widely known as the "Aliens trick" (popularized by the IOI 2016 problem Aliens). There are already some articles about it here and there, and I'd like to summarize them, while also adding insights into the connection between this trick and generic Lagrange multipliers and Lagrangian duality, which often occurs in e.g. linear programming problems.


Tldr.

Let $$$f : X \to \mathbb R$$$ and $$$g : X \to \mathbb R^c$$$. You need to solve the constrained optimization problem

$$$\begin{gather}f(x) \to \min,\\ g(x) = 0.\end{gather}$$$

Let $$$t(\lambda) = \inf_x [f(x) - \lambda \cdot g(x)]$$$. Finding $$$t(\lambda)$$$ is an unconstrained problem and is usually much simpler.

It may be rewritten as $$$t(\lambda) = \inf_y [h(y) - \lambda \cdot y]$$$ where $$$h(y)$$$ is the minimum possible $$$f(x)$$$ subject to $$$g(x)=y$$$ instead of $$$g(x)=0$$$.

This is applicable if $$$h(y)$$$ is convex-extensible, as $$$t(\lambda)$$$ defines a supporting hyperplane of $$$h(y)$$$'s graph with normal vector $$$(-\lambda, 1)$$$.

For convex $$$h(y)$$$, there is a monotonicity between $$$\lambda_i$$$ and the optimal $$$y_i$$$, so you can find the $$$\lambda$$$ corresponding to $$$y=0$$$ with binary search.

If $$$g(x)$$$ and $$$f(x)$$$ are integer functions, the binary search is doable over integers, with candidate values of $$$\lambda_i$$$ being the finite differences $$$h(y_i) - h(y_i-1)$$$.
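As a minimal end-to-end illustration, here is a toy problem (not from the article): maximize the sum of exactly $$$k$$$ chosen elements of an array of non-negative integers. Here $$$h(k)$$$, the sum of the $$$k$$$ largest elements, is concave, so the max-version of the trick applies; the sketch below binary searches an integer $$$\lambda$$$, breaking ties towards the fewest chosen elements:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Toy instance of the trick: maximize the sum of exactly k chosen elements.
// h(k) = sum of the k largest elements is concave, so the trick applies.
// For a fixed penalty lambda the unconstrained problem is
//   max over subsets of sum (a_i - lambda),
// solved by taking every a_i with a_i > lambda (strict comparison is the
// tie-break towards the fewest chosen elements). Assumes a_i >= 0; for
// negative elements the search range for lambda must be extended downwards.
long long max_sum_exactly_k(vector<long long> a, int k) {
    long long lo = 0, hi = *max_element(a.begin(), a.end()) + 1;
    // smallest lambda such that the minimal optimal count is <= k
    while (lo < hi) {
        long long mid = (lo + hi) / 2;
        int cnt = 0;
        for (long long x : a) cnt += (x > mid);
        if (cnt <= k) hi = mid; else lo = mid + 1;
    }
    long long t = 0; // t(lambda*) = sum of (a_i - lambda*) over taken elements
    for (long long x : a) if (x > lo) t += x - lo;
    return t + lo * k; // f(x*) = t(lambda*) + lambda* * k
}
```

For example, `max_sum_exactly_k({5, 3, 3, 1}, 2)` lands on $$$\lambda = 3$$$ and recovers $$$5 + 3 = 8$$$; note how the ties among equal elements are resolved by the $$$+\lambda k$$$ correction rather than by the inner solver.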

Boring explanation is below, problem examples are belower.


Lagrange duality

Let $$$f : X \to \mathbb R$$$ be the objective function and $$$g : X \to \mathbb R^c$$$ be the constraint function. The constrained optimization problem

$$$\begin{gather}f(x) \to \min,\\ g(x) = 0\end{gather}$$$

in some cases can be reduced to finding stationary points of the Lagrange function

$$$L(x, \lambda) = f(x) - \lambda \cdot g(x).$$$

Here $$$\lambda \cdot g(x)$$$ is the dot product of $$$g(x)$$$ and a variable vector $$$\lambda \in \mathbb R^c$$$, called the Lagrange multiplier. Mathematical optimization typically focuses on finding stationary points of $$$L(x,\lambda)$$$. However, in our particular case we're more interested in the function

$$$t(\lambda) = \inf\limits_{x \in X} L(x,\lambda),$$$

which is called the Lagrange dual function. If $$$x^*$$$ is the solution to the original problem, then $$$t(\lambda) \leq L(x^*,\lambda)=f(x^*)$$$.
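Spelling out the chain, using that $$$x^*$$$ is feasible, i.e. $$$g(x^*)=0$$$:

$$$t(\lambda) = \inf\limits_{x \in X} [f(x) - \lambda \cdot g(x)] \leq f(x^*) - \lambda \cdot g(x^*) = f(x^*).$$$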

This allows us to introduce the Lagrangian dual problem $$$t(\lambda) \to \max$$$. Note that $$$t(\lambda)$$$, as a point-wise infimum of concave (specifically, linear) functions, is always concave, even when $$$X$$$ is, e.g., discrete. If $$$\lambda^*$$$ is the solution to the dual problem, the value $$$f(x^*) - t(\lambda^*)$$$ is called the duality gap. We're specifically interested in the case when it equals zero, which is called strong duality.

A typical example here is Slater's condition: for convex $$$f(x)$$$ with affine constraints, strong duality holds whenever the problem is feasible, i.e. there exists $$$x$$$ such that $$$g(x)=0$$$.


Change of domain

In competitive programming, the set $$$X$$$ in definitions above is often weird and very difficult to analyze directly, so Slater's condition is not applicable. As a typical example, $$$X$$$ could be the set of all possible partitions of $$$\{1,2,\dots, n\}$$$ into non-intersecting segments.

To mitigate this, we define $$$h(y)$$$ as the minimum value of $$$f(x)$$$ subject to $$$g(x)=y$$$. In these terms, the dual function is written as

$$$t(\lambda) = \inf\limits_{y \in Y} [h(y) - \lambda \cdot y],$$$

where $$$Y=\{ g(x) : x \in X\}$$$. The set $$$Y$$$ is usually much more regular than $$$X$$$, as by definition it is already a subset of $$$\mathbb R^c$$$. The strong duality condition is also very clear in these terms: it holds if and only if $$$0 \in Y$$$ and there is a $$$\lambda$$$ for which $$$y=0$$$ delivers the infimum.
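As a concrete illustration (a hypothetical instance): if $$$X$$$ is the set of partitions of $$$\{1,2,\dots,n\}$$$ into segments and $$$g(x)$$$ is the number of segments minus $$$k$$$, then

$$$Y = \{1-k,\; 2-k,\; \dots,\; n-k\} \subset \mathbb Z,$$$

a set of just $$$n$$$ integers, even though $$$X$$$ itself contains $$$2^{n-1}$$$ partitions.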

Geometrically, $$$t(\lambda)$$$ defines a level at which the epigraph of $$$h(y)$$$, i.e. the set $$$\{(y,z) : z \geq h(y)\}$$$, has a supporting hyperplane with the normal vector $$$(-\lambda, 1)$$$. Indeed, the half-space bounded by such a hyperplane at level $$$c$$$ is defined as

$$$\{(y, z) : z-\lambda \cdot y \geq c\}.$$$

With $$$c=t(\lambda) > -\infty$$$, all the points at which the hyperplane touches the epigraph correspond to the infimum. Please refer to the picture below. Lower $$$c$$$ moves the hyperplane lower, while higher $$$c$$$ moves it higher. With $$$c=t(\lambda)$$$, the hyperplane is the lowest possible one that still intersects the epigraph of the function, touching it at the point $$$(y^*, h(y^*))$$$, where $$$y^*$$$ delivers the minimum of $$$h(y) - \lambda \cdot y$$$.

Competitive programming problems typically take $$$y$$$ as a part of the input, so for strong duality to hold for all inputs, every $$$y \in Y$$$ should have a supporting hyperplane that touches the epigraph at the point $$$(y, h(y))$$$. This condition is equivalent to $$$h(y)$$$ being convex-extensible, that is, there should exist a convex function on $$$\mathbb R^c$$$ whose restriction to $$$Y$$$ yields $$$h(y)$$$.

We will not distinguish convex and convex-extensible functions further.


Monotonicity of $$$y_\lambda$$$

While calculating $$$t(\lambda)$$$, we typically find $$$x_\lambda$$$ that delivers the infimum of $$$f(x) - \lambda \cdot g(x)$$$. But at the same time we find $$$y_\lambda=g(x_\lambda)$$$ that delivers the infimum of $$$h(y) - \lambda \cdot y$$$. As $$$h(y)$$$ is convex, it is also convex component-wise, which means that increasing $$$y_i$$$ generally requires larger values of $$$\lambda_i$$$ in the supporting hyperplane when all the other components of $$$y$$$ are fixed.

We can do a nested binary search on the components of $$$\lambda$$$ to find the $$$\lambda$$$ that corresponds to $$$y=0$$$. The procedure would look like this:

void adjust(double *lmb, int i, int c) {
    if(i == c) {
        return;
    }
    double l = -inf, r = +inf; // in practice, finite bounds that are large enough
    while(r - l > eps) {
        double m = (l + r) / 2;
        lmb[i] = m;
        adjust(lmb, i + 1, c);
        auto [xl, yl] = solve(lmb); // returns (x_lambda, y_lambda) with the minimum y_lambda[i]
        if(yl[i] < 0) {
            l = m;
        } else {
            r = m;
        }
    }
}

Note that a concrete $$$\lambda_i$$$ might correspond to a contiguous segment of $$$y_i$$$ values and vice versa, thus we should find the $$$(x_\lambda,y_\lambda)$$$ pair with the minimum possible $$$i$$$-th component of $$$y_\lambda$$$ to guarantee monotonicity, otherwise the binary search might work in an unexpected way.
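For a one-dimensional instance of `solve` with this tie-break, consider a toy problem (not from the article): split an array into exactly $$$k$$$ contiguous groups minimizing the sum of squared group sums. The penalized DP below minimizes the pair (cost $$$+\lambda\cdot$$$groups, groups) lexicographically, so ties are broken towards the fewest groups, and the outer binary search runs over integer $$$\lambda$$$:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Penalized DP: minimal (cost + lambda * groups, groups) over all splits,
// compared lexicographically so that ties go to the fewest groups.
pair<long long, int> solve_penalized(const vector<long long>& pre, long long lambda) {
    int n = pre.size() - 1;
    vector<pair<long long, int>> dp(n + 1, {LLONG_MAX, 0});
    dp[0] = {0, 0};
    for (int i = 1; i <= n; i++)
        for (int j = 0; j < i; j++) {
            long long s = pre[i] - pre[j];
            dp[i] = min(dp[i], {dp[j].first + s * s + lambda, dp[j].second + 1});
        }
    return dp[n];
}

// Minimal sum of squared group sums over splits into exactly k groups.
// h(k) is convex in k for this cost, so the trick applies.
long long min_sq_split(const vector<long long>& a, int k) {
    int n = a.size();
    vector<long long> pre(n + 1, 0);
    for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + a[i];
    long long lo = 0, hi = pre[n] * pre[n] + 1;
    while (lo < hi) { // smallest lambda whose minimal group count is <= k
        long long mid = (lo + hi) / 2;
        if (solve_penalized(pre, mid).second <= k) hi = mid; else lo = mid + 1;
    }
    return solve_penalized(pre, lo).first - lo * k;
}
```

For instance, on $$$[1,2,3,4]$$$ with $$$k=2$$$ the search settles on $$$\lambda=18$$$, where both 2-group and 3-group splits cost the same penalized value; the lexicographic tie-break picks 2 groups and the answer $$$52$$$ (split $$$[1,2,3\,|\,4]$$$) is recovered as $$$88 - 18 \cdot 2$$$.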


Integer search

Here and onwards we refer to $$$h(y_i)$$$ as a function of a single real variable $$$y_i$$$, assuming that all other components of $$$y$$$ are fixed.

What are the possible $$$y_i$$$ for a specific $$$\lambda_i$$$? If two values $$$y_i$$$ and $$$y'_i$$$ both deliver the infimum for the same $$$\lambda_i$$$, then $$$h(y_i)-\lambda_i y_i = h(y'_i) - \lambda_i y'_i$$$, thus the relationship between them is

$$$\frac{h(y'_i)-h(y_i)}{y'_i-y_i}=\lambda_i.$$$

What are the possible $$$\lambda_i$$$ for a specific $$$y_i$$$? When $$$h(y_i)$$$ is continuously differentiable, $$$\lambda_i$$$ essentially corresponds to the $$$y_i$$$ such that $$$\lambda_i = h'(y_i)$$$. On the other hand, when the $$$y_i$$$ are integers, $$$y_i$$$ optimizes $$$t(\lambda)$$$ for all $$$\lambda_i$$$ such that

$$$h(y_i) - h(y_i-1) \leq \lambda_i \leq h(y_i + 1) - h(y_i).$$$

So, if $$$h(y_i)$$$ is an integer function, we may restrict the integer search for $$$\lambda_i$$$ to the possible values of $$$h(k)-h(k-1)$$$ only.
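To see the bracket in action on a concrete convex integer function (a hypothetical example), take $$$h(y)=y^2$$$ on $$$\mathbb Z$$$: then $$$h(y)-h(y-1)=2y-1$$$, so $$$\lambda = 2y-1$$$ makes both $$$y-1$$$ and $$$y$$$ optimal, while any $$$\lambda$$$ strictly inside $$$(2y-1,\,2y+1)$$$ selects $$$y$$$ uniquely. A brute-force check:

```cpp
#include <bits/stdc++.h>
using namespace std;

// All integer minimizers of h(y) - lambda*y for h(y) = y^2, brute-forced
// over a window that certainly contains them for small lambda (the
// minimizer of the real relaxation is lambda/2).
vector<long long> argmins(long long lambda) {
    auto val = [&](long long y) { return y * y - lambda * y; };
    long long best = LLONG_MAX;
    vector<long long> res;
    for (long long y = -100; y <= 100; y++) {
        if (val(y) < best) { best = val(y); res = {y}; }
        else if (val(y) == best) res.push_back(y);
    }
    return res;
}
```

Here `argmins(3)` returns both $$$1$$$ and $$$2$$$ (the boundary tie, since $$$3 = h(2)-h(1)$$$), while `argmins(4)` returns only $$$2$$$.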

In a very generic case, when the function is not continuously differentiable and the $$$y_i$$$ are not necessarily integer, the set of possible $$$\lambda_i$$$ for a given $$$y_i$$$ is $$$\partial h(y_i)$$$, the so-called sub-differential of $$$h$$$ at $$$y_i$$$, formally defined as $$$[a,b]$$$ where

$$$\begin{gather}a = \sup_{y'_i < y_i} \frac{h(y_i) - h(y'_i)}{y_i - y'_i},~~b = \inf_{y_i < y'_i} \frac{h(y'_i) - h(y_i)}{y'_i - y_i}\end{gather}.$$$
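For instance (a standard example, not from the article), for $$$h(y)=|y|$$$ at the kink $$$y_i=0$$$ the formula gives

$$$a = \sup_{y' < 0} \frac{h(0)-h(y')}{0-y'} = -1, \qquad b = \inf_{0 < y'} \frac{h(y')-h(0)}{y'-0} = 1,$$$

so $$$\partial h(0) = [-1, 1]$$$: every $$$\lambda \in [-1,1]$$$ defines a supporting line at the kink.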

The concept of sub-derivatives and sub-differentials can be generalized to multi-dimensional case as well with sub-gradients.


Testing convexity

When $$$Y$$$ is a continuous set, convexity may be tested by local criteria. For a one-dimensional set, $$$h(y)$$$ is convex if and only if $$$h'(y)$$$ is non-decreasing. If it has a second derivative, we might also say that the function is convex if and only if $$$h''(y) \geq 0$$$ for all $$$y$$$.

In the multidimensional case, the local criterion is that the Hessian matrix $$$\frac{\partial^2 h}{\partial y_i \partial y_j}$$$ is positive semi-definite.

In the discrete case, derivatives can be substituted with finite differences. A function $$$h(y)$$$ defined on $$$\mathbb Z$$$ is convex if and only if

$$$\Delta [h] (y) = h(y+1)-h(y)$$$

is non-decreasing, or, equivalently,

$$$\Delta^2 [h] (y) = h(y+2)-2h(y+1)+h(y) \geq 0.$$$
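The finite-difference test is straightforward to run on sampled values. A small helper (not from the article) checking the second-difference criterion on consecutive samples:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Checks Delta^2[h](y) = h[y+2] - 2*h[y+1] + h[y] >= 0 for all
// consecutive triples of samples, i.e. discrete convexity of h on Z.
bool is_convex(const vector<long long>& h) {
    for (size_t y = 0; y + 2 < h.size(); y++)
        if (h[y + 2] - 2 * h[y + 1] + h[y] < 0) return false;
    return true;
}
```

For example, `is_convex({100, 52, 34, 30})` holds (the differences $$$-48, -18, -4$$$ are non-decreasing), while `is_convex({0, 3, 4, 9})` fails at the first triple.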

For the two-dimensional case, the function $$$h(x,y)$$$ is convex if it is convex along each axis and:

$$$\det \begin{bmatrix}\Delta_{xx} [h](x, y) & \Delta_{xy} [h](x, y) \\ \Delta_{xy} [h](x, y) & \Delta_{yy}[h](x, y) \end{bmatrix} \geq 0.$$$

Problem examples

For simplicity, we will use $$$g(x)=y_0$$$ in the primal problem instead of $$$g(x)-y_0=0$$$, while also omitting the constant part in the dual function. This leads to the following changes relative to what was written above:

  • Binary search target is set to $$$y=y_0$$$ instead of $$$y=0$$$;
  • The primal solution is determined by $$$f(x^*) = t(\lambda^*) + \lambda^* \cdot y_0$$$ rather than $$$f(x^*) = t(\lambda^*)$$$.

Sometimes the problem is stated with $$$\max$$$ rather than $$$\min$$$. In this case, $$$t(\lambda)$$$ and $$$h(y)$$$ are defined as supremum and maximum rather than infimum and minimum. Correspondingly, $$$h(y)$$$ needs to be concave rather than convex to allow the usage of the trick.

Gosha is hunting. You're given $$$a$$$, $$$b$$$, $$$p_1, \dots, p_n$$$ and $$$q_1,\dots, q_n$$$. You need to pick two subsets $$$A$$$ and $$$B$$$ of $$$\{1,2,\dots,n\}$$$ of size $$$a$$$ and $$$b$$$ respectively in such a way that the following sum is maximized:

$$$f(A, B) = \sum\limits_{i \in A} p_i + \sum\limits_{j \in B} q_j - \sum\limits_{k \in A \cap B} p_k q_k.$$$

Formulating it as a constrained optimization problem, we obtain

$$$\begin{gather}f(A, B) \to \max, \\ |A| = a, \\ |B| = b.\end{gather}$$$

Lagrangian dual here is

$$$t(\lambda_1, \lambda_2) = \max\limits_{A,B} [f(A, B) - \lambda_1|A| - \lambda_2|B|].$$$

Let $$$h(\alpha,\beta) = \max f(A, B)$$$ subject to $$$|A| = \alpha$$$ and $$$|B| = \beta$$$, then

$$$t(\lambda_1, \lambda_2) = \max\limits_{\alpha,\beta}[h(\alpha,\beta)-\lambda_1 \alpha - \lambda_2 \beta].$$$
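Evaluating this dual is an unconstrained problem that decomposes over indices: for fixed $$$(\lambda_1, \lambda_2)$$$, each $$$i$$$ independently picks the best of four options: skipped, $$$A$$$ only, $$$B$$$ only, or both. A sketch (`eval_dual` is a hypothetical helper; ties go to the option with fewer picks, in line with the monotonicity section):

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Dual { double t; int cntA, cntB; };

// t(lambda1, lambda2) for "Gosha is hunting": each index independently
// takes the best of {skip, A only, B only, both}; on ties the strict
// comparison keeps the option with fewer picks, listed first.
Dual eval_dual(const vector<double>& p, const vector<double>& q,
               double l1, double l2) {
    Dual d = {0.0, 0, 0};
    for (size_t i = 0; i < p.size(); i++) {
        // options as {value, in A, in B}, ordered from fewest to most picks
        array<array<double, 3>, 4> opt = {{
            {0.0, 0, 0},
            {p[i] - l1, 1, 0},
            {q[i] - l2, 0, 1},
            {p[i] + q[i] - p[i] * q[i] - l1 - l2, 1, 1},
        }};
        int best = 0;
        for (int o = 1; o < 4; o++)
            if (opt[o][0] > opt[best][0]) best = o;
        d.t += opt[best][0];
        d.cntA += (int)opt[best][1];
        d.cntB += (int)opt[best][2];
    }
    return d;
}
```

The outer layer would then run the nested binary search on $$$(\lambda_1,\lambda_2)$$$ from the monotonicity section, targeting $$$(|A|,|B|)=(a,b)$$$, and recover the answer as $$$t(\lambda_1^*,\lambda_2^*) + \lambda_1^* a + \lambda_2^* b$$$.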


Tags: lambda, aliens, tutorial, lagrange, duality
