seiko.iwasawa's blog

By seiko.iwasawa, 3 years ago, translation, In English

Problem statement

Let $$$w(j, i)$$$ be some cost function that can be evaluated in $$$O(1)$$$ and that satisfies the quadrangle inequality:

$$$w(a, c) + w(b, d) \le w(a, d) + w(b, c)$$$ for $$$a \le b \le c \le d$$$.

The problem is to compute the following dynamic programming faster than $$$O(n^2)$$$, where $$$n$$$ is the number of states: $$$dp[i] = \min\limits_{0 \le j < i} \left( dp[j] + w(j, i) \right)$$$ (let's initialize $$$dp[0] = 0$$$).
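For reference, a direct $$$O(n^2)$$$ evaluation of this recurrence looks as follows (a minimal sketch; the placeholder cost $$$w(j, i) = (i - j)^2$$$ is there only so that the program runs, and any cost satisfying the inequality above can be plugged in instead):

```cpp
#include <bits/stdc++.h>
using namespace std;
using ll = long long;

// Placeholder cost; any O(1) function satisfying the quadrangle inequality works.
ll w(int j, int i) { return (ll)(i - j) * (i - j); }

int main() {
    int n = 10;
    vector<ll> dp(n + 1, LLONG_MAX);
    dp[0] = 0;
    for (int i = 1; i <= n; ++i)      // O(n^2): try every candidate j for every i
        for (int j = 0; j < i; ++j)
            dp[i] = min(dp[i], dp[j] + w(j, i));
    cout << dp[n] << '\n';
}
```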

Examples of cost functions
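(The spoiler above contains the author's examples; the following is an illustration of my own.) One commonly used family: $$$w(j, i) = f(i - j)$$$ for a convex function $$$f$$$, e.g. $$$w(j, i) = (i - j)^2$$$. Indeed, for $$$a \le b \le c \le d$$$ write $$$c - a = (c - b) + (b - a)$$$ and $$$d - a = (d - b) + (b - a)$$$; since $$$c - b \le d - b$$$ and $$$f$$$ is convex, $$$f(c - a) - f(c - b) \le f(d - a) - f(d - b)$$$, which rearranges into the quadrangle inequality $$$w(a, c) + w(b, d) \le w(a, d) + w(b, c)$$$. Another example is the cost $$$w(j, i) = b[j] \cdot a[i]$$$ with $$$a$$$ increasing and $$$b$$$ decreasing, which is used in the problem at the end of this post.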

Theory

First of all, let's introduce an auxiliary two-dimensional DP to simplify the theory (it will not appear in the final algorithm):

$$$tmp[i][j] = \min\limits_{0 \le x \le j} \left( dp[x] + w(x, i) \right)$$$; ($$$j < i$$$).

Essentially, $$$tmp[i][j]$$$ is an intermediate state of $$$dp[i]$$$ computed using only the first $$$(j + 1)$$$ candidate indices $$$0, 1, \ldots, j$$$. In particular, $$$tmp[i][i - 1] = dp[i]$$$.

Now denote the following function:

$$$k(j, i) = \arg \min\limits_{0 \le x \le j} \left( dp[x] + w(x, i) \right)$$$; ($$$j < i$$$).

In other words, $$$k(j, i)$$$ is an optimum index for relaxing $$$tmp[i][j]$$$; it is the index $$$0 \le x \le j$$$ such that $$$tmp[i][j] = dp[x] + w(x, i)$$$.

Note that $$$k(i - 1, i)$$$ allows us to calculate $$$dp[i]$$$ in $$$O(1)$$$:

$$$dp[i] = dp[k(i - 1, i)] + w(k(i - 1, i), i)$$$.

One can derive $$$k(j + 1, i)$$$ from $$$k(j, i)$$$:

$$$ k(j + 1, i)= \begin{cases} j + 1, &\text{if } (j + 1) \text{ is an optimal index for } tmp[i][j+1]\\ k(j, i), &\text{otherwise} \end{cases} $$$

That is, to derive $$$k(j + 1, i)$$$ from $$$k(j, i)$$$, one needs to check whether the following condition holds: $$$dp[j + 1] + w(j + 1, i) < dp[k(j, i)] + w(k(j, i), i)$$$.

Statement. For any fixed $$$j$$$ the function $$$k_j(i) = k(j, i)$$$ is non-decreasing for $$$i=(j+1)..n$$$.

Proof
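(The spoiler above contains the author's proof; here is a sketch of one standard argument, assuming the $$$\arg\min$$$ always returns the smallest optimal index.) Fix $$$j$$$ and take positions $$$i < i'$$$, both greater than $$$j$$$, and let $$$p = k_j(i)$$$. For every $$$q < p$$$ we then have the strict inequality $$$dp[p] + w(p, i) < dp[q] + w(q, i)$$$ (otherwise $$$q$$$ would have been chosen instead of $$$p$$$). The quadrangle inequality applied to $$$q \le p \le i \le i'$$$ gives $$$w(q, i) + w(p, i') \le w(q, i') + w(p, i)$$$. Adding the two inequalities and cancelling $$$w(p, i) + w(q, i)$$$ from both sides yields $$$dp[p] + w(p, i') < dp[q] + w(q, i')$$$, so no index smaller than $$$p$$$ is optimal at position $$$i'$$$ either, and hence $$$k_j(i') \ge p = k_j(i)$$$.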

Consequence. If $$$k_{j+1}(i) = j + 1$$$, then by monotonicity $$$k_{j+1}(i+1) \ge j + 1$$$, and since $$$k_{j+1}(i+1) \le j + 1$$$ always holds, $$$k_{j+1}(i+1) = j + 1$$$ as well. Recall that either $$$k_{j+1}(i) = j + 1$$$ or $$$k_{j+1}(i) = k_j(i)$$$. This means that the values of $$$k_{j+1}$$$ coincide with the values of $$$k_j$$$ everywhere except on some (possibly empty) suffix, on which $$$k_{j+1} = j + 1$$$.
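For instance (illustrative numbers only, not derived from any concrete cost function): if $$$n = 9$$$ and $$$k_3 = (0, 2, 2, 3, 3, 3)$$$ over $$$i = 4..9$$$, then $$$k_4$$$, which is defined over $$$i = 5..9$$$, could be $$$(2, 2, 3, 4, 4)$$$: the values of $$$k_3$$$ are copied, except that the suffix $$$i = 8..9$$$ is overwritten with $$$4$$$.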

So all $$$k(j, i)$$$ can be computed easily. The values $$$k(0, i)$$$ are already known: they are all equal to $$$0$$$. Suppose $$$dp[x]$$$ and all $$$k(x, i)$$$ have already been computed for every $$$x \le j$$$. To obtain $$$k(j + 1, i)$$$, one computes $$$dp[j + 1]$$$ (this takes $$$O(1)$$$, because $$$k(j, j + 1)$$$ is already known), finds the suffix on which the values become $$$(j + 1)$$$ (it can be found, for example, by binary search, checking whether $$$k(j + 1, mid) = j + 1$$$), "copies" the values of $$$k(j, i)$$$ into $$$k(j + 1, i)$$$ and assigns $$$(j + 1)$$$ on that suffix. The Algorithm section describes how to perform this efficiently. Note that along the way we have found not only all $$$k(j, i)$$$ but also all $$$dp[j]$$$, which is exactly what we need.

An example of algorithm process

Algorithm

Let's keep, for the current $$$j$$$, all values of $$$k(j, i)$$$ as a deque of segments of equal values (each segment is represented by its value $$$k(j, i)$$$ and by the position where it begins). To compute $$$dp[j + 1]$$$, one needs $$$k(j, j + 1)$$$; it belongs to the first segment of the deque, because $$$(j + 1)$$$ is the smallest possible argument of the function $$$k_j$$$. Instead of "copying" the values, simply erase the first element of the first segment, i.e. increment the beginning of the first segment of the deque; if the segment becomes empty, erase it.

The next step is to find the suffix on which the values change. Note that some segments at the end of the deque will be erased (possibly all of them, possibly none): a segment is erased iff the function value at its beginning changes to $$$(j + 1)$$$. So pop segments from the back of the deque while it is non-empty and this condition holds. Then find the beginning of the new segment (using binary search) and push it to the back of the deque (the edge cases are an empty deque and an empty new segment).

In some problems the beginning of the new segment can be found without binary search, which improves the complexity. To apply this, one has to be able to compute $$$D(j, i)$$$ — the minimal index $$$x$$$ such that $$$dp[j] + w(j, x) \ge dp[i] + w(i, x)$$$.
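To make the procedure concrete, here is a minimal sketch of this algorithm (it is not the implementation linked below; the placeholder cost $$$w(j, i) = (i - j)^2$$$ is used only so that the program compiles and runs, and any $$$O(1)$$$ cost satisfying the quadrangle inequality can be substituted for it):

```cpp
#include <bits/stdc++.h>
using namespace std;
using ll = long long;

int n;                 // number of states: dp[0..n]
vector<ll> dp;

// Placeholder cost; assumed to satisfy the quadrangle inequality.
ll cost(int j, int i) { return (ll)(i - j) * (i - j); }

// Value of the transition j -> i.
ll f(int j, int i) { return dp[j] + cost(j, i); }

// Is candidate x strictly better than candidate y at position i?
bool better(int x, int y, int i) { return f(x, i) < f(y, i); }

void solve() {
    dp.assign(n + 1, 0);
    // Segments of k_j: (optimal index, first position the segment covers).
    // A segment extends up to the next segment's start (or up to n).
    deque<pair<int, int>> segs = {{0, 1}};           // k(0, i) = 0 for every i
    for (int i = 1; i <= n; ++i) {
        // Position i is the first position of the front segment, so its value is k(i - 1, i).
        dp[i] = f(segs.front().first, i);
        if (i == n) break;

        // Erase position i from the domain: the front segment now starts at i + 1.
        segs.front().second = i + 1;
        if (segs.size() > 1 && segs[1].second <= i + 1) segs.pop_front();

        // Insert candidate i: pop back segments that already lose to i at their first position...
        while (!segs.empty() && better(i, segs.back().first, segs.back().second))
            segs.pop_back();
        if (segs.empty()) {
            segs.push_back({i, i + 1});              // candidate i wins everywhere
        } else {
            // ...then binary search for the first position where candidate i wins.
            int lo = segs.back().second, hi = n + 1; // i loses at lo, "wins" at hi
            while (hi - lo > 1) {
                int mid = (lo + hi) / 2;
                if (better(i, segs.back().first, mid)) hi = mid;
                else lo = mid;
            }
            if (hi <= n) segs.push_back({i, hi});    // push the new suffix segment, if non-empty
        }
    }
}

int main() {
    n = 10;
    solve();
    for (int i = 0; i <= n; ++i) cout << dp[i] << " \n"[i == n];
}
```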

Complexity
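(The spoiler above contains the author's analysis; the usual accounting goes as follows.) Each index becomes the value of at most one segment over the whole run, so at most $$$n$$$ segments are ever pushed into the deque and each of them is erased at most once; hence all deque operations take $$$O(n)$$$ in total. Each step additionally performs one binary search over at most $$$n$$$ positions, so the total running time is $$$O(n \log n)$$$, and it drops to $$$O(n)$$$ when the crossover point $$$D(j, i)$$$ can be computed directly.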

The code for reference can be found below or here (the implementation there is slightly different, though).

Example

Consider the problem 319C - Kalila and Dimna in the Logging Industry.

It can be reduced to the following dynamic programming computation:

$$$dp[i]=\min\limits_{j<i} \left( dp[j] + b[j] \cdot a[i] \right)$$$.

Note that for all $$$i < j$$$ the monotonicity holds: $$$a[i] < a[j]$$$ and $$$b[i] > b[j]$$$.

Denote the following cost function:

$$$w(j, i) = b[j] \cdot a[i]$$$.

It satisfies the quadrangle inequality (this follows from the rearrangement inequality), so the 1D1D optimization is applicable here.
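For completeness, here is a direct check: for indices $$$p \le q \le r \le s$$$ we have $$$w(p, r) + w(q, s) - w(p, s) - w(q, r) = (b[p] - b[q]) \cdot (a[r] - a[s]) \le 0$$$, because $$$b$$$ is decreasing (so the first factor is non-negative) and $$$a$$$ is increasing (so the second factor is non-positive).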

Here is a possible $$$O(n\log{n})$$$ implementation: 125522308.

To obtain a linear solution, one can slightly change the implementation: define $$$D(j, i)$$$ as the minimum number $$$x$$$ (not the minimum index) such that $$$dp[j] + b[j] \cdot x > dp[i] + b[i] \cdot x$$$. The deque should store the beginning-number of each segment instead of the beginning-index. Finally, instead of incrementing the beginning of the first segment, one just erases from the front of the deque the segments that lie entirely to the left of the current value of $$$a$$$. The value of $$$D(j, i)$$$ is easy to compute: it equals $$$\frac{dp[i] - dp[j]}{b[j] - b[i]}$$$.
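Indeed, since $$$b[j] > b[i]$$$ for $$$j < i$$$, the inequality $$$dp[j] + b[j] \cdot x > dp[i] + b[i] \cdot x$$$ is equivalent to $$$(b[j] - b[i]) \cdot x > dp[i] - dp[j]$$$, i.e. to $$$x > \frac{dp[i] - dp[j]}{b[j] - b[i]}$$$, so this quotient is exactly the threshold after which candidate $$$i$$$ becomes better than candidate $$$j$$$.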

Here is a possible $$$O(n)$$$ implementation: 125523046.
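Here is also a minimal independent sketch of such a linear approach (it is not the linked submission): since the queries $$$a[i]$$$ are increasing and the slopes $$$b[j]$$$ are decreasing, the candidates form a lower envelope of lines $$$y = b[j] \cdot x + dp[j]$$$, and the crossover numbers play the role of $$$D(j, i)$$$. The data in main is made up purely for demonstration and is not related to the actual problem input.

```cpp
#include <bits/stdc++.h>
using namespace std;
using ll = long long;

// dp[i] = min_{j < i} (dp[j] + b[j] * a[i]), dp[0] = 0,
// assuming a[] is strictly increasing and b[] is strictly decreasing.
vector<ll> solve(const vector<ll>& a, const vector<ll>& b) {
    int n = a.size();
    vector<ll> dp(n, 0);
    auto val = [&](int j, ll x) { return dp[j] + b[j] * x; };
    // D(j, i): candidate i beats candidate j (j < i) exactly for x > D(j, i).
    auto D = [&](int j, int i) { return (long double)(dp[i] - dp[j]) / (b[j] - b[i]); };

    deque<int> q = {0};                           // candidate indices, oldest first
    for (int i = 1; i < n; ++i) {
        // Queries only grow, so candidates that already lost at x = a[i] are gone for good.
        while (q.size() > 1 && val(q[1], a[i]) <= val(q[0], a[i])) q.pop_front();
        dp[i] = val(q.front(), a[i]);
        // Insert candidate i, keeping the crossover numbers increasing along the deque.
        while (q.size() > 1 && D(q[q.size() - 2], q.back()) >= D(q.back(), i)) q.pop_back();
        q.push_back(i);
    }
    return dp;
}

int main() {
    // Toy data only: a increasing, b decreasing.
    vector<ll> a = {1, 2, 3, 10, 20}, b = {5, 4, 3, 2, 0};
    vector<ll> dp = solve(a, b);
    cout << dp.back() << '\n';                    // prints 75 for this toy data
}
```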

Sources: 1 2.

Tags dp, 1d1d

»
3 years ago, # |

I've always called this Convex Hull Trick of a function that's not linear :thinking:

»
3 years ago, # |
Rev. 2

Great blog!

While I was reading it I believe I caught a typo.

In this part:

$$$k(j, i) = \arg \min\limits_{0 \le x \le j} \left( dp[x] + {\bf w(x, j)} \right)$$$

In other words, $$$k(j,i)$$$ is an optimum index for relaxing $$$tmp[i][j]$$$; it is the index $$$0\leq x\leq j$$$ such that $$$tmp[i][j]=dp[x]+{\bf w(x,j)}$$$.

The two bold $$$w(x,{\bf j})$$$ should be $$$w(x, {\bf i})$$$ if I am not mistaken.