
By Sersawy, history, 5 months ago, In English

I recently encountered several problems that looked like standard $$$O(N^2)$$$ Dynamic Programming but required a faster solution. I noticed a recurring theme: using Monotonic Stack to optimize transitions dependent on range statistics (like min/max/GCD). Sometimes these problems can also be solved using the convex hull trick, but the stack-based approach is often much simpler.

However, not all "Monotone Stack DPs" are the same. Sometimes we use the stack to optimize the transition loop, and other times we use it to optimize the input array itself.

In this post, I will explain the intuition behind these two distinct patterns.


Pattern 1: Optimizing the Transition

Target: Problems where the cost function behaves like a "Step Function."

The Problem

We are dealing with a 1D DP recurrence of the form:

$$$dp[i] = \min_{0 \le j \lt i} \{ dp[j] + \text{cost}(j+1 \dots i) \}$$$

Where $$$\text{cost}(j+1 \dots i)$$$ depends on a range statistic (e.g., $$$\min(a_{j+1} \dots a_i)$$$, $$$\text{GCD}$$$, $$$\text{OR}$$$, etc.). $$$N$$$ is usually up to $$$4 \cdot 10^5$$$, so an $$$O(N^2)$$$ loop will TLE. We need something faster.

Intuition 1: Range Statistics are "Step Functions"

Let's look at the problem Inverse Minimum Partition (Easy Version). Cost function: $$$\lceil a_i / \min(a_{j+1} \dots a_i) \rceil$$$.

Fix the current index $$$i$$$. As we move the starting point $$$j$$$ from $$$i-1$$$ down to $$$0$$$, the range $$$[j+1, i]$$$ gets longer. The value $$$\min(a_{j+1} \dots a_i)$$$ does not change at every step. It stays constant for a while, then suddenly drops when we hit a new, smaller element.

This means we don't have $$$i$$$ distinct "costs" to check. We only have one distinct cost per position where the running minimum drops, i.e., per previous smaller element to the left of $$$i$$$.

Intuition 2: Compressing the Candidates

Since many indices $$$j$$$ share the same range minimum, they share the same denominator in the cost function. For a group of $$$j$$$'s that share the same range minimum, which one is optimal? Obviously, the one with the smallest $$$dp[j]$$$.

So, instead of iterating over all $$$j$$$, we only need to store one pair per group: { min_val, best_dp }, where min_val is the shared range minimum and best_dp is the smallest $$$dp[j]$$$ within that group.

Intuition 3: The Dominance Rule (A Pareto Frontier)

Before we apply the pruning rule, we need a small idea from multi-objective optimization:

What is a Pareto Frontier?

When you try to optimize two things at once, improving one usually makes the other worse. Here we want simultaneously:

  • small dp[j]
  • large min_val

But not every candidate is meaningful. If candidate A has:

  • a larger dp[j] (worse), and
  • a smaller min_val (also worse)

compared to candidate B, then A is completely useless. B dominates A.

The Pareto frontier is simply the set of candidates that are not dominated by any other, i.e., the ones that offer a meaningful trade-off between dp[j] and min_val.


We are trying to minimize the total cost:

$$$ \text{Total} = dp[j] + \left\lceil \frac{a_i}{\text{min_val}} \right\rceil $$$

We have two opposing forces:

  1. We want a Large min_val – smaller fraction → good
  2. We want a Small dp[j] – smaller base cost → good

This creates a trade-off. However, some states offer no trade-off; they are strictly worse.

If we have a candidate in the stack that has a smaller min_val (worse denominator) and a larger dp (worse base cost), it loses on both fronts. It is dominated.

We maintain a stack that represents the Pareto Frontier:

  • Range minimums are strictly decreasing: We only keep a new value if it is smaller than the last one. (Otherwise the previous minimum becomes useless and gets merged.)

  • DP costs are strictly increasing: A worse minimum (smaller val) can stay only if it comes with a better (smaller) dp. If its DP is not better, it is dominated and removed.

vector<pair<int, ll>> st; // {min_val, best dp for that range minimum}

// Inside loop over i...
ll cur = (i ? dp[i-1] : 0);

// 1. Maintain a strictly decreasing min_val sequence (step function):
//    a[i] becomes the new range minimum for every popped candidate.
while (!st.empty() && st.back().first >= a[i]) {
    cur = min(cur, st.back().second);
    st.pop_back();
}

// 2. Remove dominated states (worse dp AND worse min_val).
while (!st.empty() && st.back().second >= cur) {
    st.pop_back();
}

st.push_back({a[i], cur});

// 3. Brute force over the stack (small in practice, ~sqrt(A)).
dp[i] = INF;
for (auto& [val, cost] : st) {
    dp[i] = min(dp[i], cost + (a[i] + val - 1) / val); // integer ceil(a[i]/val), avoids floating-point error
}

Crucial Note: Why is the inner loop fast?

You might worry that iterating through the stack for(auto& x : st) inside the main loop makes this $$$O(N^2)$$$. It does not. The complexity is closer to $$$O(N \sqrt{A_{max}})$$$.

A sketch of the argument: the size of the stack is naturally limited by the properties of integer division. For any integer $$$A$$$, the expression $$$\lceil A / V \rceil$$$ takes only about $$$2\sqrt{A}$$$ distinct values over all $$$V$$$. Because we enforce double monotonicity (strictly decreasing min_val, strictly increasing dp), the stack only stores candidates that produce a meaningful change in the division result or a genuine drop in DP cost. Empirically, for $$$A_{max} = 10^5$$$, the stack size rarely exceeds 100 to 200 elements, making the inner loop effectively constant time.

My Solution (343559614)


Pattern 2: Optimizing the Input (State Space Reduction)

Target: Problems where $$$N$$$ is large, but a constraint $$$K$$$ (capacity/value) is small.

The Problem

Sometimes the bottleneck isn't the transition logic, but the sheer number of elements $$$N$$$. Consider a problem like Wishing Cards.

  • $$$N \le 10^5$$$
  • Total Capacity $$$K \le 360$$$
  • We want to maximize happiness based on cumulative maximums.

A standard DP would be $$$O(N \cdot K^2)$$$, which is too slow. We need to reduce $$$N$$$ so the complexity depends on $$$K$$$.

Intuition 1: The "Useless" State (Dominance)

In many greedy and DP problems, a state becomes "useless" if there is a previous state that is strictly better in every possible future scenario.

Let's say we have an item at index $$$i$$$ (earlier) and an item at index $$$j$$$ (later). If item $$$i$$$ has a better value (e.g., higher capacity) AND is available earlier, it "dominates" item $$$j$$$.

  • Item $$$i$$$ can do everything $$$j$$$ can do.
  • Item $$$i$$$ contributes to the answer for a longer duration/range.

Therefore, if $$$i$$$ dominates $$$j$$$, we can delete $$$j$$$ from the array entirely.

Intuition 2: Input Compression

If we filter out all dominated elements using this logic, the remaining array must be strictly monotonic.

  • If we keep element $$$j$$$ after $$$i$$$, it must be because $$$j$$$ is strictly stronger than $$$i$$$. (Otherwise, $$$i$$$ would have eliminated $$$j$$$).

If the values are integers bounded by $$$K$$$ (e.g., capacities $$$\le K$$$), and our filtered array is strictly increasing, the maximum length of this array is $$$K$$$. We have successfully compressed $$$N=10^5$$$ down to $$$N \le K$$$, allowing an $$$O(K^3)$$$ or $$$O(K^2)$$$ solution.

Implementation: The Filter

We use a stack to perform this filtering in $$$O(N)$$$. We usually iterate backwards so we can compare the "current/earlier" element against the "stack/later" elements.

stack<int> st;
// Iterate backwards: i = Earlier, st.top() = Later
for(int i = n-1; i >= 0; --i) {
    // Pop if Earlier dominates Later (Stronger/Equal capacity)
    while(st.size() && a[i] >= a[st.top()]) st.pop();
    st.push(i);
}

vector<int> b;
while(st.size()) b.push_back(st.top()), st.pop();
// Size of b is <= K. Run DP on b.

My Solution (352213797)


Extra Problems:

  1. CGCDSSQ

  2. Discharging

  3. Grabbing plush

  4. Mountain Range

  5. Eat ground


If you know more problems that fit either of these patterns — or if you have seen this technique referred to by a specific name — feel free to share them. I’d love to expand this tutorial with additional examples or terminology.


By Sersawy, history, 6 months ago, In English

Update #2: ACPC 2025 and ICPC Qualified

Yesterday, we competed in the Arab & African Collegiate Programming Contest (ACPC) 2025, which we were able to attend thanks to the fundraising campaign and everyone who supported it.

We are proud to announce that our team achieved the following results:

  • Ranked 10th overall across Africa and the Arab region

  • Won a Bronze Medal

  • Qualified for the ICPC World Finals 2026

We sincerely thank everyone who contributed, shared, or supported our campaign. Your support directly made this achievement possible.

Update #1: We Did It!

We’ve successfully collected all the needed funds. We truly appreciate everyone who supported us and shared our campaign — your help made this achievement possible!

The Story

After 4 years of non-stop training, countless virtuals, and sleepless nights, our team “L3nt-ElManifa” from Menoufia University (Sersawy, Khalwsh, -Ahmed_-_Elsayed-) finally made history by winning the Gold Medal at the Egyptian Collegiate Programming Contest (ECPC) 2025. It’s the first-ever Gold for our university!

With this result, we qualified to represent Menoufia University and Egypt at the ACPC (Arab & African Collegiate Programming Contest) in Luxor, Egypt (Dec 12, 2025) — the regional contest that decides who makes it to the ICPC World Finals.

Team

The Challenge

However, we’re currently facing financial constraints that could prevent us from participating. The total cost is 75,000 EGP (~1,587 USD), covering registration, travel, and accommodation.

How You Can Help

We’re looking for supporters and sponsors who believe in the power of competitive programming and want to help us continue our journey. Your contribution will directly help our team compete, innovate, and represent Egypt and Menoufia University on the regional stage.

You can support us here: https://gofund.me/032475d90

Every little bit counts — your support brings us one step closer to the ICPC World Finals!
