I recently encountered several problems that looked like standard $$$O(N^2)$$$ Dynamic Programming but required a faster solution. I noticed a recurring theme: using Monotonic Stack to optimize transitions dependent on range statistics (like min/max/GCD). Sometimes these problems can also be solved using the convex hull trick, but the stack-based approach is often much simpler.
However, not all "Monotone Stack DPs" are the same. Sometimes we use the stack to optimize the transition loop, and other times we use it to optimize the input array itself.
In this post, I will explain the intuition behind these two distinct patterns.
Pattern 1: Optimizing the Transition
Target: Problems where the cost function behaves like a "Step Function."
The Problem
We are dealing with a 1D DP recurrence of the form:
$$$dp[i] = \min_{0 \le j < i} \big( dp[j] + \text{cost}(j+1 \dots i) \big)$$$
where $$$\text{cost}(j+1 \dots i)$$$ depends on a range statistic (e.g., $$$\min(a_{j+1} \dots a_i)$$$, $$$\text{GCD}$$$, $$$\text{OR}$$$, etc.). $$$N$$$ is usually up to $$$4 \cdot 10^5$$$, so an $$$O(N^2)$$$ loop will TLE. We need something faster.
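For concreteness, here is a sketch of the quadratic baseline for the $$$\lceil a_i / \min \rceil$$$ cost used in the example below (the function name `naiveDp` and the exact 0-indexed convention are my own choices, not from any reference solution):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// Quadratic baseline: dp[i] = min over j of dp[j] + cost(segment a[j..i-1]),
// where a segment's cost is ceil(last_element / segment_min).
// dp[i] is the best cost to partition the prefix a[0..i-1]; dp[0] = 0.
vector<ll> naiveDp(const vector<ll>& a) {
    int n = a.size();
    const ll INF = LLONG_MAX / 2;
    vector<ll> dp(n + 1, INF);
    dp[0] = 0;
    for (int i = 1; i <= n; ++i) {
        ll mn = LLONG_MAX;
        for (int j = i - 1; j >= 0; --j) {      // segment a[j..i-1]
            mn = min(mn, a[j]);                 // running range minimum
            dp[i] = min(dp[i], dp[j] + (a[i-1] + mn - 1) / mn);
        }
    }
    return dp;
}
```

Moving $$$j$$$ leftward while maintaining a running minimum is exactly the loop the stack will compress.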
Intuition 1: Range Statistics are "Step Functions"
Let's look at the problem Inverse Minimum Partition (Easy Version). Cost function: $$$\lceil a_i / \min(a_{j+1} \dots a_i) \rceil$$$.
Fix the current index $$$i$$$. As we move the starting point $$$j$$$ from $$$i-1$$$ down to $$$0$$$, the range $$$[j+1, i]$$$ gets longer. The value $$$\min(a_{j+1} \dots a_i)$$$ does not change at every step. It stays constant for a while, then suddenly drops when we hit a new, smaller element.
This means we don't have $$$i$$$ distinct "costs" to check. The distinct costs correspond exactly to the chain of strictly smaller elements to the left of $$$i$$$ (the "previous smaller element" chain).
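To see this concretely, here is a tiny helper (hypothetical, for illustration only) that enumerates the distinct values the range minimum takes for a fixed right endpoint $$$i$$$:

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// For a fixed right endpoint i, min(a[j..i]) only changes when the
// leftward scan crosses an element smaller than everything seen so far.
// The distinct values are exactly the prefix minima scanning left from i.
vector<ll> distinctMins(const vector<ll>& a, int i) {
    vector<ll> vals;
    ll mn = LLONG_MAX;
    for (int j = i; j >= 0; --j)
        if (a[j] < mn) { mn = a[j]; vals.push_back(mn); }
    return vals; // strictly decreasing; usually far fewer than i+1 values
}
```

For `a = {5, 3, 4, 2, 6}` and `i = 4`, the minimum only ever equals 6 or 2, even though there are five possible starting points.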
Intuition 2: Compressing the Candidates
Since many indices $$$j$$$ share the same range minimum, they share the same denominator in the cost function. For a group of $$$j$$$'s that share the same range minimum, which one is optimal? Obviously, the one with the smallest $$$dp[j]$$$.
So, instead of iterating over all $$$j$$$, we only need to store one candidate per distinct range minimum: pairs of { min_val, best_dp }.
Intuition 3: The Dominance Rule (A Pareto frontier)
We are trying to minimize the total cost: $$$dp[j] + \lceil a_i / \text{min\_val} \rceil$$$.
We have two opposing forces:
- We want a Large min_val: This makes the fraction smaller (Good).
- We want a Small dp[j]: This makes the base cost smaller (Good).
This creates a trade-off. However, some states offer no trade-off; they are strictly worse. If we have a candidate in the stack that has a smaller min_val (worse denominator) AND a larger dp (worse base cost), it loses on both fronts. It is "dominated."
We maintain a stack that represents the Pareto Frontier:
- Values are Strictly Increasing: We only keep a new value if it's larger than the last one (better denominator).
- DP Costs are Strictly Decreasing: To justify keeping a smaller value (worse denominator), it must come with a much cheaper dp cost.
vector<pair<int, ll>> st; // {min_val, best_dp}
// Inside the loop over i...
ll cur = (i ? dp[i-1] : 0);
// 1. Maintain monotonicity (step function): merge every candidate
//    whose range minimum becomes a[i]
while (!st.empty() && st.back().first >= a[i]) {
    cur = min(cur, st.back().second);
    st.pop_back();
}
// 2. Pop dominated candidates: smaller value AND no cheaper DP cost
while (!st.empty() && st.back().second >= cur) {
    st.pop_back();
}
st.push_back({a[i], cur});
// 3. Brute force over the stack (size ~ sqrt(A_max))
dp[i] = INF;
for (auto& [val, cost] : st) {
    dp[i] = min(dp[i], cost + (a[i] + val - 1) / val); // integer ceil, no doubles
}
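Putting the pieces together, here is a self-contained version of the whole transition for the $$$\lceil a_i / \min \rceil$$$ cost (a sketch under my own assumptions about indexing; `solveMinPartition` is a made-up name, not from the problem's editorial):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// dp[i] = best cost to partition a[0..i], where a segment a[j+1..i]
// costs ceil(a[i] / min(a[j+1..i])). The stack holds the Pareto
// frontier of {range_min, best_dp_before_segment}.
ll solveMinPartition(const vector<ll>& a) {
    int n = a.size();
    const ll INF = LLONG_MAX / 2;
    vector<ll> dp(n, INF);
    vector<pair<ll, ll>> st;
    for (int i = 0; i < n; ++i) {
        ll cur = (i ? dp[i-1] : 0);
        // merge all candidates whose range minimum becomes a[i]
        while (!st.empty() && st.back().first >= a[i]) {
            cur = min(cur, st.back().second);
            st.pop_back();
        }
        // drop dominated candidates (smaller min, no cheaper dp)
        while (!st.empty() && st.back().second >= cur) st.pop_back();
        st.push_back({a[i], cur});
        for (auto& [val, cost] : st)
            dp[i] = min(dp[i], cost + (a[i] + val - 1) / val);
    }
    return dp[n-1];
}
```

On `{4, 2, 3}` this returns 2 (one segment `{4,2,3}` with minimum 2 costs $$$\lceil 3/2 \rceil = 2$$$), matching a brute-force check.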
Crucial Note: Why is the inner loop fast?
You might worry that iterating through the stack with `for(auto& [val, cost] : st)` inside the main loop makes this $$$O(N^2)$$$. It does not. The complexity is closer to $$$O(N \sqrt{A_{max}})$$$.
The intuition: the useful stack size is limited by a property of integer division. For any integer $$$A$$$, the expression $$$\lceil A / V \rceil$$$ takes only about $$$2\sqrt{A}$$$ distinct values as $$$V$$$ ranges over $$$1 \dots A$$$. Because we enforce "Double Monotonicity" (increasing value, decreasing DP cost), the stack only keeps candidates that produce a meaningful change in the division result or a genuine drop in DP cost. Empirically, for $$$A_{max} = 10^5$$$, the stack size rarely exceeds 100-200 elements, making the inner loop effectively constant time.
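The divisor-count fact is easy to verify empirically (a throwaway check, not production code):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Count the distinct values of ceil(A / V) over V = 1..A.
// Number theory gives roughly 2*sqrt(A) of them, which is what
// bounds the number of stack entries that can matter.
int distinctCeils(int A) {
    set<long long> vals;
    for (long long v = 1; v <= A; ++v)
        vals.insert((A + v - 1) / v); // integer ceil of A/v
    return (int)vals.size();
}
```

For $$$A = 100$$$ there are 19 distinct values, right around $$$2\sqrt{100} = 20$$$.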
Pattern 2: Optimizing the Input (State Space Reduction)
Target: Problems where $$$N$$$ is large, but a constraint $$$K$$$ (capacity/value) is small.
The Problem
Sometimes the bottleneck isn't the transition logic, but the sheer number of elements $$$N$$$. Consider a problem like Wishing Cards.
- $$$N \le 10^5$$$
- Total Capacity $$$K \le 360$$$
- We want to maximize happiness based on cumulative maximums.
A standard DP would be $$$O(N \cdot K^2)$$$, which is too slow. We need to reduce $$$N$$$ so the complexity depends on $$$K$$$.
Intuition 1: The "Useless" State (Dominance)
In many greedy and DP problems, a state becomes "useless" if there is a previous state that is strictly better in every possible future scenario.
Let's say we have an item at index $$$i$$$ (earlier) and an item at index $$$j$$$ (later). If item $$$i$$$ has a better value (e.g., higher capacity) AND is available earlier, it "dominates" item $$$j$$$.
- Item $$$i$$$ can do everything $$$j$$$ can do.
- Item $$$i$$$ contributes to the answer for a longer duration/range.
Therefore, if $$$i$$$ dominates $$$j$$$, we can delete $$$j$$$ from the array entirely.
Intuition 2: Input Compression
If we filter out all dominated elements using this logic, the remaining array must be strictly monotonic.
- If we keep element $$$j$$$ after $$$i$$$, it must be because $$$j$$$ is strictly stronger than $$$i$$$. (Otherwise, $$$i$$$ would have eliminated $$$j$$$).
If the values are integers bounded by $$$K$$$ (e.g., capacities $$$\le K$$$), and our filtered array is strictly increasing, the maximum length of this array is $$$K$$$. We have successfully compressed $$$N=10^5$$$ down to $$$N \le K$$$, allowing an $$$O(K^3)$$$ or $$$O(K^2)$$$ solution.
Implementation: The Filter
We use a stack to perform this filtering in $$$O(N)$$$. We usually iterate backwards so we can compare the "current/earlier" element against the "stack/later" elements.
stack<int> st;
// Iterate backwards: i = Earlier, st.top() = Later
for (int i = n-1; i >= 0; --i) {
    // Pop if Earlier dominates Later (stronger/equal capacity)
    while (!st.empty() && a[i] >= a[st.top()]) st.pop();
    st.push(i);
}
vector<int> b;
while (!st.empty()) { b.push_back(st.top()); st.pop(); }
// Size of b is <= K. Run DP on b.
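Wrapped into a function for testing (`filterDominated` is my name, not from the problem's editorial):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Keep only non-dominated indices: an earlier element with capacity
// >= a later one's removes the later one. Surviving values are
// strictly increasing with index, so with values <= K at most K
// indices remain.
vector<int> filterDominated(const vector<int>& a) {
    int n = a.size();
    stack<int> st;
    for (int i = n - 1; i >= 0; --i) {
        while (!st.empty() && a[i] >= a[st.top()]) st.pop();
        st.push(i);
    }
    vector<int> b;
    while (!st.empty()) { b.push_back(st.top()); st.pop(); }
    return b; // indices in increasing order, values strictly increasing
}
```

For `a = {3, 1, 2, 5, 4}` only indices 0 and 3 survive: index 0 (value 3) dominates indices 1 and 2, and index 3 (value 5) dominates index 4.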
If you know more problems that fit either of these patterns — or if you have seen this technique referred to by a specific name — feel free to share them. I’d love to expand this tutorial with additional examples or terminology.