I recently encountered several problems that looked like standard $O(N^2)$ Dynamic Programming but required a faster solution. I noticed a recurring theme: using a **Monotonic Stack** to optimize transitions that depend on range statistics (like min/max/GCD). Sometimes these problems can also be solved with the convex hull trick, but the stack-based approach is often much simpler.
↵
However, not all "Monotone Stack DPs" are the same. Sometimes we use the stack to optimize the **transition loop**, and other times we use it to optimize the **input array itself**.↵
↵
In this post, I will explain the intuition behind these two distinct patterns.↵
↵
-----↵
↵
## Pattern 1: Optimizing the Transition↵
↵
**Target:** Problems where the cost function behaves like a "Step Function."↵
↵
### The Problem↵
↵
We are dealing with a 1D DP recurrence of the form:↵
↵
$$dp[i] = \min_{0 \le j < i} \{ dp[j] + \text{cost}(j+1 \dots i) \}$$↵
↵
Where $\text{cost}(j+1 \dots i)$ depends on a range statistic (e.g., $\min(a_{j+1} \dots a_i)$, $\text{GCD}$, $\text{OR}$, etc.).↵
$N$ is usually up to $4 \cdot 10^5$, so an $O(N^2)$ loop will TLE. We need something faster.↵
↵
### Intuition 1: Range Statistics are "Step Functions"↵
↵
Let's look at the problem **[Inverse Minimum Partition (Easy Version)](https://mirror.codeforces.com/contest/2160/problem/G1)**.↵
**Cost function:** $\lceil a_i / \min(a_{j+1} \dots a_i) \rceil$.↵
↵
Fix the current index $i$. As we move the starting point $j$ from $i-1$ down to $0$, the range $[j+1, i]$ gets longer.↵
The value $\min(a_{j+1} \dots a_i)$ does **not** change at every step. It stays constant for a while, then suddenly drops when we hit a new, smaller element.↵
↵
This means we don't have $i$ distinct "costs" to check. We only have distinct costs corresponding to the "Next Smaller Elements" to the left.↵
↵
### Intuition 2: Compressing the Candidates↵
↵
Since many indices $j$ share the same range minimum, they share the same denominator in the cost function.↵
For a group of $j$'s that share the same range minimum, which one is optimal?↵
Obviously, the one with the **smallest** $dp[j]$.↵
↵
So, instead of iterating over all $j$, we only need to store, for each distinct range minimum, one pair:
`{ min_val, best_dp }`
↵
### Intuition 3: The Dominance Rule (A Pareto frontier)↵
↵
We are trying to minimize the total cost:↵
$$\text{Total} = dp[j] + \lceil a_i / \text{min_val} \rceil$$↵
↵
We have two opposing forces:↵
↵
1. **We want a Large `min_val`**: This makes the fraction smaller (Good).↵
2. **We want a Small `dp[j]`**: This makes the base cost smaller (Good).↵
↵
This creates a trade-off. However, some states offer **no trade-off**; they are strictly worse.↵
If we have a candidate in the stack that has a **smaller** `min_val` (worse denominator) AND a **larger** `dp` (worse base cost), it loses on both fronts. It is "dominated."↵
↵
We maintain a stack that represents the **Pareto Frontier**:↵
↵
* **Values are Strictly Increasing:** We only keep a new value if it's larger than the last one (better denominator).↵
* **DP Costs are Strictly Decreasing:** To justify keeping a smaller value (worse denominator), it *must* come with a much cheaper `dp` cost.↵
↵
```cpp
vector<pair<int, ll>> st; // {val, dp}

// Inside loop i...
ll cur = (i ? dp[i-1] : 0);

// 1. Maintain Monotonicity (Step Function):
//    merge every candidate whose range minimum now becomes a[i]
while(st.size() && st.back().first >= a[i]) {
    cur = min(cur, st.back().second);
    st.pop_back();
}

// 2. Pop dominated candidates: smaller value AND worse (>=) DP cost
while(st.size() && st.back().second >= cur) {
    st.pop_back();
}

st.push_back({a[i], cur});

// 3. Brute force stack (size ~ sqrt(A))
dp[i] = INF;
for(auto& [val, cost] : st) {
    dp[i] = min(dp[i], cost + (a[i] + val - 1) / val); // integer ceil division
}
```
↵
↵
### Crucial Note: Why is the inner loop fast?↵
↵
You might worry that iterating through the stack `for(auto& x : st)` inside the main loop makes this $O(N^2)$. It does not. The complexity is closer to $O(N \sqrt{A_{max}})$.↵
↵
**The Argument:**
The size of the stack is naturally limited by the properties of **Integer Division**.
For any integer $A$, the expression $\lceil A / V \rceil$ takes at most roughly $2\sqrt{A}$ distinct values as $V$ ranges over the positive integers.
Because we enforce "Double Monotonicity" (increasing $V$, decreasing $DP$), the stack only keeps candidates that either change the division result or come with a strictly cheaper $dp$ cost.
Empirically, for $A_{max} = 10^5$, the stack size rarely exceeds 100-200 elements, making the inner loop effectively constant time.
↵
↵
[My Solution (343559614)](https://mirror.codeforces.com/contest/2160/submission/343559614)↵
↵
-----↵
↵
## Pattern 2: Optimizing the Input (State Space Reduction)↵
↵
**Target:** Problems where $N$ is large, but a constraint $K$ (capacity/value) is small.↵
↵
### The Problem↵
↵
Sometimes the bottleneck isn't the transition logic, but the sheer number of elements $N$.↵
Consider a problem like **[Wishing Cards](https://mirror.codeforces.com/contest/2174/problem/B)**.↵
↵
* $N \le 10^5$↵
* Total Capacity $K \le 360$↵
* We want to maximize happiness based on cumulative maximums.↵
↵
A standard DP would be $O(N \cdot K^2)$, which is too slow. We need to reduce $N$ so the complexity depends on $K$.↵
↵
### Intuition 1: The "Useless" State (Dominance)↵
↵
In many greedy and DP problems, a state becomes "useless" if there is a **previous** state that is strictly better in every possible future scenario.↵
↵
Let's say we have an item at index $i$ (earlier) and an item at index $j$ (later).↵
If item $i$ has a **better value** (e.g., higher capacity) AND is **available earlier**, it "dominates" item $j$.↵
↵
* Item $i$ can do everything $j$ can do.↵
* Item $i$ contributes to the answer for a longer duration/range.↵
↵
Therefore, if $i$ dominates $j$, we can delete $j$ from the array entirely.↵
↵
### Intuition 2: Input Compression↵
↵
If we filter out all dominated elements using this logic, the remaining array must be **strictly monotonic**.↵
↵
* If we keep element $j$ after $i$, it *must* be because $j$ is strictly stronger than $i$. (Otherwise, $i$ would have eliminated $j$).↵
↵
If the values are integers bounded by $K$ (e.g., capacities $\le K$), and our filtered array is strictly increasing, the maximum length of this array is $K$.↵
We have successfully compressed $N=10^5$ down to $N \le K$, allowing an $O(K^3)$ or $O(K^2)$ solution.↵
↵
### Implementation: The Filter↵
↵
We use a stack to perform this filtering in $O(N)$. We usually iterate **backwards** so we can compare the "current/earlier" element against the "stack/later" elements.↵
↵
```cpp
stack<int> st;
// Iterate backwards: i = Earlier, st.top() = Later
for(int i = n-1; i >= 0; --i) {
    // Pop if Earlier dominates Later (stronger/equal capacity)
    while(st.size() && a[i] >= a[st.top()]) st.pop();
    st.push(i);
}

vector<int> b;
while(st.size()) b.push_back(st.top()), st.pop();
// Size of b is <= K. Run DP on b.
```
↵
[My Solution (352213797)](https://mirror.codeforces.com/contest/2174/submission/352213797)↵
↵
-----↵
↵
↵
## Extra Problems:↵
↵
1. [CGCDSSQ](https://mirror.codeforces.com/contest/475/problem/D) -- [Idea](https://mirror.codeforces.com/blog/entry/149014#comment-1330975)↵
↵
2. [Discharging](https://qoj.ac/problem/13914) -- [Idea](https://mirror.codeforces.com/blog/entry/149014#comment-1330982)

-----
↵
If you know more problems that fit either of these patterns — or if you have seen this technique referred to by a specific name — feel free to share them. I’d love to expand this tutorial with additional examples or terminology.
↵