I recently encountered several problems that looked like standard $$$O(N^2)$$$ Dynamic Programming but required a faster solution. I noticed a recurring theme: using Monotonic Stack to optimize transitions dependent on range statistics (like min/max/GCD). Sometimes these problems can also be solved using the convex hull trick, but the stack-based approach is often much simpler.
However, not all "Monotone Stack DPs" are the same. Sometimes we use the stack to optimize the transition loop, and other times we use it to optimize the input array itself.
In this post, I will explain the intuition behind these two distinct patterns.
Pattern 1: Optimizing the Transition
Target: Problems where the cost function behaves like a "Step Function."
The Problem
We are dealing with a 1D DP recurrence of the form:
$$$dp[i] = \min_{0 \le j < i} \big( dp[j] + \text{cost}(j+1 \dots i) \big)$$$
where $$$\text{cost}(j+1 \dots i)$$$ depends on a range statistic (e.g., $$$\min(a_{j+1} \dots a_i)$$$, $$$\text{GCD}$$$, $$$\text{OR}$$$, etc.). $$$N$$$ is usually up to $$$4 \cdot 10^5$$$, so an $$$O(N^2)$$$ loop will TLE. We need something faster.
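For concreteness, here is a sketch of the quadratic baseline for the $$$\lceil a_i / \min \rceil$$$ cost used in the example below (the function name `naiveDp` and the exact 0-indexed convention are my own choices, not from any reference solution):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// Quadratic baseline: dp[i] = min over j of dp[j] + cost(segment a[j..i-1]),
// where a segment's cost is ceil(last_element / segment_min).
// dp[i] is the best cost to partition the prefix a[0..i-1]; dp[0] = 0.
vector<ll> naiveDp(const vector<ll>& a) {
    int n = a.size();
    const ll INF = LLONG_MAX / 2;
    vector<ll> dp(n + 1, INF);
    dp[0] = 0;
    for (int i = 1; i <= n; ++i) {
        ll mn = LLONG_MAX;
        for (int j = i - 1; j >= 0; --j) {      // segment a[j..i-1]
            mn = min(mn, a[j]);                 // running range minimum
            dp[i] = min(dp[i], dp[j] + (a[i-1] + mn - 1) / mn);
        }
    }
    return dp;
}
```

Moving $$$j$$$ leftward while maintaining a running minimum is exactly the loop the stack will compress.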
Intuition 1: Range Statistics are "Step Functions"
Let's look at the problem Inverse Minimum Partition (Easy Version). Cost function: $$$\lceil a_i / \min(a_{j+1} \dots a_i) \rceil$$$.
Fix the current index $$$i$$$. As we move the starting point $$$j$$$ from $$$i-1$$$ down to $$$0$$$, the range $$$[j+1, i]$$$ gets longer. The value $$$\min(a_{j+1} \dots a_i)$$$ does not change at every step. It stays constant for a while, then suddenly drops when we hit a new, smaller element.
This means we don't have $$$i$$$ distinct "costs" to check. The distinct costs correspond exactly to the chain of strictly smaller elements to the left of $$$i$$$ (the "previous smaller element" chain).
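To see this concretely, here is a tiny helper (hypothetical, for illustration only) that enumerates the distinct values the range minimum takes for a fixed right endpoint $$$i$$$:

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// For a fixed right endpoint i, min(a[j..i]) only changes when the
// leftward scan crosses an element smaller than everything seen so far.
// The distinct values are exactly the prefix minima scanning left from i.
vector<ll> distinctMins(const vector<ll>& a, int i) {
    vector<ll> vals;
    ll mn = LLONG_MAX;
    for (int j = i; j >= 0; --j)
        if (a[j] < mn) { mn = a[j]; vals.push_back(mn); }
    return vals; // strictly decreasing; usually far fewer than i+1 values
}
```

For `a = {5, 3, 4, 2, 6}` and `i = 4`, the minimum only ever equals 6 or 2, even though there are five possible starting points.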
Intuition 2: Compressing the Candidates
Since many indices $$$j$$$ share the same range minimum, they share the same denominator in the cost function. For a group of $$$j$$$'s that share the same range minimum, which one is optimal? Obviously, the one with the smallest $$$dp[j]$$$.
So, instead of iterating over all $$$j$$$, we only need to store one candidate per distinct range minimum: pairs of { min_val, best_dp }.
Intuition 3: The Dominance Rule (A Pareto frontier)
We are trying to minimize the total cost: $$$dp[j] + \lceil a_i / \text{min\_val} \rceil$$$.
We have two opposing forces:
- We want a Large min_val: This makes the fraction smaller (Good).
- We want a Small dp[j]: This makes the base cost smaller (Good).
This creates a trade-off. However, some states offer no trade-off; they are strictly worse. If we have a candidate in the stack that has a smaller min_val (worse denominator) AND a larger dp (worse base cost), it loses on both fronts. It is "dominated."
We maintain a stack that represents the Pareto Frontier:
- Values are Strictly Increasing: We only keep a new value if it's larger than the last one (better denominator).
- DP Costs are Strictly Decreasing: To justify keeping a smaller value (worse denominator), it must come with a much cheaper dp cost.
vector<pair<int, ll>> st; // {min_val, best_dp}
// Inside the loop over i...
ll cur = (i ? dp[i-1] : 0);
// 1. Maintain monotonicity (step function): merge every candidate
//    whose range minimum becomes a[i]
while (!st.empty() && st.back().first >= a[i]) {
    cur = min(cur, st.back().second);
    st.pop_back();
}
// 2. Pop dominated candidates: smaller value AND no cheaper DP cost
while (!st.empty() && st.back().second >= cur) {
    st.pop_back();
}
st.push_back({a[i], cur});
// 3. Brute force over the stack (size ~ sqrt(A_max))
dp[i] = INF;
for (auto& [val, cost] : st) {
    dp[i] = min(dp[i], cost + (a[i] + val - 1) / val); // integer ceil, no doubles
}
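Putting the pieces together, here is a self-contained version of the whole transition for the $$$\lceil a_i / \min \rceil$$$ cost (a sketch under my own assumptions about indexing; `solveMinPartition` is a made-up name, not from the problem's editorial):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// dp[i] = best cost to partition a[0..i], where a segment a[j+1..i]
// costs ceil(a[i] / min(a[j+1..i])). The stack holds the Pareto
// frontier of {range_min, best_dp_before_segment}.
ll solveMinPartition(const vector<ll>& a) {
    int n = a.size();
    const ll INF = LLONG_MAX / 2;
    vector<ll> dp(n, INF);
    vector<pair<ll, ll>> st;
    for (int i = 0; i < n; ++i) {
        ll cur = (i ? dp[i-1] : 0);
        // merge all candidates whose range minimum becomes a[i]
        while (!st.empty() && st.back().first >= a[i]) {
            cur = min(cur, st.back().second);
            st.pop_back();
        }
        // drop dominated candidates (smaller min, no cheaper dp)
        while (!st.empty() && st.back().second >= cur) st.pop_back();
        st.push_back({a[i], cur});
        for (auto& [val, cost] : st)
            dp[i] = min(dp[i], cost + (a[i] + val - 1) / val);
    }
    return dp[n-1];
}
```

On `{4, 2, 3}` this returns 2 (one segment `{4,2,3}` with minimum 2 costs $$$\lceil 3/2 \rceil = 2$$$), matching a brute-force check.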
Crucial Note: Why is the inner loop fast?
You might worry that iterating through the stack with `for(auto& [val, cost] : st)` inside the main loop makes this $$$O(N^2)$$$. It does not. The complexity is closer to $$$O(N \sqrt{A_{max}})$$$.
The intuition: the useful stack size is limited by a property of integer division. For any integer $$$A$$$, the expression $$$\lceil A / V \rceil$$$ takes only about $$$2\sqrt{A}$$$ distinct values as $$$V$$$ ranges over $$$1 \dots A$$$. Because we enforce "Double Monotonicity" (increasing value, decreasing DP cost), the stack only keeps candidates that produce a meaningful change in the division result or a genuine drop in DP cost. Empirically, for $$$A_{max} = 10^5$$$, the stack size rarely exceeds 100-200 elements, making the inner loop effectively constant time.
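The divisor-count fact is easy to verify empirically (a throwaway check, not production code):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Count the distinct values of ceil(A / V) over V = 1..A.
// Number theory gives roughly 2*sqrt(A) of them, which is what
// bounds the number of stack entries that can matter.
int distinctCeils(int A) {
    set<long long> vals;
    for (long long v = 1; v <= A; ++v)
        vals.insert((A + v - 1) / v); // integer ceil of A/v
    return (int)vals.size();
}
```

For $$$A = 100$$$ there are 19 distinct values, right around $$$2\sqrt{100} = 20$$$.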
Pattern 2: Optimizing the Input (State Space Reduction)
Target: Problems where $$$N$$$ is large, but a constraint $$$K$$$ (capacity/value) is small.
The Problem
Sometimes the bottleneck isn't the transition logic, but the sheer number of elements $$$N$$$. Consider a problem like Wishing Cards.
- $$$N \le 10^5$$$
- Total Capacity $$$K \le 360$$$
- We want to maximize happiness based on cumulative maximums.
A standard DP would be $$$O(N \cdot K^2)$$$, which is too slow. We need to reduce $$$N$$$ so the complexity depends on $$$K$$$.
Intuition 1: The "Useless" State (Dominance)
In many greedy and DP problems, a state becomes "useless" if there is a previous state that is strictly better in every possible future scenario.
Let's say we have an item at index $$$i$$$ (earlier) and an item at index $$$j$$$ (later). If item $$$i$$$ has a better value (e.g., higher capacity) AND is available earlier, it "dominates" item $$$j$$$.
- Item $$$i$$$ can do everything $$$j$$$ can do.
- Item $$$i$$$ contributes to the answer for a longer duration/range.
Therefore, if $$$i$$$ dominates $$$j$$$, we can delete $$$j$$$ from the array entirely.
Intuition 2: Input Compression
If we filter out all dominated elements using this logic, the remaining array must be strictly monotonic.
- If we keep element $$$j$$$ after $$$i$$$, it must be because $$$j$$$ is strictly stronger than $$$i$$$. (Otherwise, $$$i$$$ would have eliminated $$$j$$$).
If the values are integers bounded by $$$K$$$ (e.g., capacities $$$\le K$$$), and our filtered array is strictly increasing, the maximum length of this array is $$$K$$$. We have successfully compressed $$$N=10^5$$$ down to $$$N \le K$$$, allowing an $$$O(K^3)$$$ or $$$O(K^2)$$$ solution.
Implementation: The Filter
We use a stack to perform this filtering in $$$O(N)$$$. We usually iterate backwards so we can compare the "current/earlier" element against the "stack/later" elements.
stack<int> st;
// Iterate backwards: i = Earlier, st.top() = Later
for (int i = n-1; i >= 0; --i) {
    // Pop if Earlier dominates Later (stronger/equal capacity)
    while (!st.empty() && a[i] >= a[st.top()]) st.pop();
    st.push(i);
}
vector<int> b;
while (!st.empty()) { b.push_back(st.top()); st.pop(); }
// Size of b is <= K. Run DP on b.
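Wrapped into a function for testing (`filterDominated` is my name, not from the problem's editorial):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Keep only non-dominated indices: an earlier element with capacity
// >= a later one's removes the later one. Surviving values are
// strictly increasing with index, so with values <= K at most K
// indices remain.
vector<int> filterDominated(const vector<int>& a) {
    int n = a.size();
    stack<int> st;
    for (int i = n - 1; i >= 0; --i) {
        while (!st.empty() && a[i] >= a[st.top()]) st.pop();
        st.push(i);
    }
    vector<int> b;
    while (!st.empty()) { b.push_back(st.top()); st.pop(); }
    return b; // indices in increasing order, values strictly increasing
}
```

For `a = {3, 1, 2, 5, 4}` only indices 0 and 3 survive: index 0 (value 3) dominates indices 1 and 2, and index 3 (value 5) dominates index 4.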
If you know more problems that fit either of these patterns — or if you have seen this technique referred to by a specific name — feel free to share them. I’d love to expand this tutorial with additional examples or terminology.