Digit DP "tricks" - Codeforces

Cut the flag dimension

Usually, whatever states you use in the recursive dp function, you memoize all of them. Often the memo table looks something like this:

int memo[pos][...][...][low]

Here low is the flag that records whether the prefix built so far is already strictly smaller than the corresponding prefix of the bound.

It is entirely possible to drop this dimension (halving the memory needed) by handling the flag inside the recursive function:

Example problem: Perfect Number

This is what "normal" code would look like:

ll mem[20][11][2];
 
ll dp(int pos, int sum, bool lo) {
    if(sum > 10) return 0;
    if(pos == n) return (sum == 10);
    
    ll& res = mem[pos][sum][lo];
    if(res != -1) return res;
    
    res = 0;
    
    int mx = lo ? 9 : v[pos];
    for(int d = 0; d <= mx; d++)
        res += dp(pos + 1, sum + d, lo || (d < mx));
    
    return res;
}

And this is the optimized code:

ll mem[20][11];

ll dp(int pos, int sum, bool lo) {
    if(sum > 10) return 0;
    if(pos == n) return (sum == 10);
    
    ll& res = mem[pos][sum];
    if(lo && res != -1) return res;
    
    ll ans = 0;
    int mx = lo ? 9 : v[pos];
    for(int d = 0; d <= mx; d++)
        ans += dp(pos + 1, sum + d, lo || (d < mx));
    
    return lo ? res = ans : ans;
}

Basically, this trick stores a result only when the low flag is on. States with low off do not need to be memoized: the flag's only job is to cap the current digit, and at each position at most one call has low off, so those states are never revisited within one run.

Full submission for the "normal" code: "normal" code
Full submission for the optimized code: optimized code

Different ways to memset

"Normal" memset

You memset every time you run the dp. This takes a huge amount of time if you have to call dp many times or the memo table is large. Example problem: LIDS

Example slow code:

#include <bits/stdc++.h>
using namespace std;
#define int long long

// the maximum length of LIDS is 10
// so we can check for each length k,
// in how many ways can make a number with LIDS = k

// then we can print the result we found for the maximum k

int a, b;
vector<int> v;

int mem[11][11][2][2][11];

int dp(int pos, int last, bool small, bool nonzero, int need) {
    if(pos == (int)v.size())
        return need == 0;

    if(mem[pos][last + 1][small][nonzero][need] != -1)
        return mem[pos][last + 1][small][nonzero][need];

    int res = 0; // res is the result for dp(pos, last + 1, small, nonzero, need)
    int mx = small ? 9 : v[pos];
    for(int d = 0; d <= mx; d++) {
        res += dp(pos + 1, last, (d < mx) || small, nonzero || d, need);
        if(d > last && need && (nonzero || d))
            res += dp(pos + 1, d, (d < mx) || small, 1, need - 1);
    }

    return mem[pos][last + 1][small][nonzero][need] = res;
}

void convert(int x) {
    // convert into array of digit
    v.clear();
    while(x) {
        v.push_back(x % 10);
        x /= 10;
    }
    reverse(v.begin(), v.end());
}

pair<int, int> solve(int st, int en) {
    vector<int> lids(10);

    convert(en);
    for(int i = 1; i < 10; i++) {
        memset(mem, -1, sizeof mem);
        lids[i] += dp(0, -1, 0, 0, i);
    }

    convert(st - 1);
    for(int i = 1; i < 10; i++) {
        memset(mem, -1, sizeof mem);
        lids[i] -= dp(0, -1, 0, 0, i);
    }

    for(int i = 10 - 1; i >= 1; i--)
        if(lids[i]) return {i, lids[i]};

    return {0, 1};
}

signed main() {
    ios::sync_with_stdio(0);
    cin.tie(0);

    int T;
    cin >> T;

    for(int tc = 1; tc <= T; tc++) {
        cin >> a >> b;
        pair<int, int> res = solve(a, b);
        cout << "Case " << tc << ": " << res.first << ' ' << res.second;
        if(tc != T) cout << "\n";
    }
}

You can clearly see that memset is executed once for every candidate length, for both bounds. This has complexity $$$\mathcal{O}(T \cdot digits \cdot memsize)$$$, where $$$T$$$ is the number of test cases, $$$digits$$$ is the number of candidate LIDS lengths tried (1 to 9 in the code above), and $$$memsize$$$ is the size of the memo table.

This is extremely slow.
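
As a rough worked figure for the slow code above (a back-of-the-envelope estimate, not a measurement, and assuming long long is 8 bytes): the memo table has

$$$memsize = 11 \cdot 11 \cdot 2 \cdot 2 \cdot 11 = 5324$$$

cells, and each test case calls memset $$$9 + 9 = 18$$$ times (once per length, for each of the two bounds), so

$$$18 \cdot 5324 = 95832$$$

cells, or $$$95832 \cdot 8 = 766656$$$ bytes, are reset per test case before any dp work is done. This cost is then multiplied by $$$T$$$.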

Improvement using "time"

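Now, instead of calling memset every time you run the dp, you can keep an additional array vis[pos][...][...] that stores the "time" (the id of the dp run) at which the value in mem[pos][...][...] was set. A stored value is trusted only if its time equals the id of the current run, cur, which is incremented before each run.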

...
    if(vis[pos][last + 1][small][nonzero][need] == cur)
        return mem[pos][last + 1][small][nonzero][need];
    vis[pos][last + 1][small][nonzero][need] = cur;
...

This way, the complexity is better, but it is still too slow for many problems, because every run still recomputes all of its states from scratch.
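
To make the "time" idea concrete, here is a minimal self-contained sketch on a simpler task than LIDS. The task (count the integers in [0, x] whose digit sum equals a given target), the function name countWithDigitSum, and the array bounds are assumptions made for illustration; only the vis/cur pattern comes from the snippet above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical task: count integers in [0, x] whose digit sum equals `target`.
const int MAX_POS = 20, MAX_SUM = 200;   // illustrative bounds
long long mem[MAX_POS][MAX_SUM][2];
int vis[MAX_POS][MAX_SUM][2];            // zero-initialised: nothing is valid yet
int cur = 0;                             // id ("time") of the current dp run
std::vector<int> v;                      // digits of x, most significant first
int target;

long long dp(int pos, int sum, bool lo) {
    if (sum > target) return 0;
    if (pos == (int)v.size()) return sum == target;

    // A cached value is trusted only if it was written during this run.
    if (vis[pos][sum][lo] == cur) return mem[pos][sum][lo];
    vis[pos][sum][lo] = cur;

    long long res = 0;
    int mx = lo ? 9 : v[pos];
    for (int d = 0; d <= mx; d++)
        res += dp(pos + 1, sum + d, lo || d < mx);
    return mem[pos][sum][lo] = res;
}

long long countWithDigitSum(long long x, int t) {
    assert(x >= 0 && t >= 0 && t < MAX_SUM);
    std::vector<int> rev;
    for (; x > 0; x /= 10) rev.push_back(x % 10);
    v.assign(rev.rbegin(), rev.rend());
    target = t;
    cur++;   // O(1) "reset": every older entry becomes stale
    return dp(0, 0, false);
}
```

Calling countWithDigitSum several times in a row with different arguments never touches memset; bumping cur is what invalidates the old entries.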

memset only once

You might wonder, "But how? You are running the dp many times on many different numbers!" Well, actually, we are doing dp on the digits.

You might notice that we usually run the dp from the most significant digit to the least significant, left to right, with the most significant digit at position $$$0$$$ and the least significant at position $$$length - 1$$$.

This way, the memoized states of different numbers do not match up: the number $$$100$$$ uses different states from the number $$$1234$$$, since they have different $$$length$$$ and other states.

However, what if we put the most significant digit at position $$$length - 1$$$ and the least significant at position $$$0$$$?

Now the digits of every number line up, and you only need to memset once. Note that this relies on memoizing only the states where the low flag is on (as in the first trick): such a state depends only on how many free digits remain, not on the particular bound.

Example solution of: Perfect Number

int dp(int pos, int sum, bool lo) {
    if(sum > 10) return 0;
    if(pos == -1) return (sum == 10);
    
    int& res = mem[pos][sum];
    if(lo && res != -1) return res;
    
    int ans = 0;
    int mx = lo ? 9 : v[pos];
    for(int d = 0; d <= mx; d++)
        ans += dp(pos - 1, sum + d, lo || (d < mx));
    
    return lo ? res = ans : ans;
}
 
int solve(int x) {
    v.clear();
    while(x) {
        v.push_back(x % 10);
        x /= 10;
    }
    // notice that I don't reverse the number anymore
    
    // start from length - 1
    return dp((int)v.size() - 1, 0, 0);
}

int main() {
    ios::sync_with_stdio(0); cin.tie(0); cout.tie(0);
    
    int k;
    cin >> k;
    
    int l = 1;
    int r = 2e7;
    
    memset(mem, -1, sizeof mem);
    while(l < r - 1) {
...

This is an extremely important optimization for digit dp.
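
Since the listing above is cut short, here is a complete minimal sketch of the same idea that can be compiled on its own. It assumes the target digit sum is fixed at 10 for the whole program (the cache would be wrong if the target changed between queries); the names countUpTo and memReady are made up for this sketch.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

const int TARGET = 10;           // fixed for the lifetime of the cache
long long mem[20][TARGET + 1];   // only states with lo == true are stored
bool memReady = false;           // hypothetical guard: memset exactly once
std::vector<int> v;              // v[0] = least significant digit

long long dp(int pos, int sum, bool lo) {
    if (sum > TARGET) return 0;
    if (pos == -1) return sum == TARGET;

    long long &res = mem[pos][sum];
    if (lo && res != -1) return res;

    long long ans = 0;
    int mx = lo ? 9 : v[pos];
    for (int d = 0; d <= mx; d++)
        ans += dp(pos - 1, sum + d, lo || d < mx);

    // With lo == true the answer depends only on (pos, sum), not on the
    // bound, so it stays valid for every later query.
    if (lo) res = ans;
    return ans;
}

// Count integers in [0, x] whose digit sum is exactly TARGET.
long long countUpTo(long long x) {
    assert(x >= 0);
    if (!memReady) {
        std::memset(mem, -1, sizeof mem);
        memReady = true;
    }
    v.clear();
    for (; x > 0; x /= 10) v.push_back(x % 10);   // not reversed
    return dp((int)v.size() - 1, 0, false);
}
```

Queries with bounds of different lengths share the same table: a call such as countUpTo(1000) reuses whatever an earlier countUpTo(100) already stored for the lower positions.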

Other optimizations problem-wise

Check sum of digits divisibility

For a single number

If you want to check whether the sum of digits of a number is divisible by $$$D$$$, then instead of storing the whole sum (which could lead to MLE), you can store only the remainder of the sum modulo $$$D$$$.
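
A minimal sketch of the remainder state, assuming the task is to count the integers in [0, x] whose digit sum is divisible by D. The function name countDigitSumDivisible is hypothetical. The table is rebuilt on every call here because D may differ between calls.

```cpp
#include <cassert>
#include <vector>

std::vector<int> v;                       // v[0] = least significant digit
std::vector<std::vector<long long>> mem;  // mem[pos][rem], lo == true only
int D;                                    // the divisor

long long dp(int pos, int rem, bool lo) {
    if (pos == -1) return rem == 0;
    if (lo && mem[pos][rem] != -1) return mem[pos][rem];

    long long ans = 0;
    int mx = lo ? 9 : v[pos];
    for (int d = 0; d <= mx; d++)
        ans += dp(pos - 1, (rem + d) % D, lo || d < mx);  // keep only sum mod D

    if (lo) mem[pos][rem] = ans;
    return ans;
}

// Count integers in [0, x] whose digit sum is divisible by `divisor`.
long long countDigitSumDivisible(long long x, int divisor) {
    assert(x >= 0 && divisor >= 1);
    D = divisor;
    v.clear();
    for (; x > 0; x /= 10) v.push_back(x % 10);
    mem.assign(v.size(), std::vector<long long>(D, -1));
    return dp((int)v.size() - 1, 0, false);
}
```

The second dimension has size D rather than the maximum possible digit sum, which is where the memory saving comes from.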

For many numbers

Example problem: WORKCHEF (highly recommended, you will need to use a lot of optimizations to AC)

For many numbers, instead of having a separate remainder state for each divisor, e.g. dp[...][rem2][rem3][...], you can store a single remainder modulo their LCM. For example, to check divisibility of the sum of digits by 1, 2, 3, ..., 9, keep the remainder modulo $$$LCM(1, 2, ..., 9) = 2520$$$; the remainder modulo each divisor can be recovered from it.
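
The reason one state is enough can be sketched in a few lines: if d divides L, then (x mod L) mod d equals x mod d. The helper names below (gcdOf, lcmUpTo, remainderMod) are invented for this illustration and are not taken from any particular solution.

```cpp
#include <cassert>

// Greatest common divisor by Euclid's algorithm.
int gcdOf(int a, int b) { return b == 0 ? a : gcdOf(b, a % b); }

// LCM of 1, 2, ..., k. For k = 9 this is 2520.
int lcmUpTo(int k) {
    assert(k >= 1);
    int l = 1;
    for (int d = 1; d <= k; d++) l = l / gcdOf(l, d) * d;
    return l;
}

// Given remL = x mod L, where d divides L, return x mod d.
// This is what lets a single dp dimension of size L replace
// one dimension per divisor.
int remainderMod(int remL, int d) {
    assert(d >= 1);
    return remL % d;
}
```

In a dp, the state would carry only the remainder modulo 2520, and the individual divisibility checks would be made at the end through remainderMod.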

For numbers with special properties

If you want to check divisibility by 5, the last digit needs to be 0 or 5. For 10, the last digit obviously must be 0. There are many similar properties for other numbers.
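
As one possible illustration of using such a property, the sketch below counts the integers in [0, x] that are divisible by 5 and have a given digit sum, without any "remainder modulo 5" state: it simply restricts the digit at position 0. The task and the name countDiv5WithDigitSum are assumptions made for this example.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

std::vector<int> v;        // v[0] = least significant digit
int target;                // required digit sum
long long mem[20][200];    // lo == true states only

long long dp(int pos, int sum, bool lo) {
    if (sum > target) return 0;
    if (pos == -1) return sum == target;
    if (lo && mem[pos][sum] != -1) return mem[pos][sum];

    long long ans = 0;
    int mx = lo ? 9 : v[pos];
    for (int d = 0; d <= mx; d++) {
        // Divisible by 5 <=> the last digit is 0 or 5, so no extra state.
        if (pos == 0 && d % 5 != 0) continue;
        ans += dp(pos - 1, sum + d, lo || d < mx);
    }

    if (lo) mem[pos][sum] = ans;
    return ans;
}

// Count integers in [0, x] divisible by 5 whose digit sum equals t.
long long countDiv5WithDigitSum(long long x, int t) {
    assert(x >= 0 && t >= 0 && t < 200);
    target = t;
    std::memset(mem, -1, sizeof mem);   // target may change between calls
    v.clear();
    for (; x > 0; x /= 10) v.push_back(x % 10);
    return dp((int)v.size() - 1, 0, false);
}
```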

Another way of digit dp

From this stackoverflow question

This can be very handy when handling problems relating to the structure of the numbers, eg: Palindromic Numbers

Example code:

// i - position, l - leftmostlower, h - leftmosthigher, ze - numbers of leading zeros
ll dp(int i, int l, int h, int ze) {
    // imagine it as n - i - 1, and plus the offset of leading zeros
    // i is already offset by leading zeros
    int j = n - i - 1 + ze;
    
    if(i > j) return l <= h;
    if(vis[i][l][h][ze] == cur) return mem[i][l][h][ze];
    vis[i][l][h][ze] = cur;
    
    ll res = 0;
    for(int d = 0; d <= 9; d++) {
        int nl = l;
        int nh = h;
        
        if(d < v[i] && i < nl) nl = i;
        if(d < v[j] && j < nl) nl = j;
        if(d > v[i] && i < nh) nh = i;
        if(d > v[j] && j < nh) nh = j;
        
        res += dp(i + 1, nl, nh, ze + (i == ze && d == 0));
    }
    
    return mem[i][l][h][ze] = res;
}

Feel free to share any tricks or anything else that people should know when doing digit dp! If there are any mistakes or suggestions, please let me know.

Comments (11)

jalsol

thank you very much sir bubu, impressive blog

anyway, the "memset improvement using time" is a good technique on its own, and it can also be applied to some other techniques (a common one is Kuhn's algorithm for maximum matching)

thanks for taking your time reading this blog, hope it helped you and as always jalsol orz

SPyofgame

Also, there is a kind of digit DP that saves only part of the states: it memoizes where the counting is dense and brute-forces where the counting is sparse. Some COCI problems require this technique to AC.

Also, when doing digit DP on a range with some property, you might find it possible to split it into a DP over a digit DP, which can speed up the calculation by reducing the recomputed parts.

Can you provide the statement(or even better, links) for the COCI problems? And can you provide some examples about the DP over a digit DP part? Thanks!

Knightshade

In "LIDS" you don't need to memset every time. You can solve it by taking into account only the current index, current digit, and current LIDS. [0.1s]

I just take that as an example for a slow solution that wouldn't AC.

stack overflow approach was out of the box for me. Liked it.

kyanhdang

I think that it’s easier to understand why reversing the digit order helps avoid repeated memset() if you try solving this example problem with just the smaller = true state. You will notice a pattern:

mem[pos] = mem[pos-1] + temp (temp is the result of choosing values for the pos-th digit.)

This becomes clearer when we solve the same example problem but with a = 1 and b = 999...9 (i.e., calculating the sum of digits for all numbers from 1 to 10^n - 1, where n is the number of digits). In this case, the DP structure forms a neat recurrence, as values from numbers with fewer digits can naturally contribute to numbers with more digits.

This lets us reuse previously computed DP values when processing numbers with more digits. Thus, we only need to call memset() once at the beginning, instead of clearing the DP array every time we run DP on a new number. This can be very useful when approaching DP digit problems that use multitest (input has many tests).

Also, I think that the trick of eliminating smaller dimension from the DP array by storing only DP values with smaller = true can be understood by looking at how the recursive function calls itself. The recursive function with smaller = false is only called at most n + 1 times, where n is the number of digits. This is because each call with smaller = false can only lead to one more such call, which maintains the smaller = false state on a single path from pos = n-1 to pos = -1.

For a better understanding, look at the call tree of dp(pos, smaller) when dp(n-1, false) is called:

In that call tree, the smaller = false calls occur only along a single path (at most once per digit position, n + 1 times in total). Therefore, we don't need to store their results: we can just compute them directly and memoize only the states where smaller = true. This trick halves the memory allocated for the DP array, which helps avoid MLE on problems with a tight memory limit.
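
The n + 1 claim can be checked with a small instrumented sketch. The dp here deliberately does the simplest possible job (it counts the integers in [0, num], which is just num + 1) so that the only interesting output is the number of calls made with smaller = false. The names tightCalls and tightCallsFor are made up for this sketch.

```cpp
#include <cassert>
#include <vector>

std::vector<int> x;       // x[0] = least significant digit
long long memo[20];       // results for smaller == true only
long long tightCalls;     // number of calls made with smaller == false

long long call(int i, bool smaller) {
    if (!smaller) tightCalls++;
    if (i < 0) return 1;
    if (smaller && memo[i] != -1) return memo[i];

    int limit = smaller ? 9 : x[i];
    long long res = 0;
    for (int d = 0; d <= limit; d++)
        res += call(i - 1, smaller || d < limit);

    if (smaller) memo[i] = res;
    return res;
}

// Runs the dp for [0, num] and reports how many calls had smaller == false.
long long tightCallsFor(long long num) {
    assert(num >= 0);
    long long original = num;
    x.clear();
    for (; num > 0; num /= 10) x.push_back(num % 10);
    for (int i = 0; i < 20; i++) memo[i] = -1;
    tightCalls = 0;
    long long total = call((int)x.size() - 1, false);
    assert(total == original + 1);   // sanity check: the dp counts [0, num]
    return tightCalls;
}
```

For a 4-digit bound such as 4738 the function reports 5 calls with smaller = false: one per position from 3 down to 0, plus the base case at position -1.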

This is my optimized code for the example problem

#include "bits/stdc++.h"
#define boostcode ios_base:: sync_with_stdio(0); cin.tie(0);

using namespace std;

typedef long long ll;

int t;
ll a, b;
int x[17];
ll dp[137][17];
// dp[s][i]: total digit sum over all ways to fill positions i..0 freely (0-9),
// given that the digits already placed sum to s

// How to optimize:
//
// - Optimize 1: Eliminate smaller dimension from dp[]
// (by only store values with state smaller=true and calculate directly
//  values with state smaller=false)
// ==> Halves DP array's memory allocated
//
// - Optimize 2: Store number digits in reverse order
// ==> Don't have to reset array dp[] after each DP on a number
// (Can memset() only once at the beginning of the code)

ll call(int s, int i, bool smaller) {
    if (i < 0) return s;
    if (smaller && dp[s][i]!=-1) return dp[s][i];
    int limit = (smaller ? 9 : x[i]);
    ll res = 0;
    for (int d = 0; d <= limit; d++) {
        res += call(s+d, i-1, smaller || d<limit);
    }
    if (smaller) return dp[s][i] = res;
    return res;
}
ll G(ll num) {
    int n = 0;
    x[n] = 0;
    while (num > 0) {
        x[n++] = num%10;
        num /= 10;
    }
    // Note: The number digits are stored in array x[] in reverse order
    // (Example: num = 47 then x[] = {7, 4})
    return call(0, n-1, 0);
}

int main()
{
    boostcode;

    memset(dp, -1, sizeof(dp));
    cin >> t;
    while (t--) {
        cin >> a >> b;
        cout << G(b) - G(a-1) << '\n';
    }

    return 0;
}

By the way, your blog helped me in optimizing the DP digit code, thank you so much! (my English is bad, sorry^)

gfgoodluck

Yeah exactly. I also found a problem, MYQ10, where repeated memsets cause TLE. I knew the trick of omitting low could save memory, but I didn't realize it also avoids repeated memsets across multiple test cases until I read an editorial.

gnudgnaoh's blog
