DP Linear Recurrence — When Matrix Exponentiation doesn't cut it

#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	Dominater069	154
8	awoo	154
10	luogu_official	150

Here's a nice optimization for linear recurrence problems. This mostly works when the you're trying to find the nth term of a sequence of the form $$$f(n) = a_1 \cdot f(n-1) + a_2 \cdot f(n-2) + \dots + a_m \cdot f(n-m)$$$, usually modulo some big number, where n is too large to use the normal dynamic programming solution, usually $$$ n \approx 10^{18}$$$. If

Unable to parse markup [type=CF_MATHJAX]

, the normal solution involves matrix multiplication in $$$O(m^3 log(n))$$$. But under some conditions we can do better using dynamic programming and divide and conquer.

The idea involves reducing the problem into finding how many arrays with numbers 1 to m and sum n can be constructed. To do this, we will divide the problem of finding $$$f(n)$$$ into finding $$$f(k)$$$, where k is in the neighborhood of $$$\frac n 2$$$.

All problems shown here can be solved using matrix exponentiation, but due to it's cubic nature, if m was larger it would not be possible.

Magic Gems — 1117D

This was the first problem that I solved using this approach, so I will be using to prove it: You have gems that are either worth 1 or M and your objective is to find how many arrays modulo $$$10^9+7$$$ have worth N. As an example, if N = 10 and M = 3, you would try to find in how many ways you could either write 1 or 3 and add up to N. 11111111, 11111113, 131131, 3331, 1333 are 5 of the possible examples.

Let's look at the largest prefix that doesn't exceed $$$\frac n 2$$$. In the example, this subarray can either have 3, 4 or 5 elements. Moreover, if it has 3 or 4 elements, the next one will be a 3, and then will come a suffix with 4 or 3 elements respectively, but if we have 5, the suffix can be any array with sum 5. With this we can derive:

$$$f(10)=f(5)\cdot f(5) + f(4) \cdot f(3) + f(3) \cdot f(4)$$$

More generally, if $$$k = \lfloor {\frac n 2} \rfloor$$$

$$$f(n) = f(k)\cdot f(n-k) + f(k-1) \cdot f(n-k-2) + f(k-2) \cdot f(n-k-1)$$$

And if m wasn't 3, we get

$$$f(n) = f(k) \cdot f(n-k) + \sum _{i=1}^{M-1} f(k+1) \cdot f(n-k-m+1)$$$

It can be proven that the ith iteration of dividing by 2, the range of values for to calculate are never greater than $$$\frac n {2^i} +2$$$ and are all at least $$$\frac n {2^i} - 2\cdot m$$$, meaning that in each one of the $$$log n$$$ iterations, we will need to calculate $$$O(m)$$$ values. Given that the previous terms were already computed, we can compute that next term in $$$O(m)$$$. Meaning the total complexity is $$$O(m^2 \cdot log(n))$$$ by using a hashmap to save the dp, or $$$O(m^2 \cdot (log (n))^2)$$$ by using a treemap. Due to the numbers coming in batches, I will assume that hacking this resulting in multiple hash collisions and TLE is not possible. If you don't trust the analysis, add an extra $$$O(log n)$$$ term to all results from now on.

Matrix solution — AC 452ms

DP solution — AC 78ms

Fibonacci Numbers — CSES 1722

After seeing the last solution, you may have realized that if M=2, the terms were all fibonacci numbers. This is because they follow the same recurrence $$$f(n) = f(n-1) + f(n-2)$$$.

You may be tempted to just return hardcode M=2, and send it as a solution, but this will give WA due to $$$F_0 = 0$$$, while our solution will assume it equals 1. Changing the values on the dp map so that $$$f(0) = 0, f(1) = 1$$$, also won't solve our problem. Our solution assumes that this is a construction and so by definition, we can never construct an array with sum less than 0, and we can only construct an array with sum 0 in one way. This forces us to leave $$$f(0) = 1, f(1) = 1$$$ and then call $$$F_n = f(n-1)$$$

As both solutions are equally as fast, I will only be linking the dp solution here

Throwing Dice — CSES 1096

This problem can be easily translated as adding up to N using only numbers 1, 2, 3, 4, 5 or 6. Let's use the same trick as before; we will grab the longest prefix that doesn't add up to more than half of N.

Let's supposed that N = 100. The halfway mark is 50, so the last prefix could add up to 45, 46, 47, 48, 49 or 50. In the case that it adds up to 45, I know the next dice throw was a 6, and then comes a suffix of 49. If it added up to 46, the next dice throw was either a 5 or a 6, and then would either come a suffix of 48 or 49. The next one over, if it added up to 47, the next throw would either be 4, 5, or 6, followed by a suffix of 47, 48 or 49. So on and so forth. We derive the next equation:

$$$f(n) = f(k-5)\cdot f(n-k-1) + f(k-4) \cdot (f(n-k-2) + f(n-k-1))+\dots$$$

$$$f(n) = \sum {i=1} ^{6} (f(k-6+i) \cdot (\sum{j=1}^i f(n-k-j)))$$$

If the dice had M faces, then the general equation would be:

$$$f(n) = \sum {i=1} ^{M} (f(k-M+i) \cdot (\sum{j=1}^i f(n-k-j)))$$$

Note that we can compute $$$\sum_{j=1}^i f(n-k-j)$$$ on each iteration fast. If we didn't, then computing $$$f(n)$$$ would take $$$O(m^2)$$$, which would make the whole algorithm $$$O(m^3 \cdot log(n))$$$

Despite there not being a significant time difference, I leave both solutions here with the dice face count(M) being a variable in case someone wants to compare speed.

Matrix

Very imprecise timing

Generalizations

Changing the Coefficients

Let's try to generalize the problem. The linear recurrence is $$$f(n) = a_1 \cdot f(n-1) + a_2 \cdot f(n-2) + \dots + a_m \cdot f(n-m)$$$, can we find some solution in terms of values in the neighborhood of f(n/2)?

The answer is yes, and if you followed the blog, finding it is not that difficult. Say we have $$$a_1$$$ blocks of length 1, $$$a_2$$$ of length 2, and so on, we want to find in how many ways can these blocks form an array of length $$$n$$$. For this, we check when the array fails to pass the half way mark for the last time. This is at least on $$$k-m+1$$$, where again $$$k = \lfloor {\frac n 2} \rfloor$$$, and then, we must use any block of length $$$m$$$, to get a suffix of length $$$n-k-1$$$. The next place where this could happen is one block over, on $$$k-m+2$$$, where you can choose either to place any block of length $$$m$$$ or $$$m-1$$$, leaving a suffix of length $$$n-k-2$$$ or $$$n-k-1$$$ respectively. Thus we get:

$$$f(n) = \sum_{i=1}^m f(k-m+i)\cdot(\sum_{j=1}^i a_{i-j+1} f(n-k-j))$$$

Note that unless we can easily calculate the inner sum(ie calculate all $$$m$$$ terms in $$$O(m)$$$), the algorithm will have a running time of $$$O(m^3\log{n})$$$, with a constant about 5 times worse than matrix exponentiation. In the cases described where most of the $$$a_j$$$ were zeros, or equal among each other, this can be done, but the general case is a bit harder.

As an exercise for the reader, try to find the solution when $$$a_i=i$$$ for all i, or alternatively, in how many ways can we make N as a sum of dice faces, where we have an unlimited amount of dice with 1 face, with 2 faces, with 3 faces, up to M faces.

Explaining negative numbers

The construction makes it very clear that it works for all non-negative values of a_i, but does it work as well with negatives? Yes. Here's a proof of it:

There are some blocks, different lengths, different colours. We want to arrange them into a sum of length n. Some blocks are "anti-blocks", which turn possible sums of n into "anti-sums", while turning said anti-sums into regular sums. We'll denote A(n) the number of ways to make regular sums, while B(n) anti-sums. If there are $$$a_i$$$ blocks of length $$$i$$$, as well as $$$b_i$$$ anti-blocks. Let's define $$$f(n) = A(n) - B(n)$$$. It can be shown that:

$$$f(n) = (a_1-b_1)\cdot f(n-1)+(a_2-b_2)\cdot f(n-2)+\dots (a_m-b_m) \cdot f(n-m)$$$

Let's say to calculate some A(n) and B(n) we need to compute for the prefix of length x and the suffix of length y. The ways in which you can make a regular sum are either if both prefix and suffix are regular, or if both are anti-sums. The ways in which you can make an anti-sum are if prefix and suffix are not both regular or anti-sums.

$$$A = A(x)A(y) + B(x)B(y)$$$

$$$B = A(x)B(y) + A(y)B(x)$$$

$$$A - B = A(x)A(y) + B(x)B(y) - A(x)B(y) - A(y)B(x)$$$

$$$A - B = A(x)(A(y)-B(y)) + B(x)(B(y) - A(y))$$$

$$$A - B = (A(x)-B(x))\cdot (A(y)-B(y))$$$

$$$A - B = f(x) \cdot f(y)$$$

So instead of keeping track of both, you can just use $$$f(n)$$$ for the entire sum.

Different base cases

Our second problem when trying to generalize can be seen in the fibonacci problem. If instead of defining $$$f(0)=f(1)=1$$$, you tried to define $$$f(0)=0$$$ as the problem suggests, the solution will not be correct. As our solution consists building an array of blocks, we have somewhere implied that the empty solution($$$N=0$$$) is unique, and all negative solutions($$$N<0$$$) are non-existent(ie. f(0) = 1, f(x<0) = 0). The solution given was to shift everything by 1 once the dp is over. We can also multiply terms of a sequence by a constant or add two sequences with the same recurrence and the solution will stay the same. As an exercise, try to find the nth Lucas number using this, which has the same linear recurrence than fibonacci, but starts with 2, 1, 3, 4, 7, 11, 18.

Solution

Adding a constant

Finally, we can try to compute the same recurrences with $$$+b$$$ at the end:

$$$f_b(n) = a_1 \cdot f(n-1) + a_2 \cdot f(n-2) + \dots + a_m \cdot f(n-m) + b$$$

A general algorithm for these also exists, but I will not be showing the exact details of it due to there being no problem that uses them. Here's the general idea.

If b=0 this is the same problem as before.
If b=1 we assume the starting case to be f(0)=1 and it can be shown that $$$f_1(n) = \sum _{i=0}^n f(i)$$$
For all other b, taking $$$f_b(0) = b, f_b(n) = b*f_1(n)$$$
For all other starting cases, adding multiples of $$$f_0(n)$$$ as well as shifting that sequence does not affect the recurrence but does change the base cases.

I started writing this blog four months ago and don't remember the exact formulas or proofs, but in case anyone wants to think about it: It might help to think of a "magic brick" that can be stretched to fill however many squares but must be placed before all the regular bricks. Note that it is now possible that the first time the halfway marked is passed is well over the half way mark, the b=1 equation might be helpful in that scenario. And while computing $$$f_1(n)$$$ it might help to also compute $$$f_0(n)$$$.

Berlekamp-Massey

I learnt while writing this that this class of problems can be solved with Berlekamp-Massey and schoolbook multiplication in $$$O(m^2 log(n))$$$ and even faster with FFT. This method is definitively not faster than FFT and is also not a general solution(although most problems where the recurrences are constant will work), but it is faster than matrix multiplication with a comparable number of lines.

I see two situations in which you might prefer this over Berlekamp-Massey:

Not having FFT Poly mod Poly implemented, particularly in contests like ICPC, where implementation time matters. The idea of a Divide and Conquer DP looks more intuitive and easier to remember.
A query problem, where, as the answer and many problems are already memoized, it should have a lower run time.

As the more complex a problem gets the easier it is to just go with Berlekamp-Massey, I decided to ignore the details in the generalizations.

I also generally considered it a nice algorithm to know, even if not very useful.