Think FFT doesn't matter? You're probably right I found an unintended extra speedup to a div 2 E that I thought was funny.
I was practicing and came across 2025E while practicing, a 2200 problem from an educational div2.
If you want, try the problem yourself. Then suppose $$$m = 1000, n = 1000$$$, and see if you can solve it in faster than cubic time.
This is my cubic solution (read this even if you've solved it yourself/seen the editorial since I thought of a slightly different way to do it)
Consider a suit that's not the trump suit (not suit 1). How many ways can we distribute $$$x$$$ cards to $$$A$$$ (and thus $$$m-x$$$ cards to $$$B$$$)?
- If we receive more cards than them ($$$x \gt m-x$$$), then we're screwed, because after we beat all of their cards we will have some non-trump-suit cards left over which can't beat anything, so we don't win.
- If we receive less or equal cards than them, assume we can use all of our cards to beat some of theirs, and then we'll have to use trump cards to beat the rest.
- We have to be able to use our $$$x$$$ cards to beat $$$x$$$ of their cards, otherwise we result in the situation in the first case where we have leftovers that can't beat anything.
So, in order to calculate this we can run DP:
- $$$dp[k][i]$$$ = how many ways to give ourselves $$$i$$$ cards, out of the first $$$k$$$ cards.
- Step 1: Start with card 1. We have to give it to $$$B$$$. So $$$dp[1][0] = 1$$$.
- Step $$$k$$$: Then for a card with value $$$k$$$:
- We could give it to ourselves, but not if it "overflows". So add transition from $$$dp[k-1][i]$$$ where $$$i \lt k/2$$$, $$$dp[k][i+1] += dp[k-1][i]$$$. If $$$i \gt = k/2$$$, then giving the card to ourselves results in it being left over.
- Or we give it to our opponents. So transition $$$dp[k][i] += dp[k-1][i]$$$.
- By processing the cards from smallest to greatest, the current card is guaranteed to beat all previous cards.
This DP takes $$$O(m^2)$$$ since $$$k$$$ and $$$i$$$ both go up to $$$m$$$.
Okay, now we want another DP to know how many ways to give us cards out of suits $$$2...n$$$.
- $$$dp2[k][i]$$$ = how many ways to give us $$$i$$$ cards, out of the suits $$$2...k$$$.
- We start with suit $$$k=2$$$ and base case $$$dp2[1][0] = 1$$$.
- For each $$$dp2[k-1][i]$$$, we can give ourselves $$$j$$$ cards (to then have $$$i+j$$$ cards total), and there are $$$dp[m][j]$$$ ways to do that. So we do $$$dp2[k][i+j] += dp[m][j] \cdot dp2[k-1][i]$$$ (and reduce modulo if needed). (We have to do this for each value of $$$j$$$ from $$$1$$$ to $$$m/2$$$.)
- Repeat for each future suit.
There are $$$n^2 m$$$ states in $$$dp2$$$ (note that $$$i$$$ goes up to $$$n m$$$), and the transition takes $$$m$$$ time, for $$$n^2 m^2$$$ total. That's going to TLE.
Let's remind ourselves what we care about. If we get $$$i$$$ cards from the non-trump suit, then we need to have $$$\frac{nm}{2} - i$$$ more trump cards than our opponent. And since we can have no more than $$$m$$$ more trump cards than our opponent, we really only care about having between $$$\frac{nm}{2} - m$$$ and $$$\frac{nm}{2}$$$ cards $$$i$$$.
So at step $$$k$$$ in $$$dp2$$$, we only really care about the values of $$$i$$$ in the range $$$\frac{m}{2}k-m$$$ to $$$\frac{m}{2}k$$$. That's because you can only add $$$\frac{m}{2}$$$ elements per step, so if you're below $$$\frac{m}{2}k - m$$$, you are irrelevant to the final answer. So now we really only have $$$m$$$ states, and we run in $$$O(n m^2)$$$ and don't TLE anymore. Yay!
Extra note: To extract the answer, you sum together all the results of multiplying $$$dp2[n][i]$$$ by the number of ways to have $$$\frac{nm}{2} - i$$$ more trump cards than our opponent. To calculate this, it's very similar to DP before, but this time start from the highest trump card (which has to be ours), then work your way down. At every point, you should have equal or more trump cards than your opponent.
Now for the fun part.
Let's take a look at the DP transition from step $$$k-1$$$ to $$$k$$$, rewritten a tiny bit: $$$dp2[k][i] += dp[m][j] * dp2[k-1][i-j]$$$:
- Each $$$dp2[k][j]$$$ is equal to (I'm going to omit the $$$m$$$ and $$$k-1$$$ now) $$$dp[j] \cdot dp2[i-j]$$$ summed over all $$$j$$$
- This is a convolution! Specifically you're getting the $$$i$$$'th element of the convolution between $$$dp$$$ and $$$dp2$$$. Zero extend both arrays as needed.
- So, run FFT to calculate the convolutions. This lets you run each DP update step in $$$m \log m$$$ time (the relevant portions of $$$dp$$$ and $$$dp2$$$ are $$$m$$$ long), and over $$$n$$$ steps that's $$$m n \log m$$$.
Didn't write an implementation because I'm lazy. Unfortunately that also means my logic might be flawed somewhere. I hope it isn't.









Auto comment: topic has been updated by greateric (previous revision, new revision, compare).
you can optimise the convolution portion further: notice that you are convolving dp2 with itself n times (dp is initially [1, 0, 0, ..., 0, 0] which is the identity element for convolutions). Write that as (dp2)^N. The convolution operation is commutative, so you can compute (dp2)^N using binary exponentiation. for n <= 500 this means you can compute the final answer in just 9(!) FFTs. (also because dp2 is just catalan numbers you can solve the whole problem in O(m + 9mlogm) = O(mlogm))
lyjiang