Tutorial: A simple O(n log n) polynomial multiplication algorithm

Hi Codeforces!

I have recently come up with a really neat and simple recursive algorithm for multiplying polynomials in $$$O(n \log n)$$$ time. It is so neat and simple that I think it might possibly revolutionize the way that fast polynomial multiplication is taught and coded. You don't need to know anything about FFT to understand and implement this algorithm.

I've split this blog up into two parts. The first part is intended for anyone to be able to read and understand. The second part is advanced and goes into a ton of interesting ideas and concepts related to this algorithm.

Prerequisite: Polynomial quotient and remainder, see Wiki article and Stackexchange example.

Task:

You are given two polynomials $$$P$$$ and $$$Q$$$, an integer $$$n$$$ and a non-zero complex number $$$c$$$, where degree $$$P \lt n$$$ and degree $$$Q \lt n$$$. Your task is to calculate the polynomial $$$P(x) \, Q(x) \% (x^n - c)$$$ in $$$O(n \log n)$$$ time. You may assume that $$$n$$$ is a power of two.

Solution:

We can create a divide and conquer algorithm for $$$P(x) \, Q(x) \% (x^n - c)$$$ based on the difference of squares formula. If $$$n$$$ is even, then $$$(x^n - c) = (x^{n/2} - \sqrt{c}) (x^{n/2} + \sqrt{c})$$$. The idea behind the algorithm is to calculate $$$P(x) \, Q(x) \% (x^{n/2} - \sqrt{c})$$$ and $$$P(x) \, Q(x) \% (x^{n/2} + \sqrt{c})$$$ using 2 recursive calls, and then use those results to calculate $$$P(x) \, Q(x) \% (x^n - c)$$$.

So how do we actually calculate $$$P(x) \, Q(x) \% (x^n - c)$$$ using $$$P(x) \, Q(x) \% (x^{n/2} - \sqrt{c})$$$ and $$$P(x) \, Q(x) \% (x^{n/2} + \sqrt{c})$$$?

Well, we can use the following formula:

$$$ \begin{aligned} A(x) \% (x^n - c) = &\frac{1}{2} (1 + \frac{x^{n/2}}{\sqrt{c}}) (A(x) \% (x^{n/2} - \sqrt{c})) \, + \\ &\frac{1}{2} (1 - \frac{x^{n/2}}{\sqrt{c}}) (A(x) \% (x^{n/2} + \sqrt{c})). \end{aligned} $$$

Proof of the formula
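
One way to verify the formula (a short derivation sketch; the names $$$R_0$$$ and $$$R_1$$$ are notation introduced here, not taken from the rest of the blog): write $$$A(x) \% (x^n - c) = R_0(x) + x^{n/2} R_1(x)$$$ with both $$$R_0$$$ and $$$R_1$$$ of degree less than $$$n/2$$$. Since $$$x^{n/2} - \sqrt{c}$$$ and $$$x^{n/2} + \sqrt{c}$$$ both divide $$$x^n - c$$$, reducing further modulo either factor amounts to substituting $$$x^{n/2} = \pm \sqrt{c}$$$:

```latex
\begin{aligned}
A(x) \% (x^{n/2} - \sqrt{c}) &= R_0(x) + \sqrt{c} \, R_1(x), \\
A(x) \% (x^{n/2} + \sqrt{c}) &= R_0(x) - \sqrt{c} \, R_1(x), \\
R_0(x) &= \frac{1}{2} \left( A(x) \% (x^{n/2} - \sqrt{c}) + A(x) \% (x^{n/2} + \sqrt{c}) \right), \\
R_1(x) &= \frac{1}{2 \sqrt{c}} \left( A(x) \% (x^{n/2} - \sqrt{c}) - A(x) \% (x^{n/2} + \sqrt{c}) \right).
\end{aligned}
```

Plugging $$$R_0$$$ and $$$R_1$$$ back into $$$R_0(x) + x^{n/2} R_1(x)$$$ and collecting the two remainders gives exactly the formula above.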

This formula is very useful. If we substitute $$$A(x)$$$ by $$$P(x) Q(x)$$$, then the formula tells us how to calculate $$$P(x) \, Q(x) \% (x^n - c)$$$ using $$$P(x) \, Q(x) \% (x^{n/2} - \sqrt{c})$$$ and $$$P(x) \, Q(x) \% (x^{n/2} + \sqrt{c})$$$ in linear time. With this we have the recipe for implementing an $$$O(n \log n)$$$ divide and conquer algorithm:

Input:

Integer $$$n$$$ (power of 2),
Non-zero complex number $$$c$$$,
Two polynomials $$$P(x) \% (x^n - c)$$$ and $$$Q(x) \% (x^n - c)$$$.

Output:

The polynomial $$$P(x) \, Q(x) \% (x^n - c)$$$.

Algorithm:

Step 1. (Base case) If $$$n = 1$$$, then return $$$P(0) \cdot Q(0)$$$. Otherwise:

Step 2. Starting from $$$P(x) \% (x^n - c)$$$ and $$$Q(x) \% (x^n - c)$$$, in $$$O(n)$$$ time calculate

$$$ \begin{aligned} &P(x) \% (x^{n/2} - \sqrt{c}), \\ &Q(x) \% (x^{n/2} - \sqrt{c}), \\ &P(x) \% (x^{n/2} + \sqrt{c}), \\ &Q(x) \% (x^{n/2} + \sqrt{c}). \end{aligned} $$$

Step 3. Make two recursive calls to calculate $$$P(x) \, Q(x) \% (x^{n/2} - \sqrt{c})$$$ and $$$P(x) \, Q(x) \% (x^{n/2} + \sqrt{c})$$$.

Step 4. Using the formula, calculate $$$P(x) \, Q(x) \% (x^n - c)$$$ in $$$O(n)$$$ time. Return the result.

Here is a Python implementation following this recipe:

Python solution to the task

"""
Calculates P(x) * Q(x) % (x^n - c) in O(n log n) time

Input:
  n: Integer, needs to be power of 2
  c: Non-zero complex floating point number
  P: A list of length n representing a polynomial P(x)
  Q: A list of length n representing a polynomial Q(x)
Output:
  A list of length n representing the polynomial P(x) * Q(x) % (x^n - c)
"""
def fast_polymult_mod(P, Q, n, c):
    assert len(P) == n and len(Q) == n
    
    # Base case
    if n == 1:
        return [P[0] * Q[0]]

    assert n % 2 == 0
    import cmath
    sqrtc = cmath.sqrt(c)

    # Calculate P_minus := P mod (x^(n/2) - sqrt(c))
    #           Q_minus := Q mod (x^(n/2) - sqrt(c))

    P_minus = [p1 + sqrtc * p2 for p1,p2 in zip(P[:n//2], P[n//2:])]
    Q_minus = [q1 + sqrtc * q2 for q1,q2 in zip(Q[:n//2], Q[n//2:])]

    # Calculate P_plus := P mod (x^(n/2) + sqrt(c))
    #           Q_plus := Q mod (x^(n/2) + sqrt(c))

    P_plus = [p1 - sqrtc * p2 for p1,p2 in zip(P[:n//2], P[n//2:])]
    Q_plus = [q1 - sqrtc * q2 for q1,q2 in zip(Q[:n//2], Q[n//2:])]

    # Recursively calculate PQ_minus := P * Q % (x^(n/2) - sqrt(c))
    #                       PQ_plus  := P * Q % (x^(n/2) + sqrt(c))

    PQ_minus = fast_polymult_mod(P_minus, Q_minus, n//2, sqrtc)
    PQ_plus  = fast_polymult_mod(P_plus,  Q_plus,  n//2, -sqrtc)

    # Calculate PQ mod (x^n - c) using PQ_minus and PQ_plus:
    # the low half is  (PQ_minus + PQ_plus) / 2,
    # the high half is (PQ_minus - PQ_plus) / (2 * sqrt(c))
    PQ = [(m + p)/2         for m,p in zip(PQ_minus, PQ_plus)] + \
         [(m - p)/(2*sqrtc) for m,p in zip(PQ_minus, PQ_plus)]
    
    return PQ
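
To sanity check an implementation like the one above, it can help to compare it against a slow reference. Here is a minimal sketch of an $$$O(n^2)$$$ reference; the names `naive_polymult_mod` and `max_abs_diff` are made up for this illustration and are not part of the algorithm itself.

```python
def naive_polymult_mod(P, Q, n, c):
    """Reference O(n^2) computation of P(x) * Q(x) % (x^n - c)."""
    assert len(P) == n and len(Q) == n
    res = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                res[i + j] += P[i] * Q[j]
            else:
                # x^(i+j) = x^n * x^(i+j-n), and x^n is congruent to c
                res[i + j - n] += c * P[i] * Q[j]
    return res


def max_abs_diff(A, B):
    """Largest coefficient-wise difference between two equally long lists."""
    return max(abs(a - b) for a, b in zip(A, B))


# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2, and x^2 becomes 5 modulo (x^2 - 5)
print(naive_polymult_mod([1, 2], [3, 4], 2, 5))  # [43, 10]
```

Since the fast version works with complex floating point numbers, the comparison should allow a small tolerance, for example by checking that `max_abs_diff(fast_result, naive_result)` is below something like `1e-6`.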

One final thing that I want to mention before going into the advanced section is that this algorithm can also be used to do fast unmodded polynomial multiplication, i.e. given polynomials $$$P(x)$$$ and $$$Q(x)$$$ calculate $$$P(x) \, Q(x)$$$. The trick is simply to pick $$$n$$$ large enough such that $$$P(x) \, Q(x) = P(x) \, Q(x) \% (x^n - c)$$$, i.e. $$$n$$$ larger than the degree of $$$P(x) \, Q(x)$$$, and then use the exact same algorithm as before. The constant $$$c$$$ can be picked arbitrarily (any non-zero complex number works).

Python implementation for general Fast polynomial multiplication

"""
Calculates P(x) * Q(x)

Input:
  P: A list representing a polynomial P(x)
  Q: A list representing a polynomial Q(x)
Output:
  A list representing the polynomial P(x) * Q(x)
"""
def fast_polymult(P, Q):
    # Calculate length of the list representing P*Q
    n1 = len(P)
    n2 = len(Q)
    res_len = n1 + n2 - 1
    
    # Pick n sufficiently big
    n = 1
    while n < res_len:
        n *= 2

    # Pad with extra 0s to reach length n
    P = P + [0] * (n - n1)
    Q = Q + [0] * (n - n2)
    
    # Pick non-zero c arbitrarily =)
    c = 123.24

    # Calculate P*Q mod x^n - c
    PQ = fast_polymult_mod(P, Q, n, c)

    # Remove extra 0 padding and return
    return PQ[:res_len]
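
Note that the returned coefficients are complex floating point numbers, even if the inputs were integers. A possible post-processing step, assuming integer inputs and rounding errors well below $$$0.5$$$, is sketched here (`to_int_coeffs` is a hypothetical helper name, not something used elsewhere in this blog):

```python
def to_int_coeffs(PQ):
    """Round complex floating point coefficients to the nearest integers.

    Assumes the exact coefficients are integers and that the accumulated
    floating point error is well below 0.5; the imaginary parts are dropped.
    """
    return [round(complex(v).real) for v in PQ]


# Coefficients of (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2 with some simulated noise
noisy = [3 + 1e-13j, 9.999999999998 - 2e-14j, 8.000000000001]
print(to_int_coeffs(noisy))  # [3, 10, 8]
```

For large coefficients or very long polynomials the error can grow past $$$0.5$$$, in which case plain rounding like this is no longer safe.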

(Advanced) Speeding up the algorithm

This section will be about tricks that can be used to speed up the algorithm. This will in total speed it up by a factor of between 2 and 4.

$n$ doesn't actually need to be a power of 2

We don't actually need the assumption that $$$n$$$ is a power of 2. If $$$n$$$ ever becomes odd during the recursion, then we have two choices: either fall back to an $$$O(n^2)$$$ algorithm, or fall back to the unmodded $$$O(n \log{n})$$$ polynomial multiplication algorithm.

Let us discuss the run time of falling back to the $$$O(n^2)$$$ algorithm when $$$n$$$ becomes odd. Assume that $$$n = a \cdot 2^b$$$, where $$$a$$$ is an odd integer and $$$b$$$ is a non-negative integer. Think of the recursive algorithm as having layers, one layer for each possible value of $$$n$$$. The first $$$b$$$ layers will all take $$$O(n)$$$ time each. In the $$$(b+1)$$$-th layer the value of $$$n$$$ is $$$a$$$, and there are $$$n/a$$$ subproblems of that size. Using the $$$O(n^2)$$$ polynomial multiplication algorithm on each of them leads to this layer taking $$$O(n/a \cdot a^2) = O(n \cdot a)$$$ time. The final time complexity comes out to be $$$O((a + b) \, n)$$$.
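
To make the $$$(a + b) \, n$$$ estimate concrete, here is a small sketch that splits $$$n$$$ into its odd part and its power of two. The function names are invented for this example, and the "cost" is only the operation count up to a constant factor.

```python
def split_odd_times_power_of_two(n):
    """Return (a, b) with n == a * 2**b and a odd, for n >= 1."""
    a, b = n, 0
    while a % 2 == 0:
        a //= 2
        b += 1
    return a, b


def estimated_cost(n):
    """The (a + b) * n estimate from the layer argument (constants ignored)."""
    a, b = split_odd_times_power_of_two(n)
    return (a + b) * n


print(split_odd_times_power_of_two(48))    # (3, 4)
print(estimated_cost(48))                  # 336
print(split_odd_times_power_of_two(1152))  # (9, 7)
```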

Python implementation that works for both odd and even $n$

"""
Calculates P(x) * Q(x) % (x^n - c) in O((a + b) * n) time, where n = a*2^b.

Input:
  n: Integer
  c: Non-zero complex floating point number
  P: A list of length n representing a polynomial P(x)
  Q: A list of length n representing a polynomial Q(x)
Output:
  A list of length n representing the polynomial P(x) * Q(x) % (x^n - c)
"""
def fast_polymult_mod2(P, Q, n, c):
    assert len(P) == n and len(Q) == n
    
    # Base case (n is odd)
    if n & 1:
        # Calculate the answer in O(n^2) time
        res1 = [0] * n
        res2 = [0] * n
        for i in range(n):
            for j in range(n - i):
                res1[i + j] += P[i] * Q[j]
            for j in range(n - i, n):
                res2[i + j - n] += P[i] * Q[j]
        return [r1 + c * r2 for r1,r2 in zip(res1, res2)]

    assert n % 2 == 0
    import cmath
    sqrtc = cmath.sqrt(c)

    # Calculate P_minus := P mod (x^(n/2) - sqrt(c))
    #           Q_minus := Q mod (x^(n/2) - sqrt(c))

    P_minus = [p1 + sqrtc * p2 for p1,p2 in zip(P[:n//2], P[n//2:])]
    Q_minus = [q1 + sqrtc * q2 for q1,q2 in zip(Q[:n//2], Q[n//2:])]

    # Calculate P_plus := P mod (x^(n/2) + sqrt(c))
    #           Q_plus := Q mod (x^(n/2) + sqrt(c))

    P_plus = [p1 - sqrtc * p2 for p1,p2 in zip(P[:n//2], P[n//2:])]
    Q_plus = [q1 - sqrtc * q2 for q1,q2 in zip(Q[:n//2], Q[n//2:])]

    # Recursively calculate PQ_minus := P * Q % (x^(n/2) - sqrt(c))
    #                       PQ_plus  := P * Q % (x^(n/2) + sqrt(c))
    # (recurse into fast_polymult_mod2 so that odd n//2 hits the O(n^2) base case)

    PQ_minus = fast_polymult_mod2(P_minus, Q_minus, n//2, sqrtc)
    PQ_plus  = fast_polymult_mod2(P_plus,  Q_plus,  n//2, -sqrtc)

    # Calculate PQ mod (x^n - c) using PQ_minus and PQ_plus:
    # the low half is  (PQ_minus + PQ_plus) / 2,
    # the high half is (PQ_minus - PQ_plus) / (2 * sqrt(c))
    PQ = [(m + p)/2         for m,p in zip(PQ_minus, PQ_plus)] + \
         [(m - p)/(2*sqrtc) for m,p in zip(PQ_minus, PQ_plus)]
    
    return PQ

The reason why this is super useful is that it allows us to speed up the fast unmodded polynomial multiplication algorithm. As long as we are fine with $$$a$$$ being less than say $$$10$$$, then we might be able to choose a significantly smaller $$$n$$$ compared to what would be possible if we are only allowed to choose powers of two. This trick has the potential of making the fast unmodded polynomial multiplication algorithm run twice as fast.

Python implementation for more efficient fast unmodded polynomial multiplication

"""
Calculates P(x) * Q(x)

Input:
  P: A list representing a polynomial P(x)
  Q: A list representing a polynomial Q(x)
Output:
  A list representing the polynomial P(x) * Q(x)
"""
def fast_polymult2(P, Q):
    # Calculate length of the list representing P*Q
    n1 = len(P)
    n2 = len(Q)
    res_len = n1 + n2 - 1
    
    # Pick n sufficiently big
    b = 0
    alim = 10
    while alim * 2**b < res_len:
        b += 1
    a = (res_len - 1) // 2**b + 1
    n = a * 2**b

    # Pad with extra 0s to reach length n
    P = P + [0] * (n - n1)
    Q = Q + [0] * (n - n2)
    
    # Pick non-zero c arbitrarily =)
    c = 123.24

    # Calculate P*Q mod x^n - c
    PQ = fast_polymult_mod2(P, Q, n, c)

    # Remove extra 0 padding and return
    return PQ[:res_len]
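
To get a feel for how much smaller $$$n$$$ can become, here is a small comparison sketch. It repeats the two ways of picking $$$n$$$ used above as standalone helpers; the names `next_power_of_two` and `pick_n` are just for this illustration.

```python
def next_power_of_two(res_len):
    """Smallest power of two that is >= res_len."""
    n = 1
    while n < res_len:
        n *= 2
    return n


def pick_n(res_len, alim=10):
    """Smallest n = a * 2**b >= res_len, with b the least value giving a <= alim."""
    b = 0
    while alim * 2**b < res_len:
        b += 1
    a = (res_len - 1) // 2**b + 1
    return a * 2**b


for res_len in (1025, 5000, 100001):
    print(res_len, next_power_of_two(res_len), pick_n(res_len))
```

For `res_len = 1025`, for example, the power of two version has to use $$$n = 2048$$$, while the $$$a \cdot 2^b$$$ version gets away with $$$n = 9 \cdot 2^7 = 1152$$$.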

Imaginary-cyclic convolution

Trick to go from $\% (x^n - c)$ to $\% (x^n - 1)$

(Advanced) -is-this-fft-?

This algorithm is actually FFT in disguise. But it is also different compared to any other FFT algorithm that I've seen before (for example the Cooley–Tukey FFT algorithm).
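
One way to see the connection, as a rough sketch: taking a polynomial modulo $$$(x - r)$$$ is the same as evaluating it at $$$r$$$, so the leaves of the recursion ($$$n = 1$$$) hold the values of $$$P$$$ at the $$$n$$$-th roots of $$$c$$$. The helper names below (`leaf_roots`, `reduce_to_leaves`, `evaluate`) are invented for this illustration.

```python
import cmath


def leaf_roots(n, c):
    """All r with r^n = c, in the order the recursion reaches its leaves."""
    if n == 1:
        return [c]
    s = cmath.sqrt(c)
    return leaf_roots(n // 2, s) + leaf_roots(n // 2, -s)


def reduce_to_leaves(P, n, c):
    """Apply only the reduction step recursively (n must be a power of 2)."""
    if n == 1:
        return [P[0]]
    s = cmath.sqrt(c)
    lo, hi = P[:n // 2], P[n // 2:]
    P_minus = [p1 + s * p2 for p1, p2 in zip(lo, hi)]  # P mod (x^(n/2) - s)
    P_plus = [p1 - s * p2 for p1, p2 in zip(lo, hi)]   # P mod (x^(n/2) + s)
    return reduce_to_leaves(P_minus, n // 2, s) + reduce_to_leaves(P_plus, n // 2, -s)


def evaluate(P, x):
    """Horner evaluation of P at x."""
    result = 0
    for coeff in reversed(P):
        result = result * x + coeff
    return result


P = [1, 2, 3, 4]
roots = leaf_roots(4, 1)             # the 4th roots of unity, in leaf order
values = reduce_to_leaves(P, 4, 1)   # what the recursion holds at its leaves
for r, v in zip(roots, values):
    print(r, v, evaluate(P, r))      # v and P(r) agree up to rounding
```

With $$$c = 1$$$ the leaf values are therefore the discrete Fourier transform of the coefficients, only listed in the order in which the recursion visits the roots.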

Using this algorithm to calculate FFT

This algorithm is not the same algorithm as Cooley–Tukey

(Advanced) Connection between this algorithm and NTT

Just like how there is FFT and NTT, there are two variants of this algorithm too. One using complex floating point numbers, and the other using modulo a prime (or more generally modulo an odd composite number).
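
As a rough idea of what the modular variant might look like, here is a minimal sketch under some loudly stated assumptions: the prime `MOD = 998244353` with primitive root `G = 3` is my own choice for the example, $$$c$$$ is fixed to $$$1$$$, $$$n$$$ is a power of two dividing $$$2^{23}$$$, and the function name is made up. Every $$$c$$$ appearing in the recursion is stored as an exponent $$$e$$$ with $$$c = \text{root}^e$$$, so that $$$\sqrt{c}$$$ is simply $$$\text{root}^{e/2}$$$.

```python
MOD = 998244353  # assumed prime for this sketch; MOD - 1 = 119 * 2**23
G = 3            # a primitive root modulo MOD


def cyclic_polymult_mod_prime(P, Q):
    """P(x) * Q(x) % (x^n - 1) with coefficients modulo MOD, n = len(P)."""
    n = len(P)
    assert n >= 1 and len(Q) == n
    assert (n & (n - 1)) == 0 and (MOD - 1) % n == 0
    root = pow(G, (MOD - 1) // n, MOD)  # primitive n-th root of unity
    inv2 = (MOD + 1) // 2               # inverse of 2 modulo MOD

    def rec(P, Q, m, e):
        # Computes P * Q % (x^m - root^e); e is always a multiple of m here,
        # so e is even whenever m is even
        if m == 1:
            return [P[0] * Q[0] % MOD]
        h = m // 2
        s = pow(root, e // 2, MOD)      # a square root of root^e
        P_minus = [(p1 + s * p2) % MOD for p1, p2 in zip(P[:h], P[h:])]
        Q_minus = [(q1 + s * q2) % MOD for q1, q2 in zip(Q[:h], Q[h:])]
        P_plus = [(p1 - s * p2) % MOD for p1, p2 in zip(P[:h], P[h:])]
        Q_plus = [(q1 - s * q2) % MOD for q1, q2 in zip(Q[:h], Q[h:])]
        PQ_minus = rec(P_minus, Q_minus, h, e // 2)
        # -s equals root^(e/2 + n/2), because root^(n/2) = -1
        PQ_plus = rec(P_plus, Q_plus, h, e // 2 + n // 2)
        k = inv2 * pow(s, MOD - 2, MOD) % MOD  # 1 / (2 * s) modulo MOD
        return [(a + b) * inv2 % MOD for a, b in zip(PQ_minus, PQ_plus)] + \
               [(a - b) * k % MOD for a, b in zip(PQ_minus, PQ_plus)]

    return rec(P, Q, n, 0)


print(cyclic_polymult_mod_prime([1, 2, 3, 4], [5, 6, 7, 8]))  # [66, 68, 66, 60]
```

Tracking exponents like this sidesteps the need for a general modular square root, at the price of only supporting those $$$c$$$ that are powers of the chosen root.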

Using modulo integers instead of complex numbers

What if $\sqrt{c}$ doesn't exist?

Rev.	By	When	Δ	Comment
en53	pajenegod	2023-07-21 20:29:32	631
en1	pajenegod	2023-07-06 23:46:15	3121	Initial revision (saved to drafts)
