Hi Codeforces!
I have recently come up with a really neat and simple recursive algorithm for multiplying polynomials in O(nlogn) time. It is so neat and simple that I think it might possibly revolutionize the way that fast polynomial multiplication is taught and coded. You don't need to know anything about FFT to understand and implement this algorithm.
I've split this blog up into two parts. The first part is intended for anyone to be able to read and understand. The second part is advanced and goes into a ton of interesting ideas and concepts related to this algorithm.
Prerequisite: polynomial quotient and remainder; see the Wikipedia article and the Stack Exchange example.
Task:
Given two polynomials $P$ and $Q$, an integer $n$ with $\deg P < n$ and $\deg Q < n$, and a non-zero constant $c$. Your task is to calculate the polynomial $P(x) \, Q(x) \% (x^n - c)$ in $O(n \log n)$ time. You may assume that $n$ is a power of two.
Solution:
We can create a divide and conquer algorithm for $P(x) \, Q(x) \% (x^n - c)$ based on the difference of squares formula. Assuming $n$ is even, then $(x^n - c) = (x^{n/2} - \sqrt{c})(x^{n/2} + \sqrt{c})$. The idea behind the algorithm is to calculate $P(x) \, Q(x) \% (x^{n/2} - \sqrt{c})$ and $P(x) \, Q(x) \% (x^{n/2} + \sqrt{c})$ using two recursive calls, and then use the result of that to calculate $P(x) \, Q(x) \% (x^n - c)$.
So how do we actually calculate $P(x) \, Q(x) \% (x^n - c)$ using $P(x) \, Q(x) \% (x^{n/2} - \sqrt{c})$ and $P(x) \, Q(x) \% (x^{n/2} + \sqrt{c})$?
Well, we can use the following formula:
$$A(x) \% (x^n - c) = \frac{1}{2} \left(1 + \frac{x^{n/2}}{\sqrt{c}}\right) (A(x) \% (x^{n/2} - \sqrt{c})) + \frac{1}{2} \left(1 - \frac{x^{n/2}}{\sqrt{c}}\right) (A(x) \% (x^{n/2} + \sqrt{c})).$$

Proof of the formula

Note that
$$A(x) = \frac{1}{2} \left(1 + \frac{x^{n/2}}{\sqrt{c}}\right) A(x) + \frac{1}{2} \left(1 - \frac{x^{n/2}}{\sqrt{c}}\right) A(x).$$
Let $Q^-(x)$ denote the quotient of $A(x)$ divided by $(x^{n/2} - \sqrt{c})$, and let $Q^+(x)$ denote the quotient of $A(x)$ divided by $(x^{n/2} + \sqrt{c})$. Then
$$\begin{aligned} \left(1 + \frac{x^{n/2}}{\sqrt{c}}\right) A(x) &= \left(1 + \frac{x^{n/2}}{\sqrt{c}}\right) ((A(x) \% (x^{n/2} - \sqrt{c})) + Q^-(x) (x^{n/2} - \sqrt{c})) \\ &= \left(1 + \frac{x^{n/2}}{\sqrt{c}}\right) (A(x) \% (x^{n/2} - \sqrt{c})) + \frac{1}{\sqrt{c}} Q^-(x) (x^n - c) \end{aligned}$$
and
$$\begin{aligned} \left(1 - \frac{x^{n/2}}{\sqrt{c}}\right) A(x) &= \left(1 - \frac{x^{n/2}}{\sqrt{c}}\right) ((A(x) \% (x^{n/2} + \sqrt{c})) + Q^+(x) (x^{n/2} + \sqrt{c})) \\ &= \left(1 - \frac{x^{n/2}}{\sqrt{c}}\right) (A(x) \% (x^{n/2} + \sqrt{c})) - \frac{1}{\sqrt{c}} Q^+(x) (x^n - c). \end{aligned}$$
With this we have shown that
$$\begin{aligned} A(x) = &\frac{1}{2} \left(1 + \frac{x^{n/2}}{\sqrt{c}}\right) (A(x) \% (x^{n/2} - \sqrt{c})) \, + \\ &\frac{1}{2} \left(1 - \frac{x^{n/2}}{\sqrt{c}}\right) (A(x) \% (x^{n/2} + \sqrt{c})) \, + \\ &\frac{1}{\sqrt{c}} \frac{Q^-(x) - Q^+(x)}{2} (x^n - c). \end{aligned}$$
Here $A(x)$ is expressed as remainder plus quotient times $(x^n - c)$: the first two terms have degree less than $n$, so together they form the remainder. So we have proven the formula.
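As a quick numerical sanity check of this formula (not part of the original argument), we can compare its right-hand side against a direct remainder computation. The helper `poly_mod_xk` below is a hypothetical naive routine computing $A(x) \% (x^k - r)$ by repeatedly folding $x^{k+j} \equiv r \, x^j$:

```python
import random

def poly_mod_xk(A, k, r):
    # naive remainder of A(x) modulo (x^k - r):
    # fold the coefficient of x^(k+j) into x^j with factor r, from high to low
    res = list(A)
    for i in range(len(res) - 1, k - 1, -1):
        res[i - k] += r * res[i]
        res[i] = 0
    return res[:k]

random.seed(0)
n, c = 8, 2.0
sqrtc = c ** 0.5
A = [random.uniform(-1, 1) for _ in range(2 * n)]   # arbitrary polynomial of degree < 2n

Am = poly_mod_xk(A, n // 2, sqrtc)    # A % (x^(n/2) - sqrt(c))
Ap = poly_mod_xk(A, n // 2, -sqrtc)   # A % (x^(n/2) + sqrt(c))
# combine per the formula: low half (Am + Ap)/2, high half (Am - Ap)/(2 sqrt(c))
combined = [(m + p) / 2 for m, p in zip(Am, Ap)] + \
           [(m - p) / (2 * sqrtc) for m, p in zip(Am, Ap)]

direct = poly_mod_xk(A, n, c)         # A % (x^n - c), computed directly
print(max(abs(a - b) for a, b in zip(combined, direct)))  # prints a tiny value (floating point noise)
```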
This formula is very useful. If we substitute $A(x)$ by $P(x) \, Q(x)$, then the formula tells us how to calculate $P(x) \, Q(x) \% (x^n - c)$ using $P(x) \, Q(x) \% (x^{n/2} - \sqrt{c})$ and $P(x) \, Q(x) \% (x^{n/2} + \sqrt{c})$. Additionally, if we substitute $A(x)$ with $P(x)$, then the formula tells us how to calculate $P(x) \% (x^{n/2} - \sqrt{c})$ and $P(x) \% (x^{n/2} + \sqrt{c})$ using $P(x) \% (x^n - c)$. With this we have the entire recipe for implementing an $O(n \log n)$ divide and conquer algorithm.
Python solution to the task

```python
import cmath

def fast_polymult_mod(P, Q, n, c):
    """
    Calculates P(x) * Q(x) % (x^n - c) in O(n log n) time
    Input:
      n: integer, needs to be a power of 2
      c: non-zero complex floating point number
      P: a list of length n representing a polynomial P(x)
      Q: a list of length n representing a polynomial Q(x)
    Output:
      A list of length n representing the polynomial P(x) * Q(x) % (x^n - c)
    """
    assert len(P) == n and len(Q) == n
    # Base case
    if n == 1:
        return [P[0] * Q[0]]
    assert n % 2 == 0
    sqrtc = cmath.sqrt(c)
    # Calculate P_minus := P mod (x^(n/2) - sqrt(c))
    #           Q_minus := Q mod (x^(n/2) - sqrt(c))
    P_minus = [p1 + sqrtc * p2 for p1, p2 in zip(P[:n//2], P[n//2:])]
    Q_minus = [q1 + sqrtc * q2 for q1, q2 in zip(Q[:n//2], Q[n//2:])]
    # Calculate P_plus := P mod (x^(n/2) + sqrt(c))
    #           Q_plus := Q mod (x^(n/2) + sqrt(c))
    P_plus = [p1 - sqrtc * p2 for p1, p2 in zip(P[:n//2], P[n//2:])]
    Q_plus = [q1 - sqrtc * q2 for q1, q2 in zip(Q[:n//2], Q[n//2:])]
    # Recursively calculate PQ_minus := P * Q % (x^(n/2) - sqrt(c))
    #                       PQ_plus  := P * Q % (x^(n/2) + sqrt(c))
    PQ_minus = fast_polymult_mod(P_minus, Q_minus, n//2, sqrtc)
    PQ_plus = fast_polymult_mod(P_plus, Q_plus, n//2, -sqrtc)
    # Calculate PQ mod (x^n - c) using PQ_minus and PQ_plus:
    # low half is (PQ_minus + PQ_plus)/2, high half is (PQ_minus - PQ_plus)/(2 sqrt(c))
    PQ = [(m + p) / 2 for m, p in zip(PQ_minus, PQ_plus)] + \
         [(m - p) / (2 * sqrtc) for m, p in zip(PQ_minus, PQ_plus)]
    return PQ
```
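When testing an implementation like this, it helps to have a naive $O(n^2)$ reference for the same operation. One follows directly from the reduction $x^{n+j} \equiv c \, x^j \pmod{x^n - c}$ (the helper name here is my own, not from the blog):

```python
def naive_polymult_mod(P, Q, n, c):
    # multiply coefficient by coefficient, reducing via x^(n+j) ≡ c * x^j
    res = [0] * n
    for i, a in enumerate(P):
        for j, b in enumerate(Q):
            if i + j < n:
                res[i + j] += a * b
            else:
                res[i + j - n] += a * b * c
    return res

print(naive_polymult_mod([1, 2, 3, 4], [5, 6, 7, 8], 4, 2))  # [127, 120, 98, 60]
```

Running both implementations on random inputs and comparing (up to floating point error) is a quick way to gain confidence in the fast version.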
One final thing that I want to mention before going into the advanced section is that this algorithm can also be used to do fast unmodded polynomial multiplication, i.e. given polynomials $P(x)$ and $Q(x)$, calculate $P(x) \, Q(x)$. The trick is simply to pick $n$ large enough such that $P(x) \, Q(x) = P(x) \, Q(x) \% (x^n - c)$, and then use the exact same algorithm as before. $c$ can be picked arbitrarily (any non-zero complex number works).
Python implementation for general fast polynomial multiplication

```python
def fast_polymult(P, Q):
    """
    Calculates P(x) * Q(x)
    Input:
      P: a list representing a polynomial P(x)
      Q: a list representing a polynomial Q(x)
    Output:
      A list representing the polynomial P(x) * Q(x)
    """
    # Calculate length of the list representing P*Q
    n1 = len(P)
    n2 = len(Q)
    res_len = n1 + n2 - 1
    # Pick n sufficiently big (a power of 2 that is at least res_len)
    n = 1
    while n < res_len:
        n *= 2
    # Pad with extra 0s to reach length n
    P = P + [0] * (n - n1)
    Q = Q + [0] * (n - n2)
    # Pick non-zero c arbitrarily =)
    c = 123.24
    # Calculate P*Q mod (x^n - c)
    PQ = fast_polymult_mod(P, Q, n, c)
    # Remove extra 0 padding and return
    return PQ[:res_len]
```
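One practical note: since the algorithm works in complex floating point, multiplying integer polynomials returns complex coefficients that are only approximately integral. A small sketch of recovering exact integer coefficients (the helper name and tolerance are my own choices, not from the blog):

```python
def round_complex_coeffs(PQ, eps=1e-6):
    # each coefficient should be within eps of a real integer;
    # fail loudly if accumulated floating point error got too large
    out = []
    for z in PQ:
        z = complex(z)
        r = round(z.real)
        assert abs(z - r) < eps, "floating point error too large"
        out.append(r)
    return out

# e.g. the coefficients of (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2,
# as they might come back from the complex algorithm:
print(round_complex_coeffs([3.0000000001 + 1e-10j, 10 - 2e-11j, 7.9999999998 + 0j]))  # [3, 10, 8]
```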
(Advanced) Speeding up the algorithm
This section covers tricks that can be used to speed up the algorithm, in total by a factor of between 2 and 4.
- $n$ doesn't actually need to be a power of 2
- Imaginary-cyclic convolution
- Trick to go from $\% (x^n - c)$ to $\% (x^n - 1)$

This algorithm is actually FFT in disguise. But it is also different from any other FFT algorithm that I've seen before (for example the Cooley–Tukey FFT algorithm).

- Using this algorithm to calculate FFT
- This algorithm is not the same algorithm as Cooley–Tukey

(Advanced) Connection between this algorithm and NTT
Just like how there is FFT and NTT, there are two variants of this algorithm: one using complex floating point numbers, and one working modulo a prime (or more generally modulo an odd composite number).
Using modulo integers instead of complex numbers

The algorithm requires two things: firstly, it needs to be possible to divide by 2, and secondly, $\sqrt{c}$ needs to exist. This means that it is possible to extend the algorithm to work modulo a prime (or modulo an odd composite number) instead of using complex numbers.
What if $\sqrt{c}$ doesn't exist?

One of the things I dislike about NTT is that for NTT to be defined, there needs to exist an $n$-th root of unity. Usually problems involving NTT are designed so that this is never an issue. But if you want to use NTT where it hasn't been designed to magically work, then this is a really big issue: the NTT can become undefined!
Note that this algorithm does not share the same drawback of being undefined. The reason is that if $\sqrt{c}$ doesn't exist, then the algorithm can simply choose to either switch over to an $O(n^2)$ polynomial multiplication algorithm, or fall back to an unmodded $O(n \log n)$ polynomial multiplication algorithm. The implication is that this algorithm can do fast polynomial multiplication even if it is given a relatively bad NTT prime. I just find this property to be really cool!
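To illustrate both points, here is a sketch (names and prime choice are my own, not from the blog) of the modular variant restricted to primes $p \equiv 3 \pmod 4$, where a candidate square root of $c$ is $c^{(p+1)/4} \bmod p$. When the square root check fails, it falls back to naive $O(n^2)$ multiplication, exactly as described above:

```python
def polymult_mod_p(P, Q, n, c, p):
    """P(x) * Q(x) % (x^n - c), with coefficients mod a prime p where p % 4 == 3."""
    if n == 1:
        return [P[0] * Q[0] % p]
    # for p % 4 == 3, a square root of c (if one exists) is c^((p+1)/4) mod p
    s = pow(c, (p + 1) // 4, p)
    if s * s % p != c % p:
        # sqrt(c) does not exist mod p: naive O(n^2) fallback using x^(n+j) ≡ c * x^j
        res = [0] * n
        for i, a in enumerate(P):
            for j, b in enumerate(Q):
                if i + j < n:
                    res[i + j] = (res[i + j] + a * b) % p
                else:
                    res[i + j - n] = (res[i + j - n] + a * b % p * c) % p
        return res
    h = n // 2
    # fold each polynomial mod (x^h - s) and (x^h + s)
    Pm = [(P[i] + s * P[i + h]) % p for i in range(h)]
    Qm = [(Q[i] + s * Q[i + h]) % p for i in range(h)]
    Pp = [(P[i] - s * P[i + h]) % p for i in range(h)]
    Qp = [(Q[i] - s * Q[i + h]) % p for i in range(h)]
    PQm = polymult_mod_p(Pm, Qm, h, s, p)
    PQp = polymult_mod_p(Pp, Qp, h, (-s) % p, p)
    inv2 = pow(2, p - 2, p)        # dividing by 2 is fine since p is odd
    inv2s = pow(2 * s, p - 2, p)
    return [(m + q) * inv2 % p for m, q in zip(PQm, PQp)] + \
           [(m - q) * inv2s % p for m, q in zip(PQm, PQp)]

print(polymult_mod_p([1, 2, 3, 4], [5, 6, 7, 8], 4, 2, 1000003))  # [127, 120, 98, 60]
```

Note how the recursion simply stops subdividing at whatever level the chain of square roots runs out, so the answer stays well-defined for any odd prime $p$ and non-zero $c$.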