Kapt's blog

By Kapt, history, 4 weeks ago, In English

So, about three weeks ago a fast ($$$\mathcal{O}(n\log^2(n))$$$) algorithm for composition and compositional inverse of formal power series was published (and discovered some time earlier) by noshi91, here is the original blog (in Japanese). The previous best algorithm was $$$\mathcal{O}((n\log n)^{3/2})$$$. I hope someone more competent than myself will make an in-depth blog about this later; I'll link it here as soon as I see it. For now I'll give a "quick" retelling of the algorithm.

UPD: here is a blog that focuses on implementing polynomial composition. It also uses a somewhat different approach that doesn't explicitly use Tellegen's principle.

UPD 2: Article by noshi91 and Elegia

UPD 3: VERY in-depth blog by maspy, with a great explanation of Tellegen's (transposition) principle as well; definitely recommend reading this one.

Bostan-Mori algorithm

(Summary of section 2.1 of this article).

We want to compute $$$[x^N]\frac{P(x)}{Q(x)}$$$. To do that we will use the observation that for any polynomial $$$Q(x)$$$ the polynomial $$$Q(x)Q(-x)$$$ is even, so it is equal to $$$V(x^2)$$$ for some $$$V$$$. Also let $$$U(x)=P(x)Q(-x), U(x)=U_e(x^2)+xU_o(x^2)$$$.

$$$[x^N]\dfrac{P(x)}{Q(x)}=[x^N]\dfrac{P(x)Q(-x)}{Q(x)Q(-x)}=[x^N]\dfrac{U_e(x^2)+xU_o(x^2)}{V(x^2)}=[x^N]\left( \dfrac{U_e}{V}(x^2)+x\dfrac{U_o}{V}(x^2) \right)$$$

So if $$$N$$$ is even $$$[x^N]\dfrac{P(x)}{Q(x)} = [x^{\frac{N}{2}}]\dfrac{U_e(x)}{V(x)}$$$ and if $$$N$$$ is odd $$$[x^N]\dfrac{P(x)}{Q(x)}=[x^{\frac{N-1}{2}}]\dfrac{U_o(x)}{V(x)}$$$. Note that $$$deg(V)=deg(Q)$$$ and $$$deg(U_i) \approx \dfrac{deg(P)+deg(Q)}{2}$$$, so if initially $$$deg(P) \leqslant n$$$ and $$$deg(Q) \leqslant n$$$, those inequalities still hold after we make $$$P=U_i, Q=V, N = \left[\frac{N}{2} \right]$$$. So each step can be made in $$$2$$$ multiplications of polynomials of degree $$$\leqslant n$$$, and there are $$$\log(N)$$$ steps until we have $$$N=0$$$ and the task is trivial, so the algorithm works in $$$\mathcal{O}(n\log n \log N)$$$.
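
The step above is easy to code. Here is a minimal Python sketch (my own illustration, not from the original blog): it uses naive $$$\mathcal{O}(n^2)$$$ multiplication for clarity, and swapping in NTT-based multiplication is what gives the stated $$$\mathcal{O}(n\log n\log N)$$$ bound; the function names and the prime 998244353 are my choices.

```python
MOD = 998244353  # a common NTT-friendly prime

def mul(a, b):
    # naive O(n^2) polynomial multiplication mod MOD;
    # a real implementation would use NTT here
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % MOD
    return res

def bostan_mori(P, Q, N):
    """Return [x^N] P(x)/Q(x) over F_MOD (Q[0] must be invertible)."""
    while N > 0:
        # Q(-x): negate the odd-degree coefficients
        Qneg = [(MOD - c) % MOD if i % 2 else c for i, c in enumerate(Q)]
        U = mul(P, Qneg)   # U(x) = P(x) Q(-x) = U_e(x^2) + x U_o(x^2)
        V = mul(Q, Qneg)   # Q(x) Q(-x) = V(x^2) is even
        P = U[N % 2::2]    # keep U_e if N is even, U_o if N is odd
        Q = V[::2]
        N //= 2
    return P[0] * pow(Q[0], MOD - 2, MOD) % MOD
```

For example, with $$$P=x$$$ and $$$Q=1-x-x^2$$$ this extracts Fibonacci numbers: `bostan_mori([0, 1], [1, MOD - 1, MOD - 1], 10)` returns `55`.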

Application to bivariate polynomials

It turns out that this algorithm works great with some bivariate polynomials as well.

Now we want to find $$$[x^k]\frac{P(x, y)}{Q(x, y)}$$$ where $$$deg_y(P), deg_y(Q) \leqslant 1$$$. Note that we don't need any coefficients with $$$[x^{>k}]$$$ in $$$P$$$ and $$$Q$$$, so we can assume $$$deg_x(P), deg_x(Q) \leqslant k$$$. Now if we make one step of the Bostan-Mori algorithm we will double the limits on the degrees of $$$P$$$ and $$$Q$$$ with respect to $$$y$$$ and halve $$$k$$$, so after we take $$$P$$$ and $$$Q$$$ modulo $$$x^{k+1}$$$ again, we'll halve their degrees with respect to $$$x$$$. So the product of the degrees with respect to $$$x$$$ and $$$y$$$ stays the same and we'll be able to find $$$U$$$ and $$$V$$$ on each step in $$$\mathcal{O}(k\log k)$$$ and solve the problem in $$$\mathcal{O}(k\log^2(k))$$$ total time.
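
As a sanity check of the bivariate step, here is a naive sketch (the names `mul2`/`coeff_xk` are mine, not from the blog). It stores `P[i][j]` as the coefficient of $$$x^iy^j$$$, never truncates in $$$y$$$, and uses quadratic multiplication, so it is only suitable for small inputs; the real algorithm truncates both variables carefully and multiplies with NTT.

```python
MOD = 998244353

def mul2(A, B, kx):
    """Multiply bivariate polynomials (A[i][j] = coeff of x^i y^j),
    keeping only x-degrees <= kx."""
    ny = len(A[0]) + len(B[0]) - 1
    nx = min(len(A) + len(B) - 1, kx + 1)
    C = [[0] * ny for _ in range(nx)]
    for i, ra in enumerate(A):
        for p, rb in enumerate(B):
            if i + p > kx:
                break
            for j, a in enumerate(ra):
                if a:
                    for q, b in enumerate(rb):
                        C[i + p][j + q] = (C[i + p][j + q] + a * b) % MOD
    return C

def series_div(p, q, m):
    """p(y)/q(y) as a power series mod y^m (q[0] invertible), naive."""
    inv0 = pow(q[0], MOD - 2, MOD)
    out = [0] * m
    for i in range(m):
        s = p[i] if i < len(p) else 0
        for j in range(1, min(i, len(q) - 1) + 1):
            s -= q[j] * out[i - j]
        out[i] = s % MOD * inv0 % MOD
    return out

def coeff_xk(P, Q, k, m):
    """Return [x^k] P(x,y)/Q(x,y) mod y^m."""
    while k > 0:
        # Q(-x, y): negate rows of odd x-degree
        Qneg = [[(MOD - c) % MOD for c in row] if i % 2 else row[:]
                for i, row in enumerate(Q)]
        U = mul2(P, Qneg, k)
        V = mul2(Q, Qneg, k)
        P, Q, k = U[k % 2::2], V[::2], k // 2
    return series_div(P[0], Q[0], m)
```

For example, for $$$\frac{1}{1-y(x+x^2)}=\sum_i y^i(x+x^2)^i$$$ the coefficient of $$$x^3$$$ is $$$2y^2+y^3$$$, and `coeff_xk([[1, 0]], [[1, 0], [0, MOD - 1], [0, MOD - 1]], 3, 4)` returns `[0, 0, 2, 1]`.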

Compositional inverse

From the "Coefficient of fixed $$$x^k$$$ in $$$f(x)^i$$$ " example of Lagrange inversion theorem from this blog by zscoder we know a connection between $$$[x^k]\dfrac{1}{1-yf(x)}$$$ and $$$\left(\dfrac{x}{f^{(-1)}(x)}\right)^k \mod x^k$$$, where $$$f^{(-1)}(f(x))=x$$$. Specifically,

$$$[y^i]\left([x^k]\dfrac{1}{1-yf(x)} \right) = \dfrac{i}{k}[x^{k-i}] \left(\left(\dfrac{x}{f^{(-1)}(x)}\right)^k \right)$$$

so we can find either one from the other in linear time. We can find the first value with the Bostan-Mori algorithm in $$$\mathcal{O}(k\log^2(k))$$$, and getting $$$f^{(-1)}$$$ from the second value is easy (take $$$\ln$$$, multiply by $$$-\frac{1}{k}$$$, take $$$\exp$$$, multiply by $$$x$$$), so we can find $$$f^{(-1)} \mod x^k$$$ in $$$\mathcal{O}(k\log^2(k))$$$ time.
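
To make the recipe concrete, here is a naive end-to-end sketch (my own illustration, assuming $$$f(0)=0$$$ and $$$f'(0)=1$$$ for simplicity): the coefficients $$$A_i=[x^k]f(x)^i$$$ are computed by brute force where the real algorithm would use the bivariate Bostan-Mori step, and then the identity plus the $$$\ln$$$ / $$$\exp$$$ recipe recover $$$f^{(-1)}$$$. All subroutines are the quadratic-time ones.

```python
MOD = 998244353

def mul(a, b, n):
    # naive multiplication mod x^n
    res = [0] * min(len(a) + len(b) - 1, n)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n:
                res[i + j] = (res[i + j] + x * y) % MOD
    return res

def ser_inv(a, n):
    # power series inverse mod x^n, a[0] != 0 (naive)
    b = [pow(a[0], MOD - 2, MOD)] + [0] * (n - 1)
    for i in range(1, n):
        s = 0
        for j in range(1, min(i, len(a) - 1) + 1):
            s += a[j] * b[i - j]
        b[i] = (-s) % MOD * b[0] % MOD
    return b

def ser_ln(a, n):
    # ln of a power series with a[0] == 1: integrate a'/a
    da = [(i * a[i]) % MOD for i in range(1, len(a))]
    d = mul(da, ser_inv(a, n), n - 1)
    return [0] + [d[i] * pow(i + 1, MOD - 2, MOD) % MOD for i in range(n - 1)]

def ser_exp(a, n):
    # exp of a power series with a[0] == 0, via e' = e * a'
    e = [1] + [0] * (n - 1)
    for i in range(1, n):
        s = 0
        for j in range(1, min(i, len(a) - 1) + 1):
            s += j * a[j] % MOD * e[i - j]
        e[i] = s % MOD * pow(i, MOD - 2, MOD) % MOD
    return e

def comp_inverse(f, k):
    """f^{(-1)} mod x^{k+1}; assumes f[0] == 0, f[1] == 1, k >= 1."""
    # A[i] = [x^k] f(x)^i, by brute force here; the y-series
    # sum_i A[i] y^i = [x^k] 1/(1 - y f(x)) is exactly what the
    # bivariate Bostan-Mori step computes in O(k log^2 k)
    A, p = [0] * (k + 1), [1]
    for i in range(1, k + 1):
        p = mul(p, f, k + 1)
        A[i] = p[k] if k < len(p) else 0
    # the identity: A[i] = (i/k) [x^{k-i}] W with W = (x/f^{(-1)})^k,
    # so W[k - i] = (k/i) A[i]
    W = [0] * k
    for i in range(1, k + 1):
        W[k - i] = k * pow(i, MOD - 2, MOD) % MOD * A[i] % MOD
    # recover f^{(-1)}: take ln, multiply by -1/k, take exp, multiply by x
    inv_k = pow(k, MOD - 2, MOD)
    h = ser_exp([(MOD - c) % MOD * inv_k % MOD for c in ser_ln(W, k)], k)
    return [0] + h
```

For $$$f=x+x^2$$$ the inverse is $$$x-x^2+2x^3-\dots$$$ (signed Catalan numbers), and `comp_inverse([0, 1, 1], 3)` returns `[0, 1, MOD - 1, 2]`.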

Composition

Finding $$$H(F(x)) \mod x^n$$$ is a linear transformation of the vector of coefficients of $$$H$$$. If we transpose this transformation we'll get some linear transformation of an $$$n$$$-dimensional vector $$$\overline{c}$$$ into a $$$deg(H)+1$$$-dimensional one. If $$$C(x)$$$ is the polynomial whose coefficients correspond to the coordinates of $$$\overline{c}$$$, then the transposed transformation is finding $$$[x^{n-1}]\dfrac{rev(C(x))}{1-yF(x)} \mod y^{deg(H)+1}$$$, where $$$rev(C(x))$$$ is $$$x^{n-1}C(\frac{1}{x})$$$ or just reversed $$$C$$$. This can be found in $$$\mathcal{O}(n\log^2(n))$$$ with the Bostan-Mori algorithm, so by Tellegen's principle the original problem of polynomial composition can be solved in $$$\mathcal{O}(n\log^2(n))$$$ with the same constant factor as well.
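
The claimed transpose can be checked numerically on small instances. The sketch below (my own, with hypothetical names) builds the matrix $$$M_{ij}=[x^i]F(x)^j$$$ of the composition map, applies its transpose to a vector $$$\overline{c}$$$ directly, and compares with $$$[x^{n-1}]rev(C(x))F(x)^j$$$, which are exactly the $$$y$$$-coefficients of $$$[x^{n-1}]\dfrac{rev(C(x))}{1-yF(x)}$$$ since $$$\frac{1}{1-yF}=\sum_j y^jF^j$$$.

```python
MOD = 998244353

def mul(a, b):
    # naive polynomial multiplication mod MOD
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % MOD
    return res

def check(F, c, d_H):
    """Compare the transposed composition map applied to c (directly)
    with the [x^{n-1}] rev(C) F^j formula, for deg(H) = d_H."""
    n = len(c)
    # powers F^j mod x^n, j = 0..d_H
    pows, p = [], [1]
    for j in range(d_H + 1):
        pows.append(p[:])
        p = mul(p, F)[:n]
    # matrix of H -> H(F) mod x^n: M[i][j] = [x^i] F^j
    M = [[pows[j][i] if i < len(pows[j]) else 0 for j in range(d_H + 1)]
         for i in range(n)]
    # transpose applied to c, directly
    direct = [sum(M[i][j] * c[i] for i in range(n)) % MOD
              for j in range(d_H + 1)]
    # via the claimed formula: [x^{n-1}] rev(C) * F^j
    revC = c[::-1]
    formula = [mul(revC, pows[j])[n - 1] for j in range(d_H + 1)]
    return direct == formula
```

For any choice of $$$F$$$, $$$\overline{c}$$$ and $$$deg(H)$$$ the two sides agree, e.g. `check([0, 1, 1, 2], [1, 2, 3, 4], 3)` returns `True`.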


»
4 weeks ago, # |

Someone wrote a blog about implementing it here. (By the way, I recommend reading other blogs by the same person too, they're pretty good).

  • »
    »
    4 weeks ago, # ^ |

    Thanks! I added a link to it; its approach seems easier to implement too. Why did it get so little attention, though? I never thought such an algorithm was even possible.

    • »
      »
      »
      4 weeks ago, # ^ |

      I'm guessing it's because the blog looks mathematical and it is not from some high-rated account. I hope someone adds it to the Catalog, though.

»
4 weeks ago, # |

Because it is math

»
4 weeks ago, # |

cuz who actually reads this stuff and is like "hmm yeah that all checks out" (no offense ofc I'm sure it is great work)

»
4 weeks ago, # |

Maybe it's because the blog was written in Japanese (sorry about that). The paper about this algorithm has been published, and you can read it here: https://arxiv.org/abs/2404.05177 .

»
4 weeks ago, # |

I want to convince myself to write a high-performance implementation, but I can't think of any applications outside of competitive programming. I understand that polynomial composition is useful for combinatorics and number theory, but do any end-users rely on it?

  • »
    »
    4 weeks ago, # ^ |

    This can be used for Newton's method with arbitrary equations (not given in closed form), which might be useful for some approximation tasks (finding the Taylor expansion up to the $$$n$$$-th term).

  • »
    »
    4 weeks ago, # ^ |

    How can it be used for competitive programming?

  • »
    »
    4 weeks ago, # ^ |

    Yes, that's the point of libraries like FLINT, NTL, and others. Contributions are very welcome.

»
3 weeks ago, # |

Sorry that we have been busy writing the paper in the past few weeks. Now let me make some more comments on this algorithm.

The previous Brent--Kung algorithm runs in time $$$O((n\log n)^{3/2})$$$; actually, it was not the best-known algorithm before ours.

  • The Kedlaya--Umans algorithm runs in time $$$n\cdot 2^{(\log n)^{1/2+o(1)}}$$$. For the composition $$$f\circ g \bmod h$$$, their idea is first to flatten $$$f$$$ into a multivariate polynomial, then $$$f \circ (g, g^{d} \bmod h, g^{d^2} \bmod h,\dots)$$$ will naturally have a smaller degree, thus one can hope to use evaluation-interpolation to recover the polynomial. Surprisingly, the multipoint evaluation of multivariate polynomials is, in fact, not very hard when the number of variables is moderately large (i.e., the degree is moderately small).
  • There is another algorithm, depending on heavy machinery of (polynomial-)matrix operations, that runs in time $$$O(n^\kappa)$$$, where $$$\kappa \leq 1.43$$$ under currently known bounds on matrix multiplication.

The first one looks quite great, but the hidden constant makes it impractical in the current setting. Due to the impracticality of fast matrix multiplication, the second one takes $$$\tilde O(n^{5/3})$$$ time when the matrix multiplication is replaced with brute force.

So Brent--Kung was still the best algorithm in practice before the new algorithm, and it is implemented in the FLINT library (see the documentation).

According to the practical experiments on Library Checker, it's quite clear that it's time to introduce the new algorithm to the maintainers of number-theoretic computation libraries. :)

  • »
    »
    3 weeks ago, # ^ |

    An alternative way to interpret the algorithm is to consider the following process:

    • Our goal is to compute $$$f(y) \bmod (y - g(x))$$$; define $$$M_1 = y - g(x)$$$. Define $$$M_{2t}(x, y) = M_t(x,y) M_t(-x,y)$$$, then $$$M_{2t}$$$ only has degrees multiple of $$$t$$$ in $$$x$$$. One can compute $$$f \bmod M_{2^k}$$$, then $$$\bmod M_{2^{k-1}}$$$, and so on, until $$$\bmod M_1$$$.

    The obvious ways to generalize the new algorithm are the following, still within the time complexity $$$O(n\log^2 n)$$$:

    • For $$$n$$$ a power of two, or smooth enough, we can use the same algorithm to solve modulo $$$x^n - a$$$.
    • For the computation of $$$[x^n] P(x, y)/Q(x, y)$$$ where $$$P, Q$$$ both have constant degree in $$$y$$$ initially, this seems to produce algorithms for basis conversions between some other polynomial bases, of independent interest.

    Indeed, the Kedlaya--Umans algorithm and the $$$O(n^\kappa)$$$-time algorithm work for an arbitrary modulus $$$h$$$, while our algorithm only works for power series. A possible next topic is to generalize the new algorithm to an arbitrary modulus $$$h$$$.

  • »
    »
    3 weeks ago, # ^ |

    So what about its application in CP?

    The first thing is that there are some problems concerning compositions and compositional inverses. For those problems, usually, we need to give an independent analysis of what differential equation the answer satisfies, then use such an equation to produce a relaxed convolution (a.k.a. CDQ divide and conquer). Surely the relaxed convolution is theoretically faster, but I think the advantage will not be significant when the equation is complex. So now we have a quasi-optimal solution for all these kinds of problems.

    There are also several problems that I didn't know how to solve efficiently, but which are now solved automatically, for example: compute a whole column of the Eulerian numbers $$$\langle {n\atop m} \rangle$$$! In the past, we could only compute a row in $$$O(n\log n)$$$ time.

    It might even be time to revisit all enumerative combinatorics problems (involving generating functions) that were considered hard in the past and see if we can solve them in a better way.