### hly1204's blog

By hly1204, history, 2 months ago,

Hi codeforces! I want to show that how to implement the algorithm from noshi91 who showed that we could modify the Bostan--Mori's algorithm to make use in FPS composition and compositional inverse.

noshi91's blog: FPS の合成と逆関数、冪乗の係数列挙 Θ(n (log(n))^2) in Japanese.

UPD: my submission. relative code from line 874 to 923.

## Compositional Inverse

The key idea is consider the bivariate formal power series

$\dfrac{1}{1-yf(x)}\in\mathbb{C}\left\lbrack\left\lbrack x,y\right\rbrack\right\rbrack$

Let $f(0)=0$, we have

\begin{aligned} \left\lbrack x^k\right\rbrack\frac{1}{1-yf(x)}=\left\lbrack x^k\right\rbrack\sum_{i\geq 0}y^if(x)^i \end{aligned}

Apply Bostan--Mori's algorithm here:

\begin{aligned} \frac{P(x,y)}{Q(x,y)}&=\frac{P(x,y)Q(-x,y)}{Q(x,y)Q(-x,y)} \\ &=\frac{U_{\text{even}}(x^2,y)+xU_{\text{odd}}(x^2,y)}{V(x^2,y)} \end{aligned}

so

$\left\lbrack x^k\right\rbrack\frac{P(x,y)}{Q(x,y)}= \begin{cases} \left\lbrack x^{k/2}\right\rbrack &\dfrac{U_{\text{even}}(x,y)}{V(x,y)},\space &\text{if }k\equiv 0\pmod{2} \\ \left\lbrack x^{(k-1)/2}\right\rbrack &\dfrac{U_{\text{odd}}(x,y)}{V(x,y)},\space &\text{if }k\equiv 1\pmod{2} \end{cases}$

The degree of $y$ doubled after each time of iteration, but the degree of $x$ halved.

$\begin{array}{ll} &\textbf{Algorithm }\operatorname{Enum-}k\operatorname{th-Term-of-Power}(f,k,n)\text{:} \\ &\textbf{Input}\text{: } f\in x\mathbb{C}\left\lbrack x\right\rbrack,k,n\in\mathbb{Z}_{\geq 0}\text{.} \\ &\textbf{Output}\text{: }\left(\left\lbrack x^k\right\rbrack f^0,\left\lbrack x^k\right\rbrack f,\left\lbrack x^k\right\rbrack f^2,\dots ,\left\lbrack x^k\right\rbrack f^{n-1}\right)\text{.} \\ 1&P(x,y)\gets 1 \\ 2&Q(x,y)\gets 1-yf \\ 3&\textbf{while }\deg_y(P)<n-1\textbf{ or }k\neq 0\textbf{ do}\text{:} \\ 4&\qquad U_0(x^2,y)+xU_1(x^2,y)\gets P(x,y)Q(-x,y) \\ 5&\qquad V(x^2,y)\gets Q(x,y)Q(-x,y) \\ 6&\qquad P\gets U_{k\bmod{2}}(x,y) \bmod{x^{\left\lfloor k/2\right\rfloor +1}} \\ 7&\qquad Q\gets V(x,y) \bmod{x^{\left\lfloor k/2\right\rfloor +1}} \\ 8&\qquad k\gets \left\lfloor k/2\right\rfloor \\ 9&\textbf{end while} \\ 10&A(y)=\sum_{j=0}^{\deg_y(P)}a_jy^j\gets \left\lbrack x^0\right\rbrack P(x,y) \\ 11&\textbf{return }(a_0,\dots ,a_{n-1}) \end{array}$

Combined with Lagrange Inversion Formula, we have the following algorithm:

$\begin{array}{ll} &\textbf{Algorithm }\operatorname{Compositional-Inverse}(f,n)\text{:} \\ &\textbf{Input}\text{: }f\in x\mathbb{C}\left\lbrack\left\lbrack x\right\rbrack\right\rbrack, f'(0)\neq 0,n\in\mathbb{Z}_{\geq 2}\text{.} \\ &\textbf{Output}\text{: }g\pmod{x^n} \text{ such that }f(g)\equiv g(f)\equiv x\pmod{x^n}\text{.} \\ 1&t\gets f'(0) \\ 2&F(x)\gets f\left(t^{-1}x\right) \\ 3&(a_0,\dots ,a_{n-1})\gets \operatorname{Enum-}k\operatorname{th-Term-of-Power}(F,n-1,n) \\ &\because \left\lbrack x^{n-1}\right\rbrack F^k=\frac{k}{n-1}\left\lbrack x^{n-1-k}\right\rbrack \left(F^{\langle -1\rangle}/x\right)^{-(n-1)} \\ &\therefore \frac{n-1}{k}a_{k}=\left\lbrack x^{n-1-k}\right\rbrack\left(F^{\langle -1\rangle}/x\right)^{-(n-1)},\space (k>0) \\ 4&G(x)\gets \sum_{k=1}^{n-1}\frac{n-1}{k}a_{k}x^{n-1-k} \\ 5&F^{\langle -1\rangle}/x\gets \left(G(x)^{1/(n-1)}\right)^{-1}\mod{x^{n-1}} \\ &\because (F^{\langle -1\rangle}/x)(0)=1 \\ &\therefore F^{\langle -1\rangle}/x=\exp\left(\frac{1}{1-n}\log G\right) \\ &\because F^{\langle -1\rangle}\circ f\circ \left(t^{-1}x\right)=x \\ &\therefore \left((t^{-1}x) \circ F^{\langle -1\rangle}\right)\circ f=\left((t^{-1}x) \circ F^{\langle -1\rangle}\right)\circ f \circ \left((t^{-1}x)\circ (tx)\right)=x \\ 6&\textbf{return }\left((t^{-1}x) \circ F^{\langle -1\rangle}\right) \end{array}$

## Composition

Given $f=\sum_{k\geq 0}f_kx^k\in\mathbb{C}\left\lbrack\left\lbrack x\right\rbrack\right\rbrack,g\in x\mathbb{C}\left\lbrack\left\lbrack x\right\rbrack\right\rbrack$ then

$f(g)=\sum_{k\geq 0}f_kg^k$

Consider again

$\dfrac{f(y^{-1})}{1-yg(x)}=\sum_{k\geq 0}(\cdots +f_ky^{-k}+\cdots)y^kg(x)^k$

Our goal is computing

$\left\lbrack y^0\right\rbrack \dfrac{f(y^{-1})}{1-yg(x)}$

Apply Bostan--Mori's algorithm here:

\begin{aligned} \frac{P(y)}{Q(x,y)}&=\frac{P(y)}{Q(x,y)Q(-x,y)}Q(-x,y) \\ &=\frac{P(y)}{V(x^2,y)}Q(-x,y) \end{aligned}

We want to compute $\dfrac{P(y)}{Q(x,y)Q(-x,y)}$ first, and we only need to track finitly many coefficients of $y^j$ for $j\leq 0$.

$\begin{array}{ll} &\textbf{Algorithm }\operatorname{Composition-Subprocedure}(P,Q,n)\text{:} \\ &\textbf{Input}\text{: }P=\sum_{0\leq j\leq n}p_jy^{-j}\in\mathbb{C}((y)),Q\in\mathbb{C}\left\lbrack\left\lbrack x\right\rbrack\right\rbrack\left\lbrack y\right\rbrack ,Q(0)=1,[x^0y^k]Q=0,(\forall k>0)\text{.} \\ &\textbf{Output}\text{: }\left\lbrack y^{\left\lbrack -\deg_y(Q)+1,0\right\rbrack}\right\rbrack\dfrac{P}{Q}\bmod{x^{n+1}}\text{.} \\ 1&d\gets \deg_y(Q)\\ 2&\textbf{if }n=0\textbf{ then return }\left(\left\lbrack y^{-d+1}\right\rbrack P,\dots ,\left\lbrack y^0\right\rbrack P\right) \\ &n=0\implies \deg_x(Q)=0 \\ &\because Q(0)=1,[x^0y^k]Q=0,(\forall k>0) \\ &\therefore Q\equiv 1\pmod{x^1},\space \left(\left\lbrack y^{-d+1}\right\rbrack P,\dots ,\left\lbrack y^0\right\rbrack P\right)=\left(\left\lbrack y^{-d+1}\right\rbrack P/Q,\dots ,\left\lbrack y^0\right\rbrack P/Q\right) \\ 3&V(x^2,y)\gets Q(x,y)Q(-x,y)\bmod{x^{n+1}} \\ 4&(t_{-2d+1},\dots ,t_0)\gets \operatorname{Composition-Subprocedure}\left(P,V(x,y),\left\lfloor n/2\right\rfloor\right) \\ 5&T(x,y)\gets \sum_{j=-2d+1}^0t_jy^j \\ 6&U(x,y)=\sum_{j=-2d+1}^d u_jy^j\gets T(x^2,y)Q(-x,y)\bmod{x^{n+1}} \\ 7&\textbf{return }\left(u_{-d+1},\dots ,u_0\right) \end{array}$

and the composition algorithm is just

$\begin{array}{ll} &\textbf{Algorithm }\operatorname{Composition}(f,g,n)\text{:} \\ &\textbf{Input}\text{: }f,g\in\mathbb{C}\left\lbrack x\right\rbrack ,n\in\mathbb{Z}_{>0}\text{.} \\ &\textbf{Output}\text{: }f(g)\bmod{x^n}\text{.} \\ 1&c\gets g(0) \\ 2&F(x)\gets f(x+c)\bmod{x^n} \\ 3&G(x)\gets (g-c)\bmod{x^n} \\ 4&(u)\gets\operatorname{Composition-Subprocedure}\left(F\left(y^{-1}\right),1-yG,n-1\right) \\ &f(g)=f\circ (x+c)\circ G=F(G) \\ 5&\textbf{return }(u) \end{array}$

One can try with composition_of_formal_power_series_large at Library Checker.

## Reference

• +241

By hly1204, history, 3 months ago,

Hi, Codeforces! I want to introduce a simple way to understand the subset convolution and a simple optimization from scratch.

Actually, I do not know this trick is well-known or just out-of-date.

If there are any mistakes, I am happy to fix them (if I am able to) and thanks for pointing out.

We know that subset convolution is equivalent to truncated multivariate polynomial multiplication like

$A(x_1,\dots ,x_n)B(x_1,\dots ,x_n)\bmod (x_1^2,\dots ,x_n^2)$

simply written

$A(\mathbf{x})B(\mathbf{x})\bmod{\mathbf{x}^2}$

we know that we could compute

$A(\mathbf{x})B(\mathbf{x})\bmod{\left(\mathbf{x}^2-\mathbf{x}\right)}$

by using the FFT-like way (Computing $A(\mathbf{x})\bmod{(\lbrace x_1,x_1-1\rbrace \times \cdots \times\lbrace x_n,x_n-1\rbrace)}$ and using Chinese remainder algorithm to restore the result. This is called the Zeta transform and the Moebius transform respectively). But the coefficients are somehow “mixed”, we can not tell the coefficients of $\mathbf{x}^2$ and $\mathbf{x}$. An idea is that we could introduce a new variable $t$ and computing $A(\mathbf{x})B(\mathbf{x})\bmod{\left(\mathbf{x}^2-\mathbf{x}t\right)}$, simply let $t\gets 0$ to get the result of subset convolution. Here is the main idea of the whole algorithm we may familar with:

First, we define $s$ for each term of $A(\mathbf{x}),B(\mathbf{x})$:

$s\left(c x_1^{d_1}x_2^{d_2}\cdots x_n^{d_n}\right):=\sum_{k=1}^nd_k$

and

$A_k(\mathbf{x})\text{ such that }s(y)=k\text{ for any term }y\text{ of }A_k(\mathbf{x})$

then we compute

$A(\mathbf{x})B(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)}=\left(\sum_i A_i(\mathbf{x})\right)\left(\sum_i B_i(\mathbf{x})\right)\bmod{(\mathbf{x}^2-\mathbf{x}t)}$

For the $O(n^2)$ times of computation of form $A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)}$, we have an invariant that $s(y)+\deg_t(y)=i+j$ for any term $y$ of $A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)}$. Even we only do the computation of $A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x})}$, we could restore the result of $A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)}$, and we do not use these coefficients that with variable $t$.

Another observation is that $\deg_t(A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)})\leq (i+j)/2$. For example, if we have $\left(\sum_{i+j=4}A_i(\mathbf{x})B_j(\mathbf{x})\right)+\left(\sum_{i+j=9}A_i(\mathbf{x})B_j(\mathbf{x})\right)\bmod{(\mathbf{x}^2-\mathbf{x}t)}$, it is still possible for us to tell apart. With this simple trick, we could reduce the times of Moebius transform for about a half.

This is a submission for the optimized code. I did not implement it carefully.

upd: submission reinplemented.

• +25

By hly1204, history, 2 years ago,

# A simple way to understand the transposed algo of multi-eval

A simple way to understand the transposed algorithm of multi-point evaluation of polynomials in Tellegen’s Principle into Practice.

I learned this from Bernstein's paper Scaled Remainder Trees. If there are any mistakes, please tell me, thanks! (Sorry for my poor English)

Let $\mathbb{R}((x))$ denote the ring of formal Laurent series, $\mathbb{R}\lbrack x\rbrack$ denote the ring of polynomial. $f(x)\bmod 1:=f_{-1}x^{-1}+f_{-2}x^{-2}+\cdots$.

Bernstein showed that division in $\mathbb{R}\lbrack x\rbrack$ is division in $\mathbb{R}((x^{-1}))$. Here is a concrete example. Let $A(x):=1+x+4x^2+5x^3+x^4+4x^5\in\mathbb{R}\lbrack x\rbrack$ and $B(x):=1+9x+8x^2+x^3\in\mathbb{R}\lbrack x\rbrack$, we want to compute $Q(x),R(x)\in\mathbb{R}\lbrack x\rbrack$ such that $A(x)=B(x)Q(x)+R(x)$ and $\deg R(x)\lt \deg B(x)$ (with $\deg 0=-\infty$). Let's do this in $\mathbb{R}((x^{-1}))$. First, we should find the multiplicative inverse of $B(x)$ which is

$B^{-1}(x)=x^{-3}-8x^{-4}+55x^{-5}-369x^{-6}+2465x^{-7}-16454x^{-8}+109816x^{-9}-\cdots$

and compute

$A(x)B^{-1}(x)=4x^2-31x+217-1457x^{-1}+9735x^{-2}-64983x^{-3}+433706x^{-4}-\cdots$

We could find that $Q(x)$ is exactly $A(x)B^{-1}(x)-(A(x)B^{-1}(x)\bmod 1)$, and the leading coefficient of $R(x)$ equals $[x^{-1}]A(x)B^{-1}(x)$. That's not a coincidence. Consider the simplest case

$(x-v)^{-1}=x^{-1}+vx^{-2}+v^2x^{-3}+\cdots=\sum_{n\geq 0}v^nx^{-(n+1)}$

So we have $f(v)=[x^{-1}]f/(x-v)$ cause

\begin{aligned} f(x)/(x-v)&=f_0x^{-1}+f_1+f_2x+\cdots +f_nx^{n-1}\\ &+f_0vx^{-2}+f_1vx^{-1}+f_2v+\cdots +f_nvx^{n-2}\\ &+\cdots \end{aligned}

The problem of multi-point evaluation is as follows: Given polynomial $f(x)=\sum_{i=0}^nf_ix^i$ and $v_0,v_1,\dots ,v_n$, print $f(v_0),f(v_1),\dots ,f(v_n)$.

The traditional algorithm for multi-point evaluation is that we make an almost balanced binary (sub)product tree with leaves are polynomials $p_i(x)=x-v_i$ and non-leaf nodes are the product of its children. Then we compute $f\bmod (p_1\cdots p_n)$ and then compute $f\bmod (p_1\cdots p_{n/2})$ and $f\bmod (p_{n/2+1}\cdots p_n)$ cause $(f\bmod (p_1\cdots p_n))\bmod (p_1\cdots p_{n/2})=f\bmod (p_1\cdots p_{n/2})$ (suppose $n$ is even) and then ... The transposed algorithm is that we compute $f/(p_1\cdots p_n)$ and $f/(p_1\cdots p_n)\cdot (p_{n/2+1}\cdots p_{n})=f/(p_1\cdots p_{n/2})$, finally we will have $f/p_i$ at the leaf of the tree, the coefficient of $x^{-1}$ is exactly what we want. The only remaining problem is how many terms should we track when doing the computation. It's clear that we don't care about the coefficients of $x^0,x^1,\dots$ in $f/(p_1\cdots p_n)$ cause they do nothing with the coefficient of $x^{-1},x^{-2},\dots$ when $f/(p_1\cdots p_n)$ was multiplied by a polynomial like $g_0+g_1x+g_2x^2+\cdots$. Another observation is that for $f/P\bmod 1=a_{-1}x^{-1}+a_{-2}x^{-2}+\cdots +a_{-\deg P}x^{\deg P}+\cdots$, only $a_{-1},\dots ,a_{-\deg P}$ are enough. Consider a very simple case that we have $f/(p_1p_2)\bmod 1=c_{-1}x^{-1}+c_{-2}x^{-2}+c_{-3}x^{-3}+\cdots$ and want to compute $f/p_1\bmod 1$ via multiplying $f/(p_1p_2)\bmod 1$ by $p_2$, the coefficients of $x^{-3},x^{-4},\dots$ have no effect on the coefficient of $x^{-1}$ (it's more complicated for the integer case). So we could almost halve the size of problem after one multiplication.

About how to compute $(p_1\cdots p_n)^{-1}$ in $\mathbb{R}((x^{-1}))$, we could simply use the method that we compute the multiplicative inverse of formal power series $x^{\deg(p_1\cdots p_n)}p_1(x^{-1})\cdots p_n(x^{-1})$ in $\mathbb{R}\lbrack \lbrack x\rbrack \rbrack$.

• +60

By hly1204, history, 2 years ago,

Simple C++(17) implementation for univariate formal power series computation in competitive programming.

a.hpp with main function.

Idea from fjzzq2002's blog (in Chinese). My code is about 2 times slower than fjzzq2002's, but mine is shorter and simpler.

I am glad if you would tell me about bugs or optimizations.

There are several mistakes in the comments of the code ... sorry.

## Usage

GF for Catalan numbers A000108.

#include <iostream>
#include "a.hpp"
int main() {
using lib::Mint;
using lib::FPS;
FPS<Mint<998244353>> A;
auto X = FPS<Mint<998244353>>::idm();
A.reset().set(X / (1 - A));
for (int i = 0; i <= 10; ++i) std::cout << A[i] << ' ';
}


GF for alkyl A000598.

#include <iostream>
#include "a.hpp"
int main() {
using lib::Mint;
using lib::FPS;
FPS<Mint<998244353>> A;
auto X = FPS<Mint<998244353>>::idm();
A.reset().set((A.pow(3) / 6 + A * A(X ^ 2) / 2 + A(X ^ 3) / 3) * X + 1);
for (int i = 0; i <= 10; ++i) std::cout << A[i] << ' ';
}


GF for unlabeled trees A000081.

#include <iostream>
#include "a.hpp"
int main() {
using lib::Mint;
using lib::FPS;
FPS<Mint<998244353>> A;
auto X = FPS<Mint<998244353>>::idm();
A.reset().set(A.Exp() * X);
auto B = A - (A * A - A(X ^ 2)) / 2;
for (int i = 0; i <= 10; ++i) std::cout << B[i] << ' ';
}


Apply Euler transform several times A144036.

#include <iostream>
#include "a.hpp"
int main() {
using lib::Mint;
using lib::FPS;
FPS<Mint<998244353>> A;
auto X = FPS<Mint<998244353>>::idm();
A.reset().set((((A.Exp() - 1).Exp() - 1).Exp() - 1).Exp() * X);
for (int i = 0; i <= 10; ++i) std::cout << A[i] << ' ';
}


## Reference

• +40

By hly1204, history, 2 years ago,

Suppose $p$ is an odd prime and $a$ is a quadratic residue modulo $p$. Cipolla's algorithm shows that $b:=x^{(p+1)/2}\bmod{(x^2-tx+a)}$ such that $b^2=a$ for some irreducible polynomial $x^2-tx+a\in\mathbb{F}_p[x]$.

I don't know how to prove this properly. If there are any mistakes or typos, please let me know, thanks! Let $\alpha$ be a zero of $x^2-tx+a$, $\mathbb{F}_p(\alpha):=\lbrace a_0+a_1\alpha :a_0,a_1\in\mathbb{F}_p\rbrace$, and let $\beta$ be a zero of $x^2-(t^2-4a)$, $\mathbb{F}_p(\beta):=\lbrace a_0+a_1\beta :a_0,a_1\in\mathbb{F}_p\rbrace$. We may easily find homomorphisms between $\mathbb{F}_p(\alpha)$ and $\mathbb{F}_p(\beta)$. So I think we just need to show that $((t+\beta )/2)^{p+1}=a$.

A lot of blogs show that $(t+\beta)^p=t-\beta$ according to binomial theorem and Fermat's little theorem, so

\begin{aligned} ((t+\beta)/2)^{p+1}&=(t+\beta)(t-\beta)/4\\ &=(t^2-\beta^2)/4\\ &=a \end{aligned}

Bostan and Mori's paper shows that the computation of $x^n$ modulo a monic polynomial sometimes is equivalent to the computation of one term of a rational function $P(x)/Q(x)$ where $P,Q$ are both polynomials and $\deg(P(x))\lt \deg(Q(x))$.

We have

$b=\left[x^{(p+1)/2}\right]\dfrac{1-tx}{1-tx+ax^2}$

and

$\left[x^n\right]\dfrac{k_0+k_1x}{1+k_2x+k_3x^2}= \begin{cases} \left[x^{(n-1)/2}\right]\dfrac{k_1-k_0k_2+k_1k_3x}{1+(2k_3-k_2^2)x+k_3^2x^2}&\text{if }n\bmod 2=1\\ \left[x^{n/2}\right]\dfrac{k_0+(k_0k_3-k_1k_2)x}{1+(2k_3-k_2^2)x+k_3^2x^2}&\text{otherwise} \end{cases}$

This may save some multiplications.

submission(Library Checker)

Actually, the initial value of $k_1$ does no matter to the result. So we could mainly focus on the computation of one term of the inverse of formal power series. I don't know if we could do better.

• +112