Polynomial Algorithms From A Little Higher Viewpoint: Part I Fast Fourier Transform

№	Пользователь	Рейтинг
1	Benq	3792
2	VivaciousAubergine	3647
3	Kevin114514	3603
4	jiangly	3583
5	turmax	3559
6	tourist	3541
7	strapple	3515
8	ksun48	3461
9	dXqwq	3436
10	Otomachi_Una	3413

№	Пользователь	Вклад
1	Qingyu	157
2	adamant	153
3	Um_nik	147
4	Proof_by_QED	146
5	Dominater069	145
6	errorgorn	141
7	cry	139
8	YuukiS	135
9	TheScrasse	134
10	chromate00	133

1 Fast Fourier Transform

1.1 General theory

The fast Fourier transform rests on a key observation:

Theorem (Maschke).
Let $$$G$$$ be a finite group and $$$\Bbbk$$$ a field such that the image of $$$|G|$$$ in $$$\Bbbk$$$ is non-zero.
Then every $$$\Bbbk$$$-representation of $$$G$$$ is semisimple.

Proof.
Because $$$\Bbbk$$$ is a field, it suffices to show that every subrepresentation is a direct summand.
Let $$$V$$$ be a representation of $$$G$$$ and $$$U\subseteq V$$$ a subrepresentation.
Choose a (linear) projection $$$\pi:V\to U$$$.
Average it:

$$$ \tau(v)=\frac1{|G|}\sum_{g\in G}g^{-1}\pi(gv). $$$

One checks that $$$\tau$$$ is a $$$\Bbbk$$$-homomorphism, hence $$$\ker\tau$$$ is a subrepresentation of $$$V$$$ and $$$U$$$ is a direct summand of $$$V$$$.

Corollary.
Let $$$G$$$, $$$\Bbbk$$$ be as above and let $$${U_i}$$$ be the irreducible representations of $$$G$$$. Then

$$$ |G|=\sum_i(\dim U_i)^2. $$$

Proof.
The group algebra $$$\Bbbk[G]$$$ carries the regular representation $$$G\to\mathrm{End}_{\Bbbk}\Bbbk[G]$$$.
All irreducible representations occur in $$$\Bbbk[G]$$$, and by semisimplicity the regular representation is their direct sum. Thus

$$$ \mathrm{End}_{\Bbbk}\Bbbk[G]\simeq\bigoplus_i\mathrm{End}_{\Bbbk}U_i, $$$

and comparing dimensions gives the claim.

Hence, replacing an element $$$x\in\Bbbk[G]$$$ by its values on all irreducible representations lowers the complexity of multiplication in $$$\Bbbk[G]$$$ from $$$\Theta(|G|^2)$$$ to

$$$ \Theta\!\Bigl(\sum(\dim U_i)^\omega\Bigr). $$$

This replacement is the Fourier transform, written $$$x\mapsto\hat x$$$.
A naïve Fourier transform costs $$$\Theta(|G|^2)$$$, so we look for faster ways.

For a function $$$f:G\to\Bbbk$$$ define

$$$ \hat f(\rho)=\sum_{g\in G}f(g)\,\rho(g), $$$

and its inverse transform by

$$$ f(g)=\frac1{|G|}\sum_{\rho\in\mathcal R}d_\rho\, \operatorname{tr}\bigl(\hat f(\rho)\rho(g)\bigr), $$$

where $$$\mathcal R$$$ is a complete set of irreducibles and $$$d_\rho=\dim V_\rho$$$.

Take a subgroup $$$H\le G$$$ and a set of coset representatives $$$C={s_i}$$$. Then

$$$ \hat x_\rho=\sum_{s\in C}\rho(s)\,\widehat x_{\rho|_H}. $$$

However $$$\rho|_H$$$ need not be irreducible, so we block-diagonalise $$$\widehat x_{\rho|_H}$$$ with a suitable basis $$$\mathfrak B$$$; for concrete groups we construct $$$\mathfrak B$$$ explicitly.

The running time satisfies

$$$ T(G)=[G:H]\;T(H)+[G:H]^{\;\omega}\sum_{\rho\in\mathrm{Irr}(G)}(\dim\rho)^{\omega}. $$$

After a little manipulation,

$$$ T(G)\le [G:H]\,T(H)+[G:H]^{\;\omega}|H|\, \max_{\rho\in\mathrm{Irr}(G)}(\dim\rho)^{\;\omega-2}. $$$

Because $$$\dim\rho\le\sqrt{|G|}$$$ one gets

$$$ T(G)\le [G:H]\,T(H)+[G:H]^{\;\omega}|H|^{\omega/2}. $$$

Theorem (Lev).
Unless $$$G\simeq\Bbb Z/p\Bbb Z$$$, the group $$$G$$$ has a proper subgroup of order at least $$$\sqrt{|G|}$$$.

Proof.
Lev’s proof uses the classification of finite simple groups, and we omit it.

Taking $$$|H|=\sqrt{|G|}$$$ yields

$$$ T(n)\le\sqrt n\,T(\sqrt n)+n^{3\omega/4}, \qquad\text{so }T(n)\sim n^{3\omega/4}. $$$

This beats the trivial bound only if $$$\omega\le\frac83$$$.
In practice $$$|H|=\sqrt{|G|}$$$ is too large.
If every subgroup $$$H$$$ of $$$G$$$ has a proper subgroup of size $$$|H|^\theta$$$, then

$$$ T(n)\le n^{1-\theta}\,T(n^\theta)+n^{\omega(1-\theta/2)}, \quad\text{so }T(n)\sim n^{\omega(1-\theta/2)}. $$$

With $$$\omega=3$$$ this is advantageous once $$$\theta\ge\frac23$$$, which is typical for most groups.

1.2 Generic fast Fourier transform

Assume $$$\Bbbk$$$ is algebraically closed of characteristic 0 (or large enough).

If $$$G$$$ embeds in $$$\Bbbk^\times$$$, all irreducible representations are one-dimensional.
For $$$G=\Bbb Z/n\Bbb Z$$$ one has

$$$ \Bbbk[\Bbb Z/n\Bbb Z]=\Bbbk[X]/(X^n-1)\simeq \prod_{z^n=1}\Bbbk[X]/(X-z) $$$

by the Chinese remainder theorem.

Since every representation is 1-dimensional, restrictions stay irreducible, so no basis change is needed.
Thus

$$$ \hat x_\rho=\sum_{s\in C}\rho(s)\,\hat x_{\rho|_H}. $$$

For $$$G=\Bbb Z/n\Bbb Z$$$ and the character $$$\rho(g)=e^{2\pi i g /n}$$$,

$$$ \hat x(g)=\sum_{0\le k \lt m}e^{2\pi i gk/n}\, \hat x\!\bigl(g+(n/m)\Bbb Z\bigr)\qquad(n\in m\Bbb Z), $$$

and the cost obeys

$$$ T(n)=m\,T(n/m)+mn. $$$

In practice one works in $$$\Bbbk[X]\bmod(X^n-1)$$$, embeds $$$\Bbbk$$$ in a large $$$\Bbb Z/N\Bbb Z$$$ with $$$N \gt 2n$$$, often taking $$$N=2^k$$$, giving

$$$ T(n)=\Theta(n\log n), $$$

the classical fast Fourier transform.
For $$$N=2^k$$$ let $$$x=x_{k-1}\dots x_1x_0$$$ be the binary expansion of $$$x\in\Bbb Z/N\Bbb Z$$$.
Rewrite

$$$ x=x_0+2\bigl(x_1+2(\dots+x_{k-1}+2\Bbb Z)\dots\bigr), $$$

so the processing order is

$$$ \sum_{i=0}^{k-1}x_i\,2^{k-1-i}\;=\;\operatorname{rev}(x). $$$

Swapping the $$$x$$$-th and $$$\operatorname{rev}(x)$$$-th values and then performing butterflies for each $$$0\le j \lt k$$$ yields the radix-2 FFT.

1.3 Fast Fourier transform over small-characteristic finite fields

When $$$p-1$$$ is smooth, Section 1.2 suffices.
We now outline the additive Fourier transform on $$$\Bbb F_{p^{p^k}}$$$ due to Cantor and successors.
Write $$$\mathbb K=\Bbb F_{p^{p^k}}$$$, $$$\mathbb F=\Bbb F_p$$$.

Definition.
The canonical $$$\mathbb F$$$-linear map $$$\wp:\mathbb K\to\mathbb K$$$, $$$\wp(x)=x^p-x$$$, is the Artin–Schreier map.
Its $$$i$$$-fold composition is $$$s_i$$$.

Proposition.
The zero set of $$$s_i$$$ is an $$$\mathbb F$$$-vector space.
It is a subfield of $$$\mathbb K$$$ iff $$$i=p^j$$$.

Proof.
Linearity is clear.
If $$$i=p^j$$$, then $$$s_i(x)=x^{p^i}-x$$$ and $$$\mathbb K$$$ contains exactly one subfield of size $$$p^{p^j}$$$; conversely only these $$$i$$$ give subfields.

Set $$$W_i=\ker s_i$$$.
For a polynomial $$$p$$$ of degree $$$ \lt p^i$$$, its additive Fourier transform $$$\mathcal A(p)$$$ is the list of values $$$p(\omega)$$$ with $$$\omega\in W_i$$$ in a fixed order.

Definition.
A Cantor generator sequence $$${g_j}$$$ satisfies

$$$ \wp(g_i)=\prod_{0\le j \lt i}g_j^{p-1}+f(g_0,\dots,g_{i-1}), $$$

where $$$\deg f \lt i(p-1)$$$ and $$$g_i$$$ has no term of degree $$$\ge p$$$.
It yields the Cantor basis

$$$ i=\sum_{k\ge0}i_kp^k\;\Longrightarrow\; b_i=\sum_{k\ge0}g_k^{\,i_k}. $$$

Theorem.
There is $$$\lambda\in\mathbb F$$$ such that $$$s_j(b_k)-\lambda b_{k-j}\in W_{k-j}$$$.
In particular $$$s_k(b_k)=1$$$ and $$$W_k=\mathrm{span}{b_0,\dots,b_{k-1}}$$$.

Cantor constructs generators with $$$\wp(g_0)=0$$$ and
$$$\wp(g_{k+1})=g_k^{\,M}$$$, where

$$$ M=\begin{cases} 1,&p=2,\\[4pt] 2p-1,&p\ne2. \end{cases} $$$

This gives a Cantor basis and a degree-$$$p^k$$$ irreducible polynomial.

Using the basis $$$X_i$$$ obtained from the $$$g_k$$$, Cantor shows how to convert between the coefficients of a degree $$$ \lt p^D$$$ polynomial and its coordinates in the $$$X_i$$$ basis in

$$$ \Theta(Dp^D\log D\,\mathrm{poly}(p)) $$$

versus the naïve $$$\Theta(D^2p^D\mathrm{poly}(p))$$$.

With this, one derives a divide-and-conquer additive FFT similar to the classical case.
For instance FFTs in the Nim-product field $$$\Bbb F_{2^{64}}$$$ often use Cantor’s method.

1.4 Frobenius-accelerated Fourier transform

Let $$$\sigma:\alpha\mapsto\alpha^{p^m}$$$ be the $$$m$$$-th Frobenius automorphism.

Two elements $$$\alpha,\beta\in\mathbb K$$$ are conjugate if $$$\sigma(\alpha)=\beta$$$ for some $$$\sigma\in\mathrm{Gal}(\mathbb K/\mathbb F)$$$.

Proposition.
If $$$\alpha\in W_k\setminus W_{k-1}$$$, the number of conjugates of $$$\alpha$$$ is $$$p^{\lceil\log_p k\rceil}$$$.

Proof.
$$$\mathbb F(\alpha)=\mathbb F(g_0,\dots,g_{\lceil\log_p k\rceil-1})$$$, so the stabiliser of $$$\alpha$$$ is generated by the $$$p^{\lceil\log_p k\rceil}$$$-th Frobenius; orbit-stabiliser finishes the count.

These conjugates are reached by Frobenius maps of orders $$$p^0,\dots,p^{\lceil\log_pk\rceil-1}$$$, each modifying a specific coordinate in the Cantor basis.
Thus one representative set is

$$$ \Sigma=\{x\in W_m\mid\text{its coordinates at positions }p^k\text{ vanish}\}. $$$

Computing the transform only for $$$\Sigma$$$ and propagating the result along Frobenius orbits reduces time by roughly a factor $$$p/d$$$ with $$$d=\lceil\log_p n\rceil$$$.
See the paper arXiv:1802.03932 and the linked code repository for details; the original idea appears in the “Frobenius FFT” paper cited.

For $$$\Bbb F_{2^{64}}$$$ with Cantor basis $$$1,2,4,8,\dots$$$, the Cantor and Frobenius FFTs give efficient Nim-product multiplication.

1.5 Arbitrary rings

Cantor also devised a fast multiplication algorithm over a general ring $$$R$$$ (associativity can even be dropped).
Fix a radix $$$s=\Theta(1)$$$.
We would like division and roots of unity; instead we avoid division and adjoin roots.

In the Fourier inversion formula the only division is by $$$n=s^r$$$.
For polynomials $$$A,B\in R[X]/(X^n-1)$$$ we can compute $$$n \cdot [X^k]\, (A \cdot B)$$$ as follows: write $$$c$$$ for the $$$k$$$-th coefficient and $$$c'$$$ for the carry from degree $$$k+n$$$.
Running the FFT twice, once at $$$\omega_{ns}$$$ and once at $$$\omega_{ns}\cdot\omega_s$$$, gives $$$n(c+c')$$$ and $$$n(c+\omega_sc')$$$.
Solving the two linear equations needs only ring operations provided some $$$\chi\in R$$$ satisfies $$$(1-\omega_s)\chi\in\Bbb Z\setminus{0}$$$; the limit

$$$ \lim_{q\to1}\Phi_n(q)= \begin{cases} p,&n=p^k,\\ 1,&\text{otherwise} \end{cases} $$$

shows such a $$$\chi$$$ exists.
Hence we recover $$$c$$$ by Bézout’s identity.

To obtain roots of unity we extend $$$R$$$ to $$$\overline R=R[\omega_n]=R[X]/(\Phi_n)$$$.
Multiplication by $$$\omega_n$$$ becomes a cyclic shift, so DFT uses only additions and shifts.
Recursively multiplying the auxiliary integers yields

$$$ T(n)=\sqrt n\,T(\sqrt n)+n\log n \quad\Rightarrow\quad T(n)=\Theta(n\log n\log\log n). $$$

Another classic of this flavour is the Toom–Cook method, which evaluates at $$$2k+1$$$ points and satisfies

$$$ T(n)=(2k+1)\,T\!\bigl(n/(k+1)\bigr)+n\,k^{O(1)} \quad\Longrightarrow\quad T(n)\sim n\,2^{O(\sqrt{\log n})}. $$$

1.6 Symmetric groups

We quote the structure theorem for irreducibles of $$$S_n$$$ without proof.
Write a Young diagram as a sequence $$$r_1\ge\dots\ge r_n$$$ and a Young tableau by filling it with numbers.
For a tableau let $$$c(i)$$$ be the row index minus column index of the entry $$$i$$$.

Theorem.
Let $$$\lambda$$$ be a Young diagram and $$$\rho_\lambda$$$ its irreducible representation.
For the transposition $$$\sigma=(k\;k+1)$$$ the matrix $$$\rho_\lambda(\sigma)$$$ has rows and columns indexed by standard tableaux of shape $$$\lambda$$$.
Set $$$R=1$$$ if applying $$$\sigma$$$ to tableau $$$t$$$ gives another standard tableau and $$$R=0$$$ otherwise. Then

$$$ a_{t,t}=\frac1{c(k+1)-c(k)},\qquad a_{\sigma(t),t}=\sqrt{\,1-\bigl(\tfrac1{c(k+1)-c(k)}\bigr)^2}. $$$

Clausen designed an FFT for $$$S_n$$$.
He maps $$$\sigma\in S_n$$$ to the permutation obtained by deleting $$$n$$$; in cycle notation this is $$$(\sigma_n\;\sigma_{n+1}\;\dots\;n)$$$.
In the general algorithm of 1.1 one needs a basis in which restriction from $$$S_n$$$ to $$$S_{n-1}$$$ becomes block diagonal.
Clausen observed that this basis is inherited, i.e. the change-of-basis matrix $$$P$$$ is the identity.

Sketch.
A node $$$n$$$ can occupy only those cells whose right and bottom neighbours are empty.
All tableaux with $$$n$$$ in the same position form a class.
For $$$\sigma\in S_{n-1}$$$ the position of $$$n$$$ stays fixed, so $$$\rho_\lambda(\sigma)$$$ is block diagonal, each block being an irreducible of $$$S_{n-1}$$$.

For sparse inputs one may shrink the domain via double cosets such as

$$$ S_n\;\to\;(p\;p+1\;\dots\;n)\,S_{n-1}\,(q\;q+1\;\dots\;n), $$$

gaining freedom to handle parts without FFT.
The library $$$\mathbb S_n\text{ob}$$$ is a classic implementation; see also $$$\mathbb S_n\text{FFT}$$$ for machine-learning applications.

Regarding preference rankings encoded as permutations, define the acceptance between $$$\sigma,\tau$$$ by

$$$ f(\sigma,\tau)=\frac{\exp\bigl(-\theta\,d(\sigma,\tau)\bigr)}Z, $$$

let $$$f_\sigma$$$ be the corresponding vector in $$$\Bbb R^{n!}$$$, and consider the clustering problem of partitioning $$$m$$$ such rankings into $$$k$$$ classes minimising

$$$ \sum_{i=1}^k\frac1m\sum_{x,y\in C_i}\|f_{\sigma_x}-f_{\sigma_y}\|. $$$

Using Parseval on finite groups:

Theorem (Parseval).
For $$$f:G\to\Bbb C$$$ and unitary irreducibles $$$\mathcal R$$$,

$$$ |G|\|f\|_2^2=\sum_{\pi\in\mathcal R}d_\pi\|\hat f(\pi)\|_{\mathrm{HS}}^2, $$$

where $$$|\cdot|_{\mathrm{HS}}$$$ is the Hilbert–Schmidt norm.

Proof.
By Peter–Weyl, $$$L^2(G)$$$ splits orthogonally into $$$\sqrt{d_\pi}\,\pi$$$.
Writing $$$f$$$ in this basis and computing coefficients gives

$$$ \langle f,e(\pi,i,j)\rangle=\sqrt{\frac{d_\pi}{|G|}}\, \overline{\hat f(\pi)_{ij}}, $$$

hence

$$$ \|f\|_2^2=\frac1{|G|}\sum_{\pi}d_\pi\|\hat f(\pi)\|_{\mathrm{HS}}^2. $$$

Thus the clustering objective equals

$$$ \frac1{n!}\sum_{\pi}d_\pi\sum_{i=1}^k \sum_{x,y\in C_i}\|\hat f_x(\pi)-\hat f_y(\pi)\|_{\mathrm{HS}}^2, $$$

and in practice only a few $\pi$’s suffice for a good approximation, making the computation faster.

Блог пользователя OIer1048576