Hello everyone!

I'd like to share a simple algorithm for computing the Frobenius form (or called rational canonical form) of a matrix. Before stating what is the algorithm, let me first give some applications.

## Matrix Exponentiation

We know that one can compute the matrix exponential of a matrix $$$A^K$$$ in $$$O(N^3 \log K)$$$ time, via binary exponentiation.

If one knows how to compute the characteristic polynomial $$$p(T)$$$ of $$$A$$$ in $$$O(N^3)$$$ time, then one can first use the Cayley–Hamilton theorem to reduce the problem into computing

where $$$r_i$$$ is the coefficient of

This reduces the time to $$$O(N^4 + N^2\log K)$$$, or even $$$O(N^4 + N\log N\log K)$$$ if you use FFT to speed up the computation of $$$T^K \bmod p(T)$$$.

Some sqrt tricks can further reduce the time to $$$O(N^{3.5} + N\log N \log K)$$$, but it is still not good enough.

When you attend some linear algebra course, you may learn that, if one can find the Jordan form of $$$A$$$, then one can compute $$$A^K$$$ faster. However, the Jordan form is usually not a good choice for computation, because it at least needs to find the root of the characteristic polynomial, which in general needs to consider the field extension.

The Frobenius form (or called rational canonical form) is a better choice. It states that any matrix $$$A$$$ can be uniquely written as $$$A = Q F Q^{-1}$$$, where $$$F$$$ has the form

where $$$C_i$$$ is the companion matrix of polynomial $$$p_i(T)$$$, satisfying $$$p_k \mid \cdots \mid p_2 \mid p_1$$$.

Note that the Frobenius form $$$F$$$ is very sparse (only contains $$$< 2N$$$ non-zero entries), then we can compute $$$A^K$$$

where can be computed in $$$O(N^3 + N\log N \log K)$$$ time.

## The Algorithm

To state the algorithm, we first consider a proof of the Cayley–Hamilton theorem that directly gives an algorithm to compute the characteristic polynomial.

Let $$$A$$$ be a matrix, we first pick an arbitrary nonzero vector $$$v$$$, and consider the sequence $$$v, Av, A^2 v, \dots$$$, these vectors are linearly independent until some point, say $$$A^k v$$$, then we have a relation

Let $$$p(T) = T^k - \sum_{i=0}^{k-1} c_i T^i$$$, we have $$$p(A)v = 0$$$.

We call the above vector $$$v_1$$$, say $$$V_1$$$ be the span of $$${v_1,\dots, A^{k-1} v_1}$$$, and then pick another vector $$$v_2$$$ such that $$$v_2$$$ is not spanned by $$$v_1, Av_1, \dots, A^{k-1}v_1$$$, then consider the sequence $$$v_1,\dots, A^{k-1}v_1, v_2, Av_2, A^2 v_2, \dots$$$, we can find another polynomial $$$p_2(T)$$$ such that $$$p_2(A)v_2$$$ is in $$$V_1$$$. Then we take $$$V_2$$$ to be the span of $$${v_1,\dots, A^{k-1}v_1, v_2, \dots, A^{k-1}v_2}$$$, and we repeat such process until

where $$$V$$$ is the whole space. Let $$$\beta = {v_1,Av_1,\dots,}$$$ be the basis.

Under our chosen basis, the matrix looks like

where $$$C_i$$$ is the companion matrix of $$$p_i$$$.

Then for each vector $$$v$$$, we show that $$$p_1(A)\cdots p_\ell(A) v = 0$$$. First consider $$$p_\ell(A)v$$$, this eliminates all components of the basis elements given by $$$v_\ell$$$, so $$$p_\ell(A) \in V_{\ell - 1}$$$, then we can inductively show $$$p_i(A) \cdots p_\ell(A) v \subset V_{i-1}$$$, in conclusion, we have $$$p_1(A)\cdots p_\ell(A) v = 0$$$. Note that by direct computation, we have $$$p(T) = \det(T I - A) = p_1(T) \cdots p_\ell(T)$$$, so the characteristic polynomial annihilates all vectors, the Cayley–Hamilton theorem is proved.

Note that there is a difference that outside the diagonal blocks, there are still some terms. For each $$$A^j v_i$$$, if $$$j$$$ is not the largest one, then $$$A(A^j v_i)$$$ takes the vector to $$$A^{j+1} v_i$$$, so this columns only contains one $$$1$$$ in $$$[A]_\beta$$$. The only problem is that the column for each maximal $$$j$$$ such that $$$A^j v_i \in\beta$$$ may contain some other terms (this corresponds to the $$$*$$$ in $$$0|*$$$).

A simple randomized strategy may help us to remedy this problem. What if we randomly choose a vector $$$v_i$$$ each time? Actually, it can be shown that, when the base field is $$$\mathbb F_q$$$, we have probability $$$\geq 1 - O(N / q)$$$ that $$$p_1, \dots , p_\ell$$$ are exactly those polynomials in the rational canonical form! (They are also called the elementary divisors when we give $$$V$$$ the $$$\mathbb F_q[T]$$$-module structure by $$$f(T) v = f(A) v$$$, and use the structure theorem of finitely generated module over PID.)

So what happens when we are guaranteed that $$$p_\ell \mid \cdots \mid p_1$$$? We first look at what happened at $$$v_2$$$.

Firstly, we have

and since $$$p_1$$$ is the minimal polynomial $$$A$$$ of $$$v_1$$$, let $$$q(T) = p_1(T) / p_2(T)$$$, we have

since $$$v_1$$$ is only annihilated by a multiple of $$$p_1(T)$$$, we have $$$p_1 \mid q r = p_1 r / p_2$$$, thus we have $$$p_2 \mid r$$$. This shows that we can have some $$$s = r / p_2$$$, then consider

we would have

This really looks like the "orthogonalization" step as what was done in the Gram–Schmidt process!

This process can also be done at larger $$$i$$$ s. Now we can state the algorithm.

**[Datas]** Maintain a basis $$$B$$$ of vectors $$${ A^j v_i }$$$ for $$$0\leq j < d_i$$$, where $$$d_i$$$ is the degree of the elementary divisor $$$p_i$$$. Initially $$$i=0$$$.

**[Initial vector]**Increase $$$i$$$ by $$$1$$$. Uniformly pick some random vector $$$v_i$$$.**[Iteration]**Take $$$B' = B$$$, repeatedly insert $$$B'$$$ with vectors $$$v_i, Av_i, \dots$$$ until finds a relation $$$p_i(A) v_i = \sum_{j < i} g_{ji}(A) v_j$$$.**[Orthogonalization]**For each $$$j < i$$$, compute $$$s_{ji} = g_{ji} / p_i$$$, then take $$$v_i \gets v_i - \sum_j s_{ji}(A) v_j$$$.**[Update]**For the new orthogonalized vector $$$v_i$$$, insert $$$B$$$ with vectors $$$v_i, Av_i, \dots, A^{d_i - 1} v_i$$$.**[Finish]**If $$$B$$$ has $$$n$$$ vectors, halt, otherwise go to 1.

After a careful implementation, the algorithm can be implemented in $$$O(N^3)$$$ time, and the probability that the algorithm fails is $$$O(N / q)$$$, where $$$q$$$ is the size of the base field.

Although the ideas behind need some further analysis (for example I didn't prove the probability of success), but the final algorithm is quite clean. I wrote a code for this algorithm, for the Pow of Matrix problem on Library Checker. You can refer to this to have a more precise understanding of how this algorithm works.