Hi everyone!
Mandatory orz to Elegia whose blog introduced me to MMT as an elegant approach to prove Dixon's identity.
Today, I would like to write about MacMahon's master theorem (MMT). It is a nice result at the intersection of combinatorics and linear algebra that provides an easy proof of some particularly obscure combinatorial identities, such as Dixon's identity:

$$$$$$\sum\limits_{k} (-1)^k \binom{a+b}{a+k} \binom{b+c}{b+k} \binom{c+a}{c+k} = \frac{(a+b+c)!}{a! \, b! \, c!}.$$$$$$
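Numerically, Dixon's identity $$$\sum\limits_k (-1)^k \binom{a+b}{a+k}\binom{b+c}{b+k}\binom{c+a}{c+k} = \frac{(a+b+c)!}{a!\,b!\,c!}$$$ is easy to check for small parameters. A minimal Python sketch (function names are mine):

```python
from math import comb, factorial

def dixon_lhs(a, b, c):
    # alternating sum of products of three binomial coefficients;
    # all terms outside |k| <= min(a, b, c) vanish
    m = min(a, b, c)
    return sum((-1) ** k
               * comb(a + b, a + k)
               * comb(b + c, b + k)
               * comb(c + a, c + k)
               for k in range(-m, m + 1))

def dixon_rhs(a, b, c):
    # closed form: (a + b + c)! / (a! b! c!)
    return factorial(a + b + c) // (factorial(a) * factorial(b) * factorial(c))
```

For example, both `dixon_lhs(1, 1, 1)` and `dixon_rhs(1, 1, 1)` evaluate to $$$6$$$.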
Besides that, MMT has large implications in Quantum Physics, which we will hopefully discuss. For now, let's formulate MMT.
MMT. Let $$$\mathbf A = [a_{ij}]_{n\times n}$$$, $$$\mathbf X = \operatorname{diag}(x_1,\dots,x_n)$$$, $$$\mathbf t = (t_1,\dots,t_n)$$$, $$$\mathbf x = (x_1,\dots,x_n)$$$ and $$$\mathbf k = (k_1,\dots,k_n) \geq 0$$$. Then,

$$$$$$[\mathbf t^{\mathbf k}] \prod\limits_{i=1}^n \left(\sum\limits_{j=1}^n a_{ij} t_j\right)^{k_i} = [\mathbf x^{\mathbf k}] \frac{1}{\det(\mathbf I - \mathbf X \mathbf A)},$$$$$$
where $$$\mathbf t^\mathbf k$$$ stands for $$$t_1^{k_1} \dots t_n^{k_n}$$$, and $$$\mathbf x^\mathbf k$$$, correspondingly, for $$$x_1^{k_1} \dots x_n^{k_n}$$$, and $$$\mathbf I = [\delta_{ij}]_{n\times n}$$$ is the identity matrix.
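As a sanity check, the sketch below (all helper names are mine) verifies numerically that $$$[\mathbf t^{\mathbf k}] \prod_i \left(\sum_j a_{ij} t_j\right)^{k_i} = [\mathbf x^{\mathbf k}] \det(\mathbf I - \mathbf X \mathbf A)^{-1}$$$ for a small concrete matrix: the left side comes from a genuine polynomial expansion, the right side from a truncated multivariate power series inverted by Newton iteration.

```python
from itertools import permutations

def poly_mul(p, q, cap):
    # multiply sparse multivariate polynomials {exponent tuple: coefficient},
    # dropping monomials that exceed cap componentwise
    r = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            if all(a <= c for a, c in zip(e, cap)):
                r[e] = r.get(e, 0) + c1 * c2
    return r

def series_inv(p, cap):
    # inverse of p (constant term 1) as a truncated power series,
    # via Newton iteration inv <- inv * (2 - p * inv)
    zero = (0,) * len(cap)
    inv = {zero: 1}
    for _ in range(max(1, sum(cap)).bit_length() + 1):
        t = {e: -c for e, c in poly_mul(p, inv, cap).items()}
        t[zero] = t.get(zero, 0) + 2
        inv = poly_mul(inv, t, cap)
    return inv

def mmt_lhs(A, k):
    # [t^k] prod_i (sum_j a_ij t_j)^{k_i}
    n = len(A)
    poly = {(0,) * n: 1}
    for i in range(n):
        lin = {tuple(1 if m == j else 0 for m in range(n)): A[i][j] for j in range(n)}
        for _ in range(k[i]):
            poly = poly_mul(poly, lin, tuple(k))
    return poly.get(tuple(k), 0)

def mmt_rhs(A, k):
    # [x^k] 1 / det(I - X A) with X = diag(x); determinant expanded via Leibniz
    n = len(A)
    cap = tuple(k)
    det = {}
    for sigma in permutations(range(n)):
        sign = 1
        for a in range(n):
            for b in range(a + 1, n):
                if sigma[a] > sigma[b]:
                    sign = -sign
        term = {(0,) * n: sign}
        for i in range(n):
            entry = {}
            if sigma[i] == i:
                entry[(0,) * n] = 1
            row = tuple(1 if m == i else 0 for m in range(n))
            entry[row] = entry.get(row, 0) - A[i][sigma[i]]
            term = poly_mul(term, entry, cap)
        for e, c in term.items():
            det[e] = det.get(e, 0) + c
    return series_inv(det, cap).get(tuple(k), 0)
```

For $$$\mathbf A = \begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix}$$$ and $$$\mathbf k = (1,1)$$$ both sides give $$$a_{11}a_{22} + a_{12}a_{21} = 10$$$.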
With MMT, Dixon's identity is proven as elegantly as this:
Let's now learn how to prove MMT itself.
Variable-less MMT
Now, let's look into MMT and try to understand what it actually stands for. First of all, we can multiply both sides by $$$\mathbf x^\mathbf k$$$ and sum them up over all $$$\mathbf k \geq 0$$$. In this case, the RHS, by definition, will turn into $$$\det(\mathbf I - \mathbf X \mathbf A)^{-1}$$$, while the LHS will turn into

$$$$$$\sum\limits_{\mathbf k \geq 0} \mathbf x^{\mathbf k} [\mathbf t^{\mathbf k}] \prod\limits_{i=1}^n \left(\sum\limits_{j=1}^n a_{ij} t_j\right)^{k_i}.$$$$$$
We can now do the substitution $$$\mathbf B = \mathbf X \mathbf A$$$ and use $$$b_{ij} = a_{ij} x_i$$$ to reformulate MMT in the equivalent form

$$$$$$\sum\limits_{\mathbf k \geq 0} [\mathbf t^{\mathbf k}] \prod\limits_{i=1}^n \left(\sum\limits_{j=1}^n b_{ij} t_j\right)^{k_i} = \frac{1}{\det(\mathbf I - \mathbf B)},$$$$$$

as substituting $$$b_{ij} = a_{ij} x_i$$$ into the coefficient multiplies it by exactly $$$\mathbf x^{\mathbf k}$$$.
This form is more convenient for us to work with, as its RHS doesn't depend on any variables.
LHS as the sum of permanents
Now, let's take a closer look at the individual summands of the LHS. What do they enumerate?
$$$$$$[\mathbf t^{\mathbf k}] \prod\limits_{i=1}^n \left(\sum\limits_{j=1}^n b_{ij} t_j\right)^{k_i} = \sum\limits_{(j_1,\dots,j_k)} b_{i_1 j_1} \dots b_{i_k j_k},$$$$$$

where $$$(i_1,\dots,i_k)$$$ is a tuple such that $$$1 \leq i_1 \leq \dots \leq i_k \leq n$$$ and each $$$i$$$ occurs exactly $$$k_i$$$ times in it, with $$$k = k_1 + \dots + k_n$$$. Correspondingly, $$$(j_1,\dots,j_k)$$$ are all possible rearrangements of this tuple. So, we go over all rearrangements of $$$(i_1,\dots,i_k)$$$ and sum up the product of $$$b_{i_s j_s}$$$ over $$$s$$$.
Doesn't it look familiar? If we consider all permutations of $$$(1,\dots,k)$$$ rather than direct rearrangements of $$$(i_1,\dots,i_k)$$$, this will just be a permanent of a $$$(k_1+\dots+k_n) \times (k_1+\dots+k_n)$$$ matrix, in which the element $$$b_{ij}$$$ occurs as a block of size $$$k_i \times k_j$$$.
This concept usually occurs in literature defined even for two distinct vectors $$$\mathbf k \geq 0$$$ and $$$\mathbf l \geq 0$$$, so that $$$b_{ij}$$$ occurs in a block of size $$$k_i \times l_j$$$. The resulting matrix is typically denoted as $$$\mathbf B^{(\mathbf k, \mathbf l)}$$$. So, e.g. for $$$n = 3$$$, $$$\mathbf k = (3, 2, 1)$$$ and $$$\mathbf l = (1, 2, 3)$$$, $$$\mathbf B^{(\mathbf k, \mathbf l)}$$$ would look like

$$$$$$\mathbf B^{(\mathbf k, \mathbf l)} = \begin{pmatrix} b_{11} & b_{12} & b_{12} & b_{13} & b_{13} & b_{13} \\ b_{11} & b_{12} & b_{12} & b_{13} & b_{13} & b_{13} \\ b_{11} & b_{12} & b_{12} & b_{13} & b_{13} & b_{13} \\ b_{21} & b_{22} & b_{22} & b_{23} & b_{23} & b_{23} \\ b_{21} & b_{22} & b_{22} & b_{23} & b_{23} & b_{23} \\ b_{31} & b_{32} & b_{32} & b_{33} & b_{33} & b_{33} \end{pmatrix}.$$$$$$
Note that to go back to rearrangements of $$$(i_1,\dots,i_k)$$$ rather than permutations of $$$(1,\dots,k)$$$ we should divide the permanent by $$$k_1 ! \dots k_n!$$$, which is commonly denoted in literature as $$$\mathbf k!$$$. In these terms, MMT rewrites concisely as

$$$$$$\sum\limits_{\mathbf k \geq 0} \frac{\operatorname{per} \mathbf B^{(\mathbf k, \mathbf k)}}{\mathbf k!} = \frac{1}{\det(\mathbf I - \mathbf B)}.$$$$$$
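The block-matrix claim is easy to test numerically. The sketch below (helper names are mine) checks that $$$\operatorname{per} \mathbf B^{(\mathbf k, \mathbf k)} = \mathbf k! \sum b_{i_1 j_1} \dots b_{i_k j_k}$$$, where the sum runs over distinct rearrangements $$$(j_1,\dots,j_k)$$$ of $$$(i_1,\dots,i_k)$$$:

```python
from itertools import permutations

def permanent(M):
    # brute-force permanent over all column permutations
    n = len(M)
    total = 0
    for sigma in permutations(range(n)):
        p = 1
        for i in range(n):
            p *= M[i][sigma[i]]
        total += p
    return total

def block_matrix(B, k, l):
    # B^{(k, l)}: entry b_ij repeated as a k_i-by-l_j block
    rows = [i for i in range(len(B)) for _ in range(k[i])]
    cols = [j for j in range(len(B)) for _ in range(l[j])]
    return [[B[i][j] for j in cols] for i in rows]

def rearrangement_sum(B, k):
    # sum of b_{i_1 j_1} ... b_{i_k j_k} over distinct rearrangements (j_1, ..., j_k)
    i_tuple = tuple(i for i in range(len(B)) for _ in range(k[i]))
    total = 0
    for j_tuple in set(permutations(i_tuple)):
        p = 1
        for a, b in zip(i_tuple, j_tuple):
            p *= B[a][b]
        total += p
    return total
```

For $$$\mathbf B = \begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix}$$$ and $$$\mathbf k = (2,1)$$$ the permanent of the $$$3\times 3$$$ block matrix is $$$32 = 2! \cdot 1! \cdot 16$$$.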
Or, in terms of the original matrix $$$\mathbf A$$$, as

$$$$$$\sum\limits_{\mathbf k \geq 0} \mathbf x^{\mathbf k} \frac{\operatorname{per} \mathbf A^{(\mathbf k, \mathbf k)}}{\mathbf k!} = \frac{1}{\det(\mathbf I - \mathbf X \mathbf A)}.$$$$$$
Sums of permanents as traces of symmetric powers
Now, let's denote $$$|\mathbf k| = k_1+\dots+k_n$$$. We can show that

$$$$$$\sum\limits_{|\mathbf k| = k} \frac{\operatorname{per} \mathbf B^{(\mathbf k, \mathbf k)}}{\mathbf k!} = \operatorname{tr} \operatorname{S}^k(\mathbf B),$$$$$$
where $$$\operatorname{S}^k(\mathbf B)$$$ is the so-called symmetric power of $$$\mathbf B$$$.
Let's properly define it to better understand what this means. Before doing so, we should define tensor products. Let $$$V$$$ and $$$W$$$ be vector spaces with bases $$$B_V$$$ and $$$B_W$$$ correspondingly. Then, their tensor product $$$V \otimes W$$$ is the vector space with associated bilinear operation $$$\otimes : V \times W \to V \otimes W$$$, such that the set $$$\{v \otimes w : v \in B_V, w \in B_W\}$$$ forms a basis of $$$V \otimes W$$$.
If we represent vectors in coordinate form, one possible way to represent tensor products is via Kronecker product operation applied to coordinate arrays. When tensor product is defined on vectors, it is also implicitly defined on linear maps. So, if $$$A : V_1 \to V_2$$$ and $$$B: W_1 \to W_2$$$ are linear maps, then their tensor product is the linear map $$$A \otimes B : V_1 \otimes W_1 \to V_2 \otimes W_2$$$, such that $$$(A \otimes B)(v \otimes w) = (Av)\otimes (Bw)$$$. If $$$A$$$ and $$$B$$$ are represented by matrices, $$$A \otimes B$$$ can be represented as their Kronecker product.
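The defining property $$$(A \otimes B)(v \otimes w) = (Av) \otimes (Bw)$$$ is easy to verify with a bare-bones Kronecker product, a sketch in plain Python (no libraries; helper names are mine):

```python
def kron_vec(v, w):
    # Kronecker product of coordinate vectors
    return [a * b for a in v for b in w]

def kron_mat(A, B):
    # Kronecker product of matrices: block (i, j) of the result is A[i][j] * B
    return [[A[i][j] * B[p][q] for j in range(len(A[0])) for q in range(len(B[0]))]
            for i in range(len(A)) for p in range(len(B))]

def mat_vec(M, v):
    # ordinary matrix-vector product
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]
```

With these, `mat_vec(kron_mat(A, B), kron_vec(v, w))` coincides with `kron_vec(mat_vec(A, v), mat_vec(B, w))` for any conforming inputs.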
To define symmetric powers, we should consider $$$V \otimes V$$$, a tensor product of $$$V$$$ with itself. Assuming that $$$V$$$ has a basis $$$e_1,\dots,e_n$$$, we can say that $$$V \otimes V$$$ has a basis $$$\{e_i \otimes e_j : 1 \leq i, j \leq n\}$$$, so we can represent any $$$v \in V \otimes V$$$ as an array $$$[v_{ij}]_{n \times n}$$$ such that

$$$$$$v = \sum\limits_{i=1}^n \sum\limits_{j=1}^n v_{ij} \, (e_i \otimes e_j).$$$$$$
Now, of all vectors $$$v \in V \otimes V$$$ we can consider a subset of symmetric vectors, that is, those for which $$$v_{ij} = v_{ji}$$$ for any $$$(i,j)$$$. Such vectors form a subspace $$$\operatorname{S}^2(V)$$$ of $$$V \otimes V$$$, also called the symmetric square of $$$V$$$. As a subspace of $$$V \otimes V$$$, the symmetric square has a basis

$$$$$$\{e_i \otimes e_i : 1 \leq i \leq n\} \cup \{e_i \otimes e_j + e_j \otimes e_i : 1 \leq i \lt j \leq n\}.$$$$$$
Finally, we can define the symmetric power $$$\operatorname{S}^k(V)$$$ of $$$V$$$ as a subspace of $$$V^{\otimes k} = \overbrace{V \otimes \dots \otimes V}^k$$$ formed by symmetric arrays. By this we mean that any element of $$$V^{\otimes k}$$$ can be represented as an array $$$[v_{i_1 \dots i_k}]_{n \times \dots \times n}$$$ of size $$$n^k$$$, such that

$$$$$$v = \sum\limits_{i_1=1}^n \dots \sum\limits_{i_k=1}^n v_{i_1 \dots i_k} \, (e_{i_1} \otimes \dots \otimes e_{i_k}).$$$$$$
With this definition, the $$$k$$$-th symmetric power of $$$V$$$ is the subspace of $$$V^{\otimes k}$$$ formed of all vectors $$$v$$$ whose coordinates wouldn't change if we permute the indices in any way, that is, $$$v_{i_1 \dots i_k} = v_{i_{\sigma(1)} \dots i_{\sigma(k)}}$$$ for any permutation $$$\sigma$$$ of $$$(1,\dots,k)$$$. We can also define its basis as

$$$$$$e_{\mathbf i} = \sum\limits_{(j_1, \dots, j_k)} e_{j_1} \otimes \dots \otimes e_{j_k},$$$$$$

with one basis vector $$$e_{\mathbf i}$$$ per tuple $$$\mathbf i = (i_1,\dots,i_k)$$$ such that $$$1 \leq i_1 \leq \dots \leq i_k \leq n$$$,
where $$$(j_1,\dots,j_k)$$$ go over all possible rearrangements of $$$(i_1,\dots,i_k)$$$. Looks familiar, doesn't it?
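This basis is easy to build explicitly. The sketch below (helper names are mine) constructs one vector $$$e_{\mathbf i}$$$ per multiset and checks that the dimension of $$$\operatorname{S}^k(V)$$$ is $$$\binom{n+k-1}{k}$$$, the number of size-$$$k$$$ multisets of an $$$n$$$-element set:

```python
from itertools import combinations_with_replacement, permutations
from math import comb

def sym_basis(n, k):
    # one basis vector e_i per multiset 1 <= i_1 <= ... <= i_k <= n, stored as
    # {index tuple: coefficient} over the elementary tensors e_{j_1} x ... x e_{j_k}
    basis = []
    for multiset in combinations_with_replacement(range(n), k):
        e = {t: 1 for t in set(permutations(multiset))}
        basis.append(e)
    return basis
```

For instance, for $$$n = 2$$$, $$$k = 2$$$ this produces the three vectors $$$e_1 \otimes e_1$$$, $$$e_1 \otimes e_2 + e_2 \otimes e_1$$$ and $$$e_2 \otimes e_2$$$.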
Now, what's even more important is that we can also define the symmetric power $$$\operatorname{S}^k(A)$$$ of any linear map $$$A : V \to V$$$ as the restriction of $$$A^{\otimes k}$$$ to $$$\operatorname{S}^k(V)$$$. Now, we can find its trace by going over all basis vectors in $$$\operatorname{S}^k(V)$$$ and summing up their contributions to the trace:

$$$$$$\operatorname{tr} \operatorname{S}^k(\mathbf B) = \sum\limits_{\mathbf i} \frac{e_{\mathbf i} \cdot \operatorname{S}^k(\mathbf B) e_{\mathbf i}}{e_{\mathbf i} \cdot e_{\mathbf i}},$$$$$$

where $$$\mathbf i$$$ goes over tuples $$$1 \leq i_1 \leq \dots \leq i_k \leq n$$$; this formula is valid because the basis vectors $$$e_{\mathbf i}$$$ are pairwise orthogonal.
Noting that $$$(v_1 \otimes \dots \otimes v_k) \cdot (w_1 \otimes \dots \otimes w_k) = (v_1 \cdot w_1) \dots (v_k \cdot w_k)$$$, we can see that $$$e_\mathbf i \cdot e_\mathbf i$$$ is the total number of rearrangements $$$(j_1,\dots,j_k)$$$ of $$$(i_1,\dots,i_k)$$$. This is because, when expanded, each rearrangement will only give $$$1$$$ with itself, while giving $$$0$$$ with any other. Conversely, $$$e_\mathbf i \cdot \operatorname{S}^k(\mathbf B) e_\mathbf i$$$ expands into summation over rearrangements $$$(l_1,\dots,l_k)$$$ and $$$(j_1,\dots,j_k)$$$ of $$$(i_1,\dots,i_k)$$$:

$$$$$$e_{\mathbf i} \cdot \operatorname{S}^k(\mathbf B) e_{\mathbf i} = \sum\limits_{(l_1,\dots,l_k)} \sum\limits_{(j_1,\dots,j_k)} b_{l_1 j_1} \dots b_{l_k j_k}.$$$$$$
When we divide this whole sum by $$$e_\mathbf i \cdot e_\mathbf i$$$, it is essentially equivalent to fixing $$$(l_1,\dots,l_k) = (i_1,\dots,i_k)$$$, making the sum into

$$$$$$\frac{e_{\mathbf i} \cdot \operatorname{S}^k(\mathbf B) e_{\mathbf i}}{e_{\mathbf i} \cdot e_{\mathbf i}} = \sum\limits_{(j_1,\dots,j_k)} b_{i_1 j_1} \dots b_{i_k j_k},$$$$$$
which proves that the LHS of the MMT is the sum of $$$\operatorname{tr}\operatorname{S}^k(\mathbf B)$$$ over $$$k \geq 0$$$.
Inverse determinants as traces of symmetric powers
Now, let's take a closer look at $$$\det(\mathbf I - \mathbf B)^{-1}$$$. Let $$$\lambda_1,\dots,\lambda_n$$$ be eigenvalues of $$$\mathbf B$$$, then

$$$$$$\frac{1}{\det(\mathbf I - \mathbf B)} = \prod\limits_{i=1}^n \frac{1}{1 - \lambda_i} = \sum\limits_{\mathbf k \geq 0} \lambda^{\mathbf k} = \sum\limits_{k=0}^{\infty} \sum\limits_{|\mathbf k| = k} \lambda^{\mathbf k},$$$$$$

where $$$\lambda^{\mathbf k}$$$ stands for $$$\lambda_1^{k_1} \dots \lambda_n^{k_n}$$$.
We may show that the eigenvalues of $$$\operatorname{S}^k(\mathbf B)$$$ are, in fact, all possible values of $$$\lambda^\mathbf k$$$ where $$$|\mathbf k|= k$$$. To show this, assume in the basis construction above that $$$e_1,\dots,e_n$$$ is a basis formed by eigenvectors of $$$\mathbf B$$$ (such a basis exists when $$$\mathbf B$$$ is diagonalizable; the general case follows by continuity, as diagonalizable matrices are dense). In this case, $$$e_\mathbf i$$$ would also be an eigenvector of $$$\operatorname{S}^k(\mathbf B)$$$ with the eigenvalue $$$\lambda^\mathbf k$$$, where $$$\mathbf k = (k_1,\dots,k_n)$$$ and $$$k_i$$$ is the multiplicity of $$$i$$$ in $$$(i_1,\dots,i_k)$$$. On the other hand, the sum of all eigenvalues of a particular linear map is simply the trace of the linear map, hence

$$$$$$\operatorname{tr} \operatorname{S}^k(\mathbf B) = \sum\limits_{|\mathbf k| = k} \lambda^{\mathbf k},$$$$$$
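To see the eigenvalue claim in action, we can compute $$$\operatorname{tr} \operatorname{S}^k(\mathbf B)$$$ directly from the multiset basis and compare it with $$$\sum_{|\mathbf k| = k} \lambda^{\mathbf k}$$$ for a triangular matrix, whose eigenvalues are simply its diagonal entries. A Python sketch (helper names are mine; exact rationals avoid rounding):

```python
from fractions import Fraction
from itertools import combinations_with_replacement, permutations
from math import prod

def tr_sym_power(B, k):
    # tr S^k(B) as the sum of (e . B^{(x)k} e) / (e . e) over the multiset basis
    n = len(B)
    total = Fraction(0)
    for ms in combinations_with_replacement(range(n), k):
        rearr = list(set(permutations(ms)))
        dot = sum(prod(B[l[s]][j[s]] for s in range(k))
                  for l in rearr for j in rearr)
        total += Fraction(dot, len(rearr))
    return total

def h_poly(lams, k):
    # complete homogeneous symmetric polynomial h_k = sum of lambda^k over |k| = k
    return sum(prod(c) for c in combinations_with_replacement(lams, k))
```

For $$$\mathbf B = \begin{pmatrix}2 & 5\\ 0 & 3\end{pmatrix}$$$ with eigenvalues $$$2, 3$$$, we get $$$\operatorname{tr}\operatorname{S}^2(\mathbf B) = 2^2 + 2 \cdot 3 + 3^2 = 19$$$.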
concluding the proof of MacMahon's master theorem.
Applications to Quantum Physics
Consider the expression $$$\det(\mathbf I - \mathbf B)$$$ without taking its inverse:

$$$$$$\det(\mathbf I - \mathbf B) = \prod\limits_{i=1}^n (1 - \lambda_i) = \sum\limits_{k=0}^{n} (-1)^k \sum\limits_{\mathbf i} \lambda_{i_1} \dots \lambda_{i_k},$$$$$$
where this time $$$\mathbf i$$$ iterates over tuples $$$(i_1,\dots,i_k)$$$ such that $$$1 \leq i_1 \lt \dots \lt i_k \leq n$$$ instead of $$$1 \leq i_1 \leq \dots \leq i_k \leq n$$$, thus ensuring that only pairwise distinct eigenvalues participate. Summed this way, eigenvalues also sum up to a trace of a certain power of $$$\mathbf B$$$. However, this time it's not symmetric power of $$$\mathbf B$$$, but instead exterior power, denoted $$$\Lambda^k(\mathbf B)$$$.
While symmetric power is a subspace of $$$V^{\otimes k}$$$ composed of symmetric tensors, exterior power is composed of antisymmetric tensors instead, so in its basis construction some rearrangements are multiplied by $$$-1$$$, if the parity of the rearrangement is odd. In this way,

$$$$$$\det(\mathbf I - \mathbf B) = \sum\limits_{k=0}^{n} (-1)^k \operatorname{tr} \Lambda^k(\mathbf B).$$$$$$
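Exterior-power traces are concrete too: $$$\operatorname{tr} \Lambda^k(\mathbf B)$$$ equals the sum of all $$$k \times k$$$ principal minors of $$$\mathbf B$$$, so the identity $$$\det(\mathbf I - \mathbf B) = \sum_k (-1)^k \operatorname{tr} \Lambda^k(\mathbf B)$$$ can be checked directly (a sketch; helper names are mine):

```python
from itertools import combinations, permutations

def det(M):
    # Leibniz determinant, fine for tiny matrices
    n = len(M)
    total = 0
    for sigma in permutations(range(n)):
        sign = 1
        for a in range(n):
            for b in range(a + 1, n):
                if sigma[a] > sigma[b]:
                    sign = -sign
        p = sign
        for i in range(n):
            p *= M[i][sigma[i]]
        total += p
    return total

def tr_ext_power(B, k):
    # tr Lambda^k(B) = sum of principal k x k minors of B
    n = len(B)
    return sum(det([[B[i][j] for j in S] for i in S])
               for S in combinations(range(n), k))
```

For $$$\mathbf B = \begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix}$$$ this gives $$$1 - \operatorname{tr}\mathbf B + \det \mathbf B = 1 - 5 - 2 = -6 = \det(\mathbf I - \mathbf B)$$$.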
Now, plugging $$$t \mathbf A$$$ instead of $$$\mathbf B$$$ into MMT, we get the following result:

$$$$$$\left(\sum\limits_{k=0}^{\infty} t^k \operatorname{tr} \operatorname{S}^k(\mathbf A)\right) \left(\sum\limits_{k=0}^{n} (-t)^k \operatorname{tr} \Lambda^k(\mathbf A)\right) = 1,$$$$$$
which is also known as the boson-fermion correspondence, where bosons are commuting particles corresponding to symmetric powers, while fermions are skew-commuting particles corresponding to exterior powers.
I suppose one other way to look at it is to treat $$$\operatorname{tr}\operatorname{S}^k(\mathbf A)$$$ as the complete homogeneous symmetric polynomial $$$h_k(\lambda_1,\dots,\lambda_n)$$$, and $$$\operatorname{tr}\Lambda^k(\mathbf A)$$$ as the elementary symmetric polynomial $$$e_k(\lambda_1,\dots,\lambda_n)$$$, which then implies the correspondence between them.
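The $$$h_k$$$/$$$e_k$$$ reading can be tested through the classical identity $$$\sum_{k=0}^{m} (-1)^k e_k h_{m-k} = 0$$$ for $$$m \geq 1$$$, which is exactly the coefficient-wise content of the product above (a sketch; function names are mine):

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def e_poly(lams, k):
    # elementary symmetric polynomial e_k: sum over k-subsets
    return sum(prod(c) for c in combinations(lams, k))

def h_poly(lams, k):
    # complete homogeneous symmetric polynomial h_k: sum over k-multisets
    return sum(prod(c) for c in combinations_with_replacement(lams, k))

def convolution(lams, m):
    # coefficient of t^m in (sum_k t^k h_k) * (sum_k (-t)^k e_k)
    return sum((-1) ** k * e_poly(lams, k) * h_poly(lams, m - k) for k in range(m + 1))
```

For any choice of $$$\lambda$$$'s, `convolution` returns $$$1$$$ at $$$m = 0$$$ and $$$0$$$ for every $$$m \geq 1$$$.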
Multivariate Lagrange inversion
MMT is, in fact, a linear case of a more general result, known as multivariate Lagrange implicit function theorem (LIFT).
Multivariate LIFT: Let $$$\mathbf x = (x_1,\dots,x_n)$$$ and let $$$R_1(\mathbf x), \dots, R_n(\mathbf x) \in \mathbb K[ [ \mathbf x ] ] $$$ be such that

$$$$$$R_i(\mathbf x) = x_i G_i(R_1(\mathbf x), \dots, R_n(\mathbf x)),$$$$$$
where $$$G_1(\mathbf u), \dots, G_n(\mathbf u) \in \mathbb K[ [\mathbf u] ]$$$ for $$$\mathbf u = (u_1,\dots,u_n)$$$. Then, for any $$$F(\mathbf u) \in \mathbb K[ [\mathbf u ] ]$$$ and $$$\mathbf a = (a_1,\dots,a_n)$$$ it holds that

$$$$$$[\mathbf x^{\mathbf a}] F(\mathbf R(\mathbf x)) = [\mathbf u^{\mathbf a}] F(\mathbf u) \left(\prod\limits_{i=1}^n G_i(\mathbf u)^{a_i}\right) \det\left(\delta_{ij} - \frac{u_j}{G_i(\mathbf u)} \frac{\partial G_i}{\partial u_j}(\mathbf u)\right),$$$$$$
where the expression under the determinant is the $$$(i,j)$$$-th element of the matrix. For $$$G_i(\mathbf u) = \sum\limits_{j=1}^n a_{ij} u_j$$$ we get MMT.
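Even the univariate case $$$n = 1$$$ makes for a nice sanity check: with $$$G(u) = (1+u)^2$$$, the equation $$$R = x\,G(R)$$$ generates the Catalan numbers, and Lagrange inversion with $$$F(u) = u$$$ predicts $$$[x^a] R = \binom{2a}{a-1} - 2\binom{2a-1}{a-2}$$$ (the two terms come from expanding $$$u\, G(u)^a \left(1 - u \frac{G'(u)}{G(u)}\right)$$$). A Python sketch comparing both (helper names are mine):

```python
from math import comb

def r_series(N):
    # coefficients of R with R = x * (1 + R)^2, by fixed-point iteration
    # mod x^(N+1); each pass gains at least one more correct coefficient
    R = [0] * (N + 1)
    for _ in range(N):
        S = R.copy()
        S[0] += 1                # S = 1 + R
        sq = [0] * (N + 1)       # (1 + R)^2, truncated
        for i in range(N + 1):
            if S[i]:
                for j in range(N + 1 - i):
                    sq[i + j] += S[i] * S[j]
        R = [0] + sq[:N]         # multiply by x
    return R

def lift_coeff(a):
    # [x^a] R as predicted by Lagrange inversion with G(u) = (1 + u)^2
    return comb(2 * a, a - 1) - (2 * comb(2 * a - 1, a - 2) if a >= 2 else 0)
```

Both sides produce $$$1, 2, 5, 14, 42, \dots$$$, the Catalan numbers.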
I previously made a blog post about univariate LIFT, but this one here seems significantly harder for me to comprehend.
P.S. One more interesting related identity is (as found on Mathematics StackExchange):
Meaning that MMT essentially extracts diagonal elements from this bivariate genfunc. Nice to know, I guess?

wow that is a lot of math
Blogs like these make me realize how dumb I am :(
Why?
Do you mean the coefficient extraction $$$[t^k]$$$ in the formal power-series sense (algebraic identity) or as an analytic coefficient requiring a region of convergence?
It's always in the formal power series in this type of thing.
Great post. Could you show a concrete combinatorial bijection that matches the multiset permanent terms you sum on the left with the permutation cycle terms coming from the expansion of the inverse determinant det(I - B)? I am curious how cycles correspond to the multiset blocks and whether the same correspondence works for noncommutative or q-deformed versions of MMT.
Hi there! I unfortunately do not know more direct / bijective proofs for MMT than the one described in the blog itself, but would be happy to know if there is some :)
I was just about to write about this. So, there is actually a natural cycle–gluing construction that gives a combinatorial proof of MMT.
Take B = (b_{i,j}). A monomial on the LHS comes from choosing nonnegative integers c_{i,j} with row sums = k_i and column sums = k_j. This is the same as a directed balanced multigraph on {1,…,n} with c_{i,j} labeled edges from i to j.
Now at each vertex v, match its incoming edge-ends with its outgoing edge-ends in some bijection. This splices the labeled edges into disjoint directed cycles. The weight of the configuration is the product of b’s along the cycles.
Conversely, start from a multiset of directed cycles and cut each cycle at an arbitrary place to recover labeled edges, then pair at vertices. Each cycle of length r can be cut in r different ways, so there is an overcount by r.
On the RHS, det(I − B)^{-1} = exp(∑_{r≥1} trace(B^r)/r). The factor 1/r in front of trace(B^r) exactly cancels the r overcount from cutting cycles, and the exponential collects disjoint cycles into sets.
So the bijection is: terms on the LHS <-> multisets of directed cycles, with the 1/r normalization handling cycle rotations. This gives a direct combinatorial interpretation of why the two sides agree.
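For completeness, the exp-trace identity in that last step can be verified as truncated power series in a single variable t (a Python sketch with exact rationals; helper names are mine):

```python
from fractions import Fraction
from itertools import permutations

def pmul(p, q, N):
    # multiply univariate series given as coefficient lists, truncated at t^N
    r = [0] * (N + 1)
    for i, a in enumerate(p):
        if a:
            for j, b in enumerate(q):
                if i + j <= N:
                    r[i + j] += a * b
    return r

def det_poly(B, N):
    # det(I - tB) as a polynomial in t, via Leibniz expansion
    n = len(B)
    total = [0] * (N + 1)
    for sigma in permutations(range(n)):
        sign = 1
        for a in range(n):
            for b in range(a + 1, n):
                if sigma[a] > sigma[b]:
                    sign = -sign
        term = [sign] + [0] * N
        for i in range(n):
            entry = [1 if sigma[i] == i else 0, -B[i][sigma[i]]] + [0] * (N - 1)
            term = pmul(term, entry, N)
        total = [x + y for x, y in zip(total, term)]
    return total

def series_inv1(p, N):
    # 1/p with p[0] == 1, up to t^N
    inv = [0] * (N + 1)
    inv[0] = 1
    for m in range(1, N + 1):
        inv[m] = -sum(p[i] * inv[m - i] for i in range(1, m + 1))
    return inv

def exp_trace(B, N):
    # exp(sum_{r >= 1} t^r * tr(B^r) / r), truncated at t^N
    n = len(B)
    S = [Fraction(0)] * (N + 1)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for r in range(1, N + 1):
        P = [[sum(P[i][l] * B[l][j] for l in range(n)) for j in range(n)] for i in range(n)]
        S[r] = sum(P[i][i] for i in range(n)) * Fraction(1, r)
    E = [Fraction(0)] * (N + 1)
    E[0] = Fraction(1)
    term = [Fraction(1)] + [Fraction(0)] * N
    for m in range(1, N + 1):
        term = [x * Fraction(1, m) for x in pmul(term, S, N)]
        E = [x + y for x, y in zip(E, term)]
    return E
```

Both `series_inv1(det_poly(B, N), N)` and `exp_trace(B, N)` give the same truncated series, which is exactly the 1/r cancellation described above, coefficient by coefficient.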