Why the construction is optimal

Welcome to my first blogpost :)

This problem https://mirror.codeforces.com/contest/2158/problem/F2 was one of the hardest problems I've tried to understand. The official editorial gives a very nice construction but, coming from an analysis/pure math background, for the sake of intuition I sought a proof of (a) a lower bound on the number of distinct values we may use, and (b) why the $$$10 \times 10$$$ construction is in fact optimal for all $$$n \le 5000$$$.

As such, this post is the same solution as the editorial written in a more "theorem" style. To summarize, I derive the extremal function $$$g(k)$$$ = maximum number of distinct adjacent gcds you can get if you use only $$$k$$$ distinct values, show that the construction exactly matches that bound, then prove the gcd-uniqueness of the $$$10 \times 10$$$ grid in a bit more detail.

First glance

At first sight, the problem feels impossible for three reasons.

1) The condition is very rigid: all $$$n-1$$$ adjacent gcds must be pairwise distinct. Gcd is a “coarse” function i.e. many pairs share the same gcd, so you may expect collisions everywhere.

2) You also have to minimize the number of distinct values in the whole sequence. If you ignore this and just try to get distinct gcds, you can throw around lots of primes and prime products and easily get a solution with $$$O(n)$$$ distinct numbers. This is, imo, the most difficult part of the problem and what makes it truly evil :')

3) The constraint $$$a_i \le 10^{18}$$$ kills the most obvious construction i.e. label every edge of a big graph with a distinct prime $$$p_e$$$, and let each vertex be the product of primes on its incident edges, so that $$$\gcd$$$ literally returns $$$p_e$$$, as one vertex might have degree around 100, and the product of that many distinct primes easily shoots past $$$10^{18}$$$.

In order to understand this problem and solve it we may (you guessed it) decompose into two subproblems.

First, ignore integers and think combinatorially; With $$$k$$$ distinct values, the pattern of which value follows which is just a trail in a graph on $$$k$$$ vertices, which leads to a nice extremal function $$$g(k)$$$ and a sharp lower bound on how many distinct values you need. We may then build a single static set $$$X$$$ of $$$k$$$ integers with a property that every unordered pair of elements has a distinct gcd. Thinking in terms of prime exponents and “gcd = coordinate-wise min” lets you design such a set using a $$$10 \times 10$$$ exponent grid.

Below, I make this more formal.

1. Graph model, lower bound

Suppose we have a valid sequence $$$a_1, a_2, \dots, a_n$$$ with all adjacent gcds $$$\gcd(a_i, a_{i+1})$$$ distinct, and suppose it uses exactly $$$k$$$ distinct values. Let

$$$ X = \{x_1, x_2, \dots, x_k\} $$$

be the set of distinct values in the sequence.

Forget the numbers, and just look at which value occurs where. Now build a graph $$$G$$$ wher (a) vertices are the values in $$$X$$$,
(b) for each $$$i$$$ in $$$[1, n-1]$$$, add an edge between the vertices corresponding to $$$a_i$$$ and $$$a_{i+1}$$$, (c) if $$$a_i = a_{i+1}$$$, this edge is a loop. Then because $$$\gcd(u, v)$$$ depends only on the unordered pair $$${u, v}$$$, we have that if the unordered pair $$${u, v}$$$ appears twice as adjacent values, we get the same gcd twice. Therefore, for each unordered pair $$${u, v}$$$ (including $$$u = v$$$), there may be at most one edge in our trail.

Equivalently, the path of values $$$(a_i)$$$ defines a trail in a graph $$$G$$$ on $$$k$$$ vertices where (a) each vertex has at most one loop, (b) each unordered pair of distinct vertices has at most one edge, (c) we walk along edges without reusing any edge.

Now let $$$m = n - 1$$$ be the number of adjacent gcds. Then $$$m$$$ is the length of a trail in such a graph on $$$k$$$ vertices. Define

$$$ g(k) = \max\{\text{length of a trail in a graph on $$$k$$$ vertices of the above type}\}. $$$

Any valid sequence using at most $$$k$$$ distinct values must satisfy

$$$ n - 1 \le g(k). $$$

This gives a very nice lower bound on the minimal number of distinct values:

$$$ D_{\min}(n) \ge \min\{k : g(k) \ge n - 1\}. $$$

We have now reduced our problem to a pure graph theory problem: compute $$$g(k)$$$.

2. $$$g(k)$$$ by Euler trails

Consider the “full” graph $$$K_k^\circ$$$ on $$$k$$$ vertices where (a) for every pair of distinct vertices, there is an edge, (b) for every vertex, there is a loop.This graph has

$$$ E_{\text{full}} = \binom{k}{2} + k = \frac{k(k+1)}{2} $$$

edges in total.

The degree of a vertex $$$v$$$ is

$$$ \deg(v) = (k - 1) + 2 = k + 1 $$$

(because there are $$$k-1$$$ edges to other vertices and one loop contributing $$$2$$$.)

Def: a connected graph has an Euler trail (a trail that uses every edge exactly once) if and only if it has either $$$0$$$ or $$$2$$$ vertices of odd degree.

We start from $$$K_k^\circ$$$ and ask: what is the maximum number of edges we can keep while still having an Euler trail?

Case 1: $$$k$$$ odd

If $$$k$$$ is odd, then $$$k + 1$$$ is even, so all vertices in $$$K_k^\circ$$$ have even degree; our graph already has an Euler tour that uses all $$$E_{\text{full}}$$$ edges, so

$$$ g(k) = E_{\text{full}} = \frac{k(k+1)}{2} \quad\text{for odd } k. Simple! $$$

Case 2: $$$k$$$ even

MIf $$$k$$$ is even, then $$$k + 1$$$ is odd, so all $$$k$$$ vertices in $$$K_k^\circ$$$ are odd-degree. Every time you delete an edge, you reduce the degree of its two endpoints by $$$1$$$, so you flip the parity of exactly those two vertices.

Start with $$$k$$$ odd vertices, and to have an Euler trail at the end, we need $$$0$$$ or $$$2$$$ odd vertices, so let'ss aim for $$$2$$$ odd vertices (that keeps more edges). Then we must flip the parity of $$$k - 2$$$ vertices. Each edge deletion flips two vertices, so we need at least $$$(k - 2)/2$$$ edge deletions.

This can be done by deleting a matching of size $$$(k - 2)/2$$$, for example edges $$$(0,1)$$$, $$$(2,3)$$$, …, $$$(k-4,k-3)$$$. The $$$k - 2$$$ endpoints of those edges become even, and the remaining 2 vertices stay odd. The graph is still connected and now has exactly 2 odd-degree vertices, so there is an Euler trail!

The number of edges remaining is

$$$ g(k) = E_{\text{full}} - \frac{k - 2}{2} = \frac{k(k+1)}{2} - \frac{k - 2}{2} = \frac{k^2}{2} + 1 \quad\text{for even }k. $$$

Combining both cases, we obtain the extremal function

$$$ g(k) = \begin{cases} \dfrac{k(k+1)}{2}, & \text{if } k \text{ is odd},\\[4pt] \dfrac{k^2}{2} + 1, & \text{if } k \text{ is even}. \end{cases} $$$

This is the maximum number of distinct adjacent gcds you can possibly achieve while using at most $$$k$$$ values, no matter how you choose the actual integers.

Therefore any valid sequence of length $$$n$$$ satisfies

$$$ n - 1 \le g(k) $$$

whenever it uses at most $$$k$$$ distinct values, and thus

$$$ D_{\min}(n) \ge \min \{k : g(k) \ge n - 1\}. $$$

For $$$n \le 5000$$$ you can check that this minimum is at most $$$100$$$, and for $$$n = 5000$$$ exactly $$$k = 100$$$ is needed. So combinatorially, “around $$$100$$$ values” is not arbitrary. Rather it is forced by the structure of trails in almost-complete graphs with loops.

3. $$$10 \times 10$$$ grid and gcd injectivity

Now we show that the $$$10 \times 10$$$ construction really gives us a set $$$X$$$ of $$$100$$$ numbers such that all gcds $$$\gcd(x_u, x_v)$$$ are pairwise distinct over unordered pairs, while still keeping $$$x_u \le 10^{18}$$$. The idea is to encode indices $$$(i, j)$$$ as exponent vectors of a few small primes, so that gcd corresponds to taking coordinate-wise minima. If we arrange the exponents carefully, the map

$$$ \{\text{two grid points}\} \mapsto \gcd(\text{their numbers}) $$$

will be injective.

First, take $$$M = 10$$$; then for each $$$0 \le i, j \lt M$$$ define

$$$ X_{i,j} = 2^{\,i+j} \cdot 3^{\,i} \cdot 5^{\,j} \cdot 7^{\,M-1-i} \cdot 11^{\,M-1-j}. $$$

These are the $$$100$$$ base numbers. It is not hard to check that they are all at most $$$10^{18}$$$ (the worst exponents are at $$$(i,j) = (9,9)$$$ and the value stays well below $$$10^{18}$$$).

now let $$$v_p(n)$$$ be the exponent of prime $$$p$$$ in $$$n$$$. Then for $$$X_{i,j}$$$ we have

$$$ \begin{aligned} v_2(X_{i,j}) &= i + j,\\ v_3(X_{i,j}) &= i,\\ v_5(X_{i,j}) &= j,\\ v_7(X_{i,j}) &= M - 1 - i,\\ v_{11}(X_{i,j}) &= M - 1 - j. \end{aligned} $$$

Take two points $$$(i_1, j_1)$$$ and $$$(i_2, j_2)$$$ and let

$$$ g = \gcd(X_{i_1,j_1}, X_{i_2,j_2}). $$$

Then

$$$ \begin{aligned} v_3(g) &= \min(i_1, i_2),\\ v_7(g) &= \min(M - 1 - i_1, M - 1 - i_2) = M - 1 - \max(i_1, i_2). \end{aligned} $$$

This means that from $$$(v_3(g), v_7(g))$$$ alone we can recover the unordered pair of $$$i$$$-indices:

$$$ \min(i_1, i_2) = v_3(g), \quad \max(i_1, i_2) = M - 1 - v_7(g). $$$

Similarly,

$$$ \begin{aligned} v_5(g) &= \min(j_1, j_2),\\ v_{11}(g) &= M - 1 - \max(j_1, j_2) \end{aligned} $$$

so from $$$(v_5(g), v_{11}(g))$$$ we recover the unordered pair $$${j_1, j_2}$$$.

So the gcd already tells us $$${i_1, i_2}$$$ as a multiset and $$${j_1, j_2}$$$ as a multiset. The only ambiguity is how to pair them.. do we have

$$$ \{(i_1,j_1),(i_2,j_2)\} = \{(i_{\min},j_{\min}),(i_{\max},j_{\max})\} $$$

or the cross pairing

$$$ \{(i_1,j_1),(i_2,j_2)\} = \{(i_{\min},j_{\max}),(i_{\max},j_{\min})\}, $$$

where $$$i_{\min} \le i_{\max}$$$ and $$$j_{\min} \le j_{\max}$$$.

This is where the exponent of $$$2$$$ comes in. We have

$$$ v_2(X_{i,j}) = i + j, \quad v_2(g) = \min(i_1 + j_1, i_2 + j_2). $$$

Under the “parallel” pairing $$$(i_{\min},j_{\min}), (i_{\max},j_{\max})$$$, the smaller sum is $$$i_{\min} + j_{\min}$$$. Under the “cross” pairing, the smaller sum is $$$\min(i_{\min} + j_{\max}, i_{\max} + j_{\min})$$$, which is strictly larger as long as both inequalities $$$i_{\min} \lt i_{\max}$$$ and $$$j_{\min} \lt j_{\max}$$$ hold. For $$$0 \le i,j \lt 10$$$, this distinguishes the two cases. (In the degenerate cases where $$$i_1 = i_2$$$ or $$$j_1 = j_2$$$, there is no difference between the pairings anyway :) )

So the full vector of gcd exponents

$$$ \bigl(v_2(g), v_3(g), v_5(g), v_7(g), v_{11}(g)\bigr) $$$

uniquely determines the unordered pair $$${(i_1,j_1),(i_2,j_2)}$$$. This is to say, for all unordered pairs $$${u, v}$$$ of numbers in the $$$10 \times 10$$$ grid (including $$$u = v$$$), the gcds $$$\gcd(u,v)$$$ are all distinct.

This is exactly what we need!

4. optimal for all $$$n \le 5000$$$

Now we combine the graph-theoretic lower bound with the number construction. Fix $$$n \le 5000$$$. Let $$$k$$$ be the smallest integer such that $$$g(k) \ge n - 1$$$. From the formula for $$$g(k)$$$, one can check that $$$k \le 100$$$ for all such $$$n$$$.

Restrict the 100-grid set $$$X$$$ to its first $$$k$$$ values (however you order them). On these $$$k$$$ vertices, build the graph used in the definition of $$$g(k)$$$:

if $$$k$$$ is odd: take the full graph with loops, $$$K_k^\circ$$$;
if $$$k$$$ is even: take $$$K_k^\circ$$$ and delete a matching of size $$$(k-2)/2$$$.

By construction, this graph has an Euler trail of length $$$g(k)$$$ and is connected. Because every unordered pair of values in $$$X$$$ has a distinct gcd, every edge in this graph has a distinct gcd label. Walking along an Euler trail and mapping vertices back to their numbers in $$$X$$$ gives a sequence whose adjacent gcds are all distinct.

Since the trail has length $$$g(k) \ge n - 1$$$, we can take only the first $$$n$$$ vertices of this trail and obtain a valid sequence of length $$$n$$$ using at most $$$k$$$ distinct values. By the lower bound from the $$$g(k)$$$ analysis, no sequence of length $$$n$$$ can use fewer than $$$k$$$ distinct values.

Therefore the $$$10 \times 10$$$ construction is not just a correct solution: it is optimal with respect to the number of distinct values for every $$$n \le 5000$$$.

Thanks for reading, I would love to hear your thoughts :)