Hello!

Some time ago I created a problem for a local programming competition. Unfortunately, it turned out that my proof of one lemma was incomplete, and I have not been able to complete it to this day.

**Lemma:** Given an increasing array of $$$N$$$ arbitrarily large numbers, we define its cost as the sum of the lengths of all non-trivial, maximal arithmetic progressions starting at the first element. The cost of any array is $$$\mathcal{O}(N\log{N})$$$.

For example, for the array $$$[0, 2, 3, 4, 6, 8, 9]$$$ — the total cost is $$$|[0, 2, 4, 6, 8]| + |[0, 3, 6, 9]| + |[0, 4, 8]| + |[0, 6]| + |[0, 8]| + |[0, 9]| = 5 + 4 + 3 + 2 + 2 + 2 = 18$$$.

It is easy to see that if we simply take $$$N$$$ consecutive natural numbers, we get $$$\mathcal{O}(N\log{N})$$$ cost, but I was not able to prove that this is the worst-case scenario.
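The definition is easy to check with a short brute force; here is a minimal sketch (the helper name `cost` is mine, not from the statement):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Cost of a strictly increasing array: for every difference d = a[i] - a[0],
// extend the progression a[0], a[0]+d, a[0]+2d, ... while its terms appear
// in the array, and sum the lengths of these maximal progressions.
long long cost(const vector<long long>& a) {
    unordered_set<long long> present(a.begin(), a.end());
    long long total = 0;
    for (size_t i = 1; i < a.size(); ++i) {
        long long d = a[i] - a[0];
        long long x = a[0], len = 1;
        while (present.count(x + d)) { x += d; ++len; }
        total += len;
    }
    return total;
}
```

`cost({0, 2, 3, 4, 6, 8, 9})` returns $$$18$$$, matching the example above, and for $$$[0, 1, \ldots, n-1]$$$ the value is $$$\sum_{d=1}^{n-1}(\lfloor\frac{n-1}{d}\rfloor + 1) = \Theta(n\log{n})$$$.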

The best bound I can show is $$$o(N^2)$$$, but that is still far from the goal...

Can anyone show if the lemma is true or false?

Shouldn't the answer for the example be $$$5 + 4 + 3 + 2 + 2 + 2 = 18$$$?

Sure, fixed. Thanks!

The second part is false then.

$$$F([0, 1, 2, 3, 4, 5]) = 15$$$ as

$$$|[0, 1, 2, 3, 4, 5]| + |[0, 2, 4]| + |[0, 3]| + |[0, 4]| + |[0, 5]| = 6 + 3 + 2 + 2 + 2 = 15$$$.

On the other hand, $$$F([0, 1, 2, 3, 4, 6]) = 16$$$ as

$$$|[0, 1, 2, 3, 4]| + |[0, 2, 4, 6]| + |[0, 3, 6]| + |[0, 4]| + |[0, 6]| = 5 + 4 + 3 + 2 + 2 = 16$$$.

That is a fair point :o

Is the array strictly increasing, or just non-decreasing?

Edit: Actually, I guess we may assume that the array is strictly increasing (having duplicate values cannot increase the number of arithmetic progressions).

Array is strictly increasing

Auto comment: topic has been updated by Fly_37 (previous revision, new revision, compare).

Spent about an hour and couldn't solve it. Brilliant, though very hard problem :)

Here's an approach that I think could really work here.

We can try something with the differences

$$$d_i = a_{i+1} - a_i,$$$

$$$d_1, d_2, d_3, \ldots, d_{n-1},$$$

Now our problem becomes: in how many different ways can we split these differences into consecutive groups of equal sum, until we can no longer carve that sum out of the remaining tail of elements?

first grouping: $$$d_0,\; d_1 + d_2 + \cdots + d_l,\; d_{l+1} + \cdots + d_r,\; [\,d_{r+1} \ldots \text{ (indivisible remainder)}\,]$$$

similarly: $$$d_0 + d_1,\; \ldots,\; [\,\text{indivisible remainder}\,]$$$

I got up to here; the main observations were that the number of groups of each length shrinks significantly, and that the total number of difference arrays is $$$n$$$.

$$$O(N \log N)$$$ seems hard, but here is a proof for $$$O(N^{3/2})$$$ bound:

It is easy to see that $$$a_i,a_j$$$ can be consecutive elements of at most one maximal arithmetic progression.

Let $$$S_i$$$ be the set of indices $$$j$$$ such that $$$a_i$$$ and $$$a_j$$$ (with $$$a_i < a_j$$$) are consecutive in one of the sequences. The answer for the problem is $$$\sum_{i=1}^n |S_i|+n$$$.

[1] If $$$[a_{i_0}, a_{i_1}, a_{i_2}, \ldots, a_{i_k}]$$$ with $$$i_0 = 0$$$ is a maximal sequence, then $$$\sum_{j=0}^{k-1} (i_{j+1} - i_{j}) = i_k - i_0 \leq n$$$.

Let $$$S_i'= \{ j-i : j \in S_i \}$$$. From [1] we have that $$$\sum_{i=1}^n \sum_{k \in S_i'} k \leq n^2$$$. On the other hand, $$$ \sum_{k \in S_i'} k \geq \frac{|S_i'|^2}{2}$$$, because $$$S_i'$$$ is a set of distinct positive integers. Combining these two inequalities we have $$$\sum_{i=1}^n |S_i'|^2 \leq 2n^2 $$$. Using the AM-QM inequality we obtain that $$$\sum_{i=1}^n |S_i| \in O(N^{3/2})$$$.
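Spelling out the final AM-QM step:

$$$\sum_{i=1}^n |S_i| \leq \sqrt{n \cdot \sum_{i=1}^n |S_i|^2} \leq \sqrt{n \cdot 2n^2} = \sqrt{2}\, n^{3/2}.$$$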

Consider the following algorithm that counts the number of arithmetic progressions:

Since we iterated over all multiples of primes, we iterated over all differences possible (we excluded 1, so the bound will have an extra $$$n$$$ term). We also know that $$$ \sum_{p}{\ell_p} = n $$$. It remains to bound the sum of $$$\lfloor\frac{\ell}{i}\rfloor$$$:

Summing over all primes yields $$$\sum_{p}{\ell_p \cdot \log{\ell_p}} \le n \cdot \log{n}$$$.

Edit: As w0nsh said, a single number can contribute to multiple $$$\ell_p$$$'s. We know that $$$x$$$ can have at most $$$\log{x}$$$ different prime divisors. Suppose $$$a_i\le M $$$, then $$$ \sum_{p}{\ell_p} \le n \cdot \log{M} $$$, and the bound will be $$$ n \cdot \log{M} \cdot \log{(n \cdot \log{M})} $$$. I assumed that for a specific prime number, all differences are possible. Perhaps proving an $$$O(n \cdot \log{n}) $$$ bound requires analyzing this more carefully.

I don't think $$$\sum_{p}{\ell_p} = n$$$ is true. One number can contribute to multiple $$$\ell_p$$$ 's.

I don't see why $$$p, 2p, 3p, \dots, \ell \cdot p$$$ should be the only valid differences for a given prime. Are you implicitly assuming that $$$1, 2, \dots, n$$$ (or $$$p, 2p, \dots, n\cdot p$$$) is the sequence with highest cost? (I think $$$\lfloor \frac{\ell}{i}\rfloor$$$ should be $$$\lfloor \frac{M}{i} \rfloor$$$, but then considering primes doesn't help.)

Here's a simple $$$O(M \log n)$$$ bound: WLOG suppose the first element is $$$0$$$. For every non-zero element $$$x$$$, there is a unique maximal arithmetic progression starting with $$$0, x, \dots$$$. Its length is at most $$$\frac{M}{x} + 1$$$. Summing over all $$$x$$$ gives $$$\sum_{x}\left(\frac{M}{x} + 1\right) \le M\sum_{x=1}^{n}\frac{1}{x} + n = O(M \log n)$$$, since the $$$x$$$'s are at most $$$n$$$ distinct positive integers.

It should be noted that consecutive numbers aren't exactly the worst case, though they probably are asymptotically. For example, $$$[0,1,2,3,4,5,6,7]$$$ has lower cost than $$$[0,1,2,3,4,5,6,8]$$$.

I found a proof for slightly worse bound: $$$O(n \log^2 n)$$$.

Let $$$p_i,p_{i+1}$$$ be consecutive primes. Let $$$E_{p_i}$$$ be the set of pairs $$$(a_l,a_j)$$$ such that there exists a maximal arithmetic progression in which $$$a_l$$$ is the $$$p_i$$$-th element and $$$a_j$$$ is the $$$p_{i+1}$$$-th element. WLOG we can assume that $$$a_0=0$$$, therefore $$$\frac{a_l}{a_j}=\frac{p_i}{p_{i+1}}$$$.

Notice that from the Prime number theorem it follows that the cost in original problem is $$$O(\sum_p |E_p| \log n)$$$.

Let's consider an initially empty graph $$$G$$$ with $$$V=\{a_0,a_1,\ldots, a_{n-1}\}$$$. We will be adding edges from $$$E_2, E_3, \ldots$$$ to the graph, in that order.

[1] every vertex appears in $$$E_p$$$ at most twice;

[2] If $$$a_l, a_k$$$ are in the same connected component of $$$(V, E_2 \cup \ldots \cup E_{p_{i}})$$$ then $$$\frac{a_l}{a_k}=\frac{A}{B}$$$ for some coprime numbers $$$A,B$$$ with prime factors $$$\leq p_{i+1}$$$. Therefore for every $$$j>i: \ (a_k,a_l) \not\in E_{p_j}$$$.

From [1] it follows that if we are adding edges from $$$E_q$$$ and connecting two components $$$S_1,S_2$$$, then $$$|E_q \cap (S_1 \times S_2)| \leq 2\cdot \min(|S_1|,|S_2|)$$$. Combining the small-to-large argument and [2], we obtain that the total number of edges in the graph $$$(V, E_2 \cup E_3 \cup \ldots)$$$ is $$$O(n \log n)$$$, so the bound for the problem is $$$O(n \log^2 n)$$$.

This seems to be a huge improvement! Wow :o

amazing proof

Hey, if the array is increasing, then it's very easy to prove the following:

The cost can be computed in $$$O(N\log^2{N})$$$ in the worst case. (Edit: maybe one log factor can be shaved, depending on the restrictions on $$$A[i]$$$.)

Let's assume our array is

$$$a_0, a_1, a_2, a_3, \ldots, a_N$$$.

We construct another array $$$d$$$

$$$d_0, d_1, \ldots, d_N$$$

such that $$$d[i] = a[i] - a[0]$$$.

Now let's see. If the array is increasing, we can iterate over each $$$d[i]$$$ (ignoring $$$d[0]$$$) and do

```cpp
int k = 1;
while (mp.count(1LL * d * k)) { // while d*k exists in the array
    k++;
}
// In the end, k is the length of the progression with difference d == d[i].
// (The map lookup adds another log factor; maybe it can be reduced.)
```

1) Note that $$$k \le n$$$.

2) Because the array is strictly increasing, $$$d_1 < d_2 < d_3 < \ldots < d_N$$$.

3) This means the total complexity in the worst case will be $$$O(n/d_1 + n/d_2 + \ldots + n/d_N)$$$.

Now let's prove that this is $$$O(n \log n)$$$.

First of all, note that $$$n/d_1 + n/d_2 + n/d_3 + \ldots + n/d_N \le n/1 + n/2 + n/3 + \ldots + n/N$$$ (you can verify it term by term: $$$n/d_1 \le n/1$$$, $$$n/d_2 \le n/2$$$, etc., since $$$d_i \ge i$$$).

So the worst-case complexity is $$$O(n/1 + n/2 + n/3 + \ldots + n/n)$$$. For me it's already a well-known lemma that this is $$$O(n \log n)$$$, but in case you didn't know it:

$$$n/1 + n/2 + \ldots + n/n = n(1 + 1/2 + 1/3 + \ldots + 1/n)$$$

$$$1/2 + 1/3 < 1/2 + 1/2 = 1$$$

$$$1/4 + 1/5 + 1/6 + 1/7 < 1/4 + 1/4 + 1/4 + 1/4 = 1$$$

$$$\ldots$$$

$$$1/2^k + 1/(2^k+1) + \ldots + 1/(2^{k+1}-1) < 2^k \cdot 1/2^k = 1$$$

So $$$1 + 1/2 + 1/3 + \ldots + 1/n < 1 + 1 + \ldots + 1 \le 1 + \log_2 n$$$ (one term per dyadic block, plus the leading $$$1$$$).

So $$$n \cdot (1 + 1/2 + 1/3 + \ldots + 1/n) = O(n \log n)$$$.
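The dyadic-block bound above is easy to sanity-check numerically; a small sketch (the helper name `harmonic` is mine):

```cpp
#include <bits/stdc++.h>
using namespace std;

// H_n = 1 + 1/2 + ... + 1/n, computed directly; the dyadic-block argument
// above shows H_n <= 1 + log2(n).
double harmonic(long long n) {
    double h = 0;
    for (long long k = 1; k <= n; ++k) h += 1.0 / k;
    return h;
}
```

For example, $$$H_{10^6} \approx 14.39$$$ while $$$1 + \log_2 10^6 \approx 20.93$$$.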

what

Here's a construction that does $$$\omega(n \log n)$$$ for infinitely many $$$n$$$, disproving the lemma:

Choose some $$$k$$$ and consider the set of elements $$$i\cdot j$$$ for $$$1 \leq i,j \leq k$$$ — sorted, after removing duplicates and with a $$$0$$$ in the beginning.

This sequence has cost $$$\Omega(k^2 \log k)$$$: any starting element $$$ij$$$ has a sequence of length at least $$$\frac{k}{i}$$$ which is enough to show this bound.

However, after removing duplicates we have $$$o(k^2)$$$ elements in our sequence — as found here: https://mathoverflow.net/questions/108912/number-of-elements-in-the-set-1-cdots-n-cdot-1-cdots-n/108939#108939

Choosing $$$k = \sqrt{n}$$$ gives the contradiction. We still don't know the exact bound though.
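The construction is easy to experiment with; here is a sketch (the helper names `cost` and `products` are mine) that builds the multiplication-table set and evaluates its cost under the definition from the post:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Cost under the post's definition: for each difference d = a[i] - a[0],
// the length of the maximal progression a[0], a[0]+d, ... inside the array.
long long cost(const vector<long long>& a) {
    unordered_set<long long> present(a.begin(), a.end());
    long long total = 0;
    for (size_t i = 1; i < a.size(); ++i) {
        long long d = a[i] - a[0], x = a[0], len = 1;
        while (present.count(x + d)) { x += d; ++len; }
        total += len;
    }
    return total;
}

// Sorted distinct products i*j for 1 <= i, j <= k, with 0 prepended.
vector<long long> products(int k) {
    set<long long> s = {0};
    for (long long i = 1; i <= k; ++i)
        for (long long j = 1; j <= k; ++j) s.insert(i * j);
    return vector<long long>(s.begin(), s.end());
}
```

For $$$k = 2$$$ the set is $$$[0, 1, 2, 4]$$$ with cost $$$8$$$; growing $$$k$$$ lets one compare the cost against $$$n \log n$$$ empirically, where $$$n$$$ is the size of the deduplicated set.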

Won't removing the duplicates affect the total cost estimate as well, since $$$ij$$$ are not counted multiple times anymore?

Right, it might make it asymptotically less than $$$k^2 \log k$$$, my bad. Not sure if this invalidates the construction but I feel like it does.