Tricky lemma - Codeforces

→ Pay attention

Before contest
Codeforces Round 995 (Div. 3)
08:04:54
Register now »

→ Streams

Codeforces Round 995 Solution Discussion

By aryanc403

Before stream 10:24:54

View all →

→ Top rated

#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

View all →

→ Find user

→ Recent actions

Detailed →

Fly_37's blog

Tricky lemma

By Fly_37, history, 3 years ago, In English

Hello!

Some time ago I created a problem for local programming competition. Unfortunately it turned out that I had incomplete proof of one lemma, that I can not show even to this day.

Lemma: Given an increasing array of $$$N$$$ arbitrary large numbers we define its cost as sum of lengths of all non-trivial, maximal arithmetic progressions starting at the first element. The cost of any array is $$$\mathcal{O}(N\log{N})$$$.

For example for array $$$[0, 2, 3, 4, 6, 8, 9]$$$ — the total cost is $$$|[0, 2, 4, 6, 8]| + |[0, 3, 6, 9]| + |[0, 4, 8]| + |[0, 6]| + |[0, 8]| + |[0, 9]| = 5 + 4 + 3 + 2 + 2 + 2= 18$$$.

It is easy to see, that if we simply take $$$N$$$ consecutive natural numbers we get $$$\mathcal{O}(N\log{N})$$$ cost, but I was not able to prove that this is the worst case scenario.

Best complexity I can show is $$$\mathcal{o}(N^2)$$$, but still far from the goal...

Can anyone show if the lemma is true or false?

proof, arithmetic sequence

+258

Fly_37
3 years ago
23

Comments (22)

Show archived | Write comment?

pavlovoleg4889

3 years ago, # |

+40

Shouldn't the answer for the example be $$$5 + 4 + 3 + 2 + 2 + 2 = 18$$$?

→ Reply

Fly_37

3 years ago, # ^ |

Sure, fixed. Thanks!

→ Reply

pavlovoleg4889

3 years ago, # ^ |

+47

The second part is false then.

$$$F([0, 1, 2, 3, 4, 5]) = 15$$$ as

$$$|[0, 1, 2, 3, 4, 5]| + |[0, 2, 4]| + |[0, 3]| + |[0, 4]| + |[0, 5]| = 6 + 3 + 2 + 2 + 2 = 15$$$.

On the other hand, $$$F([0, 1, 2, 3, 4, 6]) = 16$$$ as

$$$|[0, 1, 2, 3, 4]| + |[0, 2, 4, 6]| + |[0, 3, 6]| + |[0, 4]| + |[0, 6]| = 5 + 4 + 3 + 2 + 2 = 16$$$.

→ Reply

Fly_37

3 years ago, # ^ |

+18

That is a fair point :o

→ Reply

algoishard

3 years ago, # |

← Rev. 2 →

Is the array strictly increasing, or just non-decreasing?

Edit: Actually, I guess we may assume that the array is just strictly increasing (having duplicate values cannot increase the number of arithmetic progressions).

→ Reply

Fly_37

3 years ago, # ^ |

Array is strictly increasing

→ Reply

Fly_37

3 years ago, # |

Auto comment: topic has been updated by Fly_37 (previous revision, new revision, compare).

→ Reply

Wind_Eagle

3 years ago, # |

+55

Spent about an hour and couldn't solve it. Brilliant, though very hard problem :)

→ Reply

TyroWhizz

3 years ago, # |

← Rev. 2 →

-44

We can try something with differences

di = di+1 — di

d1 d2 d3 -- dn-1 ,

now our problems become how many different ways are there to divide these differences in different groups of equal sum such that we can't take out this sum from last group of remaining elements anymore

first group : d0, d1 + d2 + -- + dl, dl+1 — dr, [dr -- indivisible group]

similarly : d0 + d1, ------- , [dr -- indivisible group

I got until here, main observations were : that no. of groups of each length will reduce significantly ,

total no. of diff arrays will be n

→ Reply

jacynkaa

3 years ago, # |

← Rev. 2 →

+165

$$$O(N \log N)$$$ seems hard, but here is a proof for $$$O(N^{3/2})$$$ bound:

It is easy to see that $$$a_i,a_j$$$ can be consecutive elements of at most one maximal arithmetic progression.

Let $$$S_i$$$ — set of such $$$j$$$'s that $$$a_i<a_j$$$ are consecutive in one of the sequences. The answer for the problem is $$$\sum_{i=1}^n |S_i|+n$$$

[1] If the $$$[0=a_{i_0},a_{i_1},a_{i_2} , \ldots a_{i_k}]$$$ is a maximal sequence, then $$$\sum_{j=0}^{k-1} i_{j+1} -i_{j} = i_k -i_0 \leq n$$$.

Let $$$S_i'= [ j-i : j \in S_i ]$$$. From [1] we have that $$$\sum_{i=1}^n \sum_{k \in S_i'} k \leq n^2$$$. On the other hand $$$ \sum_{k \in S_i'} k \geq \frac{|S_i'|^2}{2}$$$, because $$$S_i'$$$ is a set of distinct natural numbers. Combining those two inequalities we have: $$$\sum_{i=1}^n |S_i'|^2 \leq 2n^2 $$$. Using AM-QM inequality we obtain that $$$\sum_{i=1}^n |S_i| \in O(N^{3/2})$$$.

→ Reply

yanire

3 years ago, # |

← Rev. 2 →

Consider the following algorithm that counts the number of arithmetic progressions:

for every prime $$$p$$$ (excluding 1 of course):
Consider all elements $$$a_0+t \cdot p$$$. Denote their amount by $$$\ell_p$$$. Sequences with difference $$$p,2p,3p,...,\ell \cdot p $$$ are valid options.
Iterate over $$$i=1$$$ to $$$\ell$$$ and add $$$\lfloor\frac{\ell}{i}\rfloor$$$.

Since we iterated over all multiples of primes, we iterated over all differences possible (we excluded 1, so the bound will have an extra $$$n$$$ term). We also know that $$$ \sum_{p}{\ell_p} = n $$$. It remains to bound the sum of $$$\lfloor\frac{\ell}{i}\rfloor$$$:

$$$ \sum_{i=1}^{\ell}{\lfloor\frac{\ell}{i}\rfloor} \le \sum_{i=1}^{\ell}{\ell\cdot\frac{1}{i}} \le \ell \cdot \log{\ell} $$$

Summing over all primes yield $$$\sum_{\ell_p}{\ell_p \cdot \log{\ell_p} \le n \cdot \log{n} } $$$.

Edit: As w0nsh said, a single number can contribute to multiple $$$\ell_p$$$'s. We know that $$$x$$$ can have at most $$$\log{x}$$$ different prime divisors. Suppose $$$a_i\le M $$$, then $$$ \sum_{p}{\ell_p} \le n \cdot \log{M} $$$, and the bound will be $$$ n \cdot \log{M} \cdot \log{(n \cdot \log{M})} $$$. I assumed that for a specific prime number, all differences are possible. Perhaps proving an $$$O(n \cdot \log{n}) $$$ bound requires analyzing this more carefully.

→ Reply

w0nsh

3 years ago, # ^ |

+51

I don't think $$$\sum_{p}{\ell_p} = n$$$ is true. One number can contribute to multiple $$$\ell_p$$$ 's.

→ Reply

dacin21

3 years ago, # ^ |

+14

I don't see why $$$p, 2p, 3p, \dots, \ell \cdot p$$$ should be the only valid differences for a given prime. Are you implicitly assuming that $$$1, 2, \dots, n$$$ (or $$$p, 2p, \dots, n\cdot p$$$) is the sequence with highest cost? (I think $$$\lfloor \frac{\ell}{i}\rfloor$$$ should be $$$\lfloor \frac{M}{i} \rfloor$$$, but then considering primes doesn't help.)

Here's a simple $$$O(M \log n)$$$ bound: WLOG suppose the first element is $$$0$$$. For every non-zero element $$$x$$$, there is a unique maximal arithmetic progression starting with $$$0, x, \dots$$$. Its length is at most $$$\frac{M}{x} + 1$$$. Summing over all $$$x$$$ gives

$$$ \sum_{0 < x \in A} \Big(\frac{M}{i}+1\Big) \leq n-1 + \sum_{i=1}^{n-1} \frac{M}{i} = O(n + M \log n) = O(M \log n) $$$

→ Reply

ComboGirl

3 years ago, # |

+10

Should be noted that consecutive numbers isn't the worst one, but it probably is asymptotically. For example, $$$[0,1,2,3,4,5,6,7]$$$ has lower cost then $$$[0,1,2,3,4,5,6,8].$$$

→ Reply

jacynkaa

3 years ago, # |

← Rev. 2 →

+146

I found a proof for slightly worse bound: $$$O(n \log^2 n)$$$.

Let $$$p_i,p_{i+1}$$$ be consecutive primes. Let $$$E_{p_i}$$$ be a set of pairs $$$(a_l,a_j)$$$ such that there exists maximal arithmetic progression, such that $$$a_l$$$ is the $$$p_i$$$-th element, and $$$a_j$$$ is the $$$p_{i+1}$$$-th element. WLOG we can assume that $$$a_0=0$$$, therefore $$$\frac{a_l}{a_j}=\frac{p_i}{p_{i+1}}$$$

Notice that from the Prime number theorem it follows that the cost in original problem is $$$O(\sum_p |E_p| \log n)$$$.

Lets consider initially empty graph $$$G$$$ with $$$V=[a_0,a_1,\ldots a_{n-1}]$$$. We will be adding edges from $$$E_2, E_3, \ldots$$$ to the graph, in that order.

[1] every vertex appears in $$$E_p$$$ at most twice;

[2] If $$$a_l, a_k$$$ are in the same connected component of $$$(V, E_2 \cup \ldots \cup E_{p_{i}})$$$ then $$$\frac{a_l}{a_k}=\frac{A}{B}$$$ for some coprime numbers $$$A,B$$$ with prime factors $$$\leq p_{i+1}$$$. Therefore for every $$$j>i: \ (a_k,a_l) \not\in E_{p_j}$$$.

From [1] it follows that if we are adding edges from $$$E_q$$$ and connecting two components $$$S_1,S_2$$$, then $$$|E_q \cap S_1 \times S_2| \leq 2\cdot min(|S_1|,|S_2|)$$$. Combining smaller-to-bigger argument and [2], we obtain, that the total number of edges in graph $$$(V, E_2 \cup E_3 \cup \ldots)$$$ is $$$O(n \log n)$$$, so the bound for the problem is $$$O(n \log^2 n)$$$.

→ Reply

Fly_37

3 years ago, # ^ |

+63

This seems to be a huge improvement! Wow :o

→ Reply

lin7xu

3 years ago, # ^ |

+30

amazing proof

→ Reply

barbarian_clash_of_clans

3 years ago, # |

-38

Hey, if the array is increasing, then it's very easy to prove following:

Cost can be calculated in O(Nlog²N) in worst scenario. (Add: maybe one log factor can be reduced, depends on restrictions on A[i])

Let's assume our array is

a0, a1, a2, a3, ... , aN.

We construct another array d

d0, d1, ... , dN

Such that d[i] = a[i]-a[0]

Now let's see. If array is increasing, we can iterate for each d[i] (ignoring d[0]) and do

int k=1; while (map[d*k]) { //while d*k exists in array k++ } In the end, k will be the length of such subarray with d==d[i]

//Map adds another log factor, maybe can be reduced

1)Note, that k<=n 2)Because array is strictly increasing, d1<d2<d3<...<dN 3)Means total complexity will be in worst scenario O(n/d1 + n/d2 + ... + n/dN)

Now let's prove that this is <=O(nlogn)

First of all, note that n/d1 + n/d2 + n/d3 + ... + n/dN <= n/1 + n/2 + n/3 + ... + n/N (you can verify it by n/d1 <= n/1, n/d2 <= n/2, etc.)

So worst case complexity is O(n/1 + n/2 + n/3 + ... + n/n). For me it's already well known lemma that this is <=NlogN, but if somehow you didn't know this:

n/1+n/2+...+n/n = n( 1+1/2+1/3+...+1/n)

1/2+1/3 < 1/2+1/2 = 1 1/4+1/5+1/6+1/7 < 1/4+1/4+1/4+1/4 = 1 ... 1/2^k + 1/(2^k+1) + ... + 1/(2^(k+1)-1) < 2^k * 1/2^k = 1

So 1+1/2+1/3+...+1/n < 1+1+1+...+1+1 <= log n

So O(n*(1+1/2+1/3+...+1/n)) < O(nlogn)

→ Reply

-is-this-fft-

3 years ago, # ^ |

+14

what

→ Reply

Noam527

3 years ago, # |

+55

Here's a construction that does $$$\omega(n \log n)$$$ for infinitely many $$$n$$$, disproving the lemma:

Choose some $$$k$$$ and consider the set of elements $$$i\cdot j$$$ for $$$1 \leq i,j \leq k$$$ — sorted, after removing duplicates and with a $$$0$$$ in the beginning.

This sequence has cost $$$\Omega(k^2 \log k)$$$: any starting element $$$ij$$$ has a sequence of length at least $$$\frac{k}{i}$$$ which is enough to show this bound.

However, after removing duplicates we have $$$o(k^2)$$$ elements in our sequence — as found here: https://mathoverflow.net/questions/108912/number-of-elements-in-the-set-1-cdots-n-cdot-1-cdots-n/108939#108939

Choosing $$$k = \sqrt{n}$$$ gives the contradiction. We still don't know the exact bound though.

→ Reply

Endagorion

3 years ago, # ^ |

+81

Won't removing the duplicates affect the total cost estimate as well, since $$$ij$$$ are not counted multiple times anymore?

→ Reply

Noam527

3 years ago, # ^ |

← Rev. 2 →

+20

Right, it might make it asymptotically less than $$$k^2 \log k$$$, my bad. Not sure if this invalidates the construction but I feel like it does.

→ Reply