Ordered Set Intersection

I'm a simple guy. When I find out I can reduce a problem to something known (or something I suspect must be known) that I've never learned, I just look on the internet to see if my approach may be feasible. When I was solving 2155F - Juan's Colorful Tree, which I'll refer to in this blog as JCT, and found out it could be reduced to finding the size of the intersection of ordered sets, I did just that. What I found was

Binary Search

Given two ordered sets $$$A$$$ and $$$B$$$, which I'm assuming to be of type vector<int>, this one works in time $$$O(|A| log |B|)$$$ where $$$|A|\leq |B|$$$. The code is something like

int res=0;
for(int ac:A)res+=binary_search(B.begin(),B.end(),ac);
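The fragment above can be wrapped into a self-contained function. This is only a sketch: the name `intersectionSizeBinarySearch` is my own, and it assumes both vectors are sorted in increasing order with no repeated values. It also swaps the roles of the two vectors when needed, so that the shorter one is always the one being iterated.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts the values present in both sorted, duplicate-free vectors.
// Iterates over the shorter vector and binary searches the longer one,
// so the cost is O(min(|A|,|B|) * log(max(|A|,|B|))) comparisons.
std::size_t intersectionSizeBinarySearch(const std::vector<int>& a,
                                         const std::vector<int>& b) {
    const std::vector<int>& shorter = a.size() <= b.size() ? a : b;
    const std::vector<int>& longer = a.size() <= b.size() ? b : a;
    std::size_t res = 0;
    for (int x : shorter) {
        // std::binary_search only reports whether x occurs in the range.
        if (std::binary_search(longer.begin(), longer.end(), x)) ++res;
    }
    return res;
}
```

For example, with A = {1, 3, 5, 7} and B = {2, 3, 4, 5, 6, 7, 8} the function returns 3, because 3, 5 and 7 are shared.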

When I sent it to JCT, it got AC in $$$\approx 1.8$$$ seconds, with a total complexity of something like $$$O(n\sqrt{n} log n)$$$. However, it now seems to be patched out (it passed in $$$\approx 1.8$$$ seconds during testing; now it gets TLE). But I decided I could do better...

Weird? Binary Search

I noticed that always doing binary search on the whole range $$$ [0,|B|) $$$ is wasteful, as on every iteration I'm binary searching from 0. The maximum index $$$j$$$ that satisfies $$$ A[i]\geq B[j] $$$ is non-decreasing in $$$i$$$. This means that if the last index I found was $$$j$$$, then the new index $$$j'$$$ will lie in $$$ [j,|B|) $$$. This should reduce the running time a bit, so I implemented something like

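int ind=0,jmp=1,res=0; // assumes B is non-empty; ind is the last position found in B
for(int ac:A){
	// gallop forward from ind, doubling the jump while B[ind+jmp]<=ac
	while( jmp+ind<int(B.size()) && B[jmp+ind]<=ac ) {
		ind+=jmp;
		jmp*=2;
	}
	// binary search back down; jmp ends at 1, ready for the next element
	while( jmp>1 ) {
		jmp/=2;
		if( jmp+ind<int(B.size()) && B[jmp+ind]<=ac )ind+=jmp;
	}
	res+=B[ind]==ac;
}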
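A self-contained version of this search could look like the sketch below. The function name `intersectionSizeGalloping` is my own, the empty-`b` guard is an addition, and both vectors are assumed to be sorted increasingly without repeated values. The jump is reset to 1 for each element, which matches the fragment above because its second loop always leaves the jump at 1.

```cpp
#include <cstddef>
#include <vector>

// Counts the values present in both sorted, duplicate-free vectors.
// For each x in a, the search resumes from the last index reached in b.
std::size_t intersectionSizeGalloping(const std::vector<int>& a,
                                      const std::vector<int>& b) {
    if (b.empty()) return 0;  // b[ind] below needs at least one element
    std::size_t ind = 0, res = 0;
    for (int x : a) {
        std::size_t jmp = 1;
        // Doubling phase: advance while the probed element is still <= x.
        while (ind + jmp < b.size() && b[ind + jmp] <= x) {
            ind += jmp;
            jmp *= 2;
        }
        // Halving phase: refine inside the last window that overshot.
        while (jmp > 1) {
            jmp /= 2;
            if (ind + jmp < b.size() && b[ind + jmp] <= x) ind += jmp;
        }
        // ind is now the largest index with b[ind] <= x, or 0 if none exists.
        if (b[ind] == x) ++res;
    }
    return res;
}
```

With A = {1, 3, 5, 7, 100} and B = {2, 3, ..., 10} the function returns 3: the index into B only ever moves forward, and the last element 100 simply gallops to the end of B.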

I assumed the complexity of this fragment was also $$$O(|A| log |B|)$$$. Then I sent it to JCT. It got AC in $$$\approx 0.8$$$ seconds, and I assumed the total complexity of my submission was something like $$$O(n\sqrt{n} log n)$$$. Then I went to sleep.

Real Complexity

My best solution actually went through in more or less the same time as the model solution, so I was asked (by jampm) whether I was sure the real complexity on JCT was $$$O(n\sqrt{n} log n)$$$. After thinking for a while, I realized there were some cases where the second algorithm runs in time $$$O(|A|)$$$, for example when $$$|A|=|B|$$$, as it then behaves like the two-pointer idea. This more or less countered the worst case that I thought gave the $$$O(n\sqrt{n} log n)$$$ complexity, reducing it to $$$O(n\sqrt{n})$$$ (at least in the construction I had in mind).
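For reference, the two-pointer idea mentioned here can be sketched as follows. The name `intersectionSizeTwoPointers` is my own, and the inputs are again assumed to be sorted increasingly without repeated values. It always takes $$$O(|A|+|B|)$$$ steps, which is $$$O(|A|)$$$ when $$$|A|=|B|$$$.

```cpp
#include <cstddef>
#include <vector>

// Classic merge-style walk over two sorted, duplicate-free vectors.
// Every step advances at least one pointer, so it takes O(|A| + |B|) steps.
std::size_t intersectionSizeTwoPointers(const std::vector<int>& a,
                                        const std::vector<int>& b) {
    std::size_t i = 0, j = 0, res = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;  // a[i] is too small to match anything left in b
        } else if (b[j] < a[i]) {
            ++j;  // b[j] is too small to match anything left in a
        } else {
            ++res;  // common value found, advance both
            ++i;
            ++j;
        }
    }
    return res;
}
```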

I kept thinking about the real complexity of the second algorithm. I conjectured it was something like $$$O(|A|(1+log|B|-log|A|))=O(|A|(1+log(|B|/|A|)))$$$, where the +1 is just there to keep it 'well-conditioned' when $$$|A|=|B|$$$. It turns out that's true.
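One way to get a feel for this conjecture is to count how many elements of $$$B$$$ the search actually reads. The sketch below is an instrumented copy of the galloping search; `countGallopProbes` and `probeBound` are hypothetical helper names of mine. The estimate $$$|A|(2+2log_2(1+|B|/|A|))$$$ used in `probeBound` is my own loose bound with the same shape as the conjectured expression, and its constants are not taken from the post.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Runs the galloping search over sorted, duplicate-free a and b and returns
// how many times an element of b is read.
std::size_t countGallopProbes(const std::vector<int>& a,
                              const std::vector<int>& b) {
    if (b.empty()) return 0;
    std::size_t ind = 0, probes = 0;
    for (int x : a) {
        std::size_t jmp = 1;
        // Doubling phase: one read per iteration, stop at the first overshoot.
        while (ind + jmp < b.size()) {
            ++probes;
            if (b[ind + jmp] > x) break;
            ind += jmp;
            jmp *= 2;
        }
        // Halving phase: at most one read per halving.
        while (jmp > 1) {
            jmp /= 2;
            if (ind + jmp < b.size()) {
                ++probes;
                if (b[ind + jmp] <= x) ind += jmp;
            }
        }
        ++probes;  // the final equality check reads b[ind]
    }
    return probes;
}

// Loose estimate |A| * (2 + 2*log2(1 + |B|/|A|)); constants are an assumption.
double probeBound(std::size_t sizeA, std::size_t sizeB) {
    if (sizeA == 0) return 0.0;
    const double n = static_cast<double>(sizeA);
    const double m = static_cast<double>(sizeB);
    return n * (2.0 + 2.0 * std::log2(1.0 + m / n));
}
```

Comparing `countGallopProbes(a, b)` against `probeBound(a.size(), b.size())` for inputs of different size ratios shows how the per-element cost depends on $$$|B|/|A|$$$ rather than on $$$|B|$$$ alone.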

proof

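I'll treat the worst-case complexity as an optimization problem. Here $$$y$$$ is how far the index into $$$B$$$ advances while processing one element of $$$A$$$, which costs about $$$log(y)$$$. The total cost will be modeled by $$$h(|A|,|B|)$$$, where $$$h$$$ is defined as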

$$$h(1,m)=max_{0\leq y\leq m}\{log(y)\}$$$

$$$h(n,m)=max_{0\leq y\leq m}\{log(y)+h(n-1,m-y)\}$$$

We will assume that $$$n,m \gt 0$$$.

I claim that the $$$y$$$ that optimizes $$$h$$$ is $$$y=m/n$$$. This will, in turn, mean that $$$h(n,m)=n\cdot log(m/n)$$$.

Lemma 1

For $$$f(n,m)$$$ defined as $$$f(1,m)=log(m)$$$ and $$$f(n,m)=log(m/n)+f(n-1,m- m/n)$$$, we have $$$f(n,m)=n\cdot log(m/n)$$$.

proof by induction on n

When $$$n=1$$$ then it follows trivially. Assume it holds for a fixed $$$n$$$, then we want to show that $$$f(n+1,m)=(n+1)\cdot log(m/(n+1))$$$. Observe that

$$$f(n+1,m)=log(m/(n+1))+f(n,m- m/(n+1))$$$

$$$f(n+1,m)=log(m/(n+1))+n\cdot log\left(\frac{m-m/(n+1)}{n}\right)$$$

$$$f(n+1,m)=log(m/(n+1))+n\cdot log\left( \frac{m(n+1)-m}{n(n+1)} \right)$$$

$$$f(n+1,m)=log(m/(n+1))+n\cdot log(m/(n+1))$$$

$$$f(n+1,m)=(n+1)\cdot log(m/(n+1))$$$

$$$\blacksquare$$$
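Lemma 1 is easy to sanity-check numerically. The sketch below evaluates the recurrence for $$$f$$$ directly with real-valued $$$m$$$ and compares it with the closed form; `fRecurrence` and `fClosedForm` are hypothetical names of mine, and natural logarithms are used.

```cpp
#include <cmath>

// f(1,m) = log(m), f(n,m) = log(m/n) + f(n-1, m - m/n), for n >= 1 and m > 0.
double fRecurrence(int n, double m) {
    if (n == 1) return std::log(m);
    return std::log(m / n) + fRecurrence(n - 1, m - m / n);
}

// Closed form claimed by Lemma 1: n * log(m/n).
double fClosedForm(int n, double m) {
    return n * std::log(m / n);
}
```

For instance, `fRecurrence(5, 100.0)` and `fClosedForm(5, 100.0)` should agree up to floating-point rounding, since every recursive call sees the same ratio $$$m/n=20$$$.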

Corollary 1.1

If $$$y=m/n$$$ is optimal in $$$h$$$ then $$$h(n,m)=f(n,m)=n\cdot log(m/n)$$$.

Lemma 2

For $$$h(n,m)$$$ as defined above, we have $$$h(n,m)=log(y)+h(n-1,m-y)$$$ for $$$y=m/n$$$.

proof by strong induction on n

When $$$n=1$$$ (base case), recall

$$$h(1,m)= max_{0\leq y\leq m}\{log(y)\}$$$

This will be maximized at $$$y=m=m/n$$$ as $$$log(\cdot)$$$ is a strictly increasing function.

Let's assume that $$$y=m/n$$$ is optimal for every first argument in $$$1..n$$$. Now I want to show that the analogous choice is also optimal for $$$h(n+1,m)$$$. Recall that by Corollary 1.1 we have that $$$h(n,m)=n\cdot log(m/n)$$$.

Let $$$g(y,n,m)=log(y)+h(n,m-y)$$$, then

$$$g=log(y)+n\cdot log\left(\frac{m-y}{n}\right)$$$

As the base of $$$log$$$ is not important in asymptotics, we'll assume it's $$$e$$$ for convenience

$$$\frac{\partial}{\partial y}g=\frac{1}{y}+n\frac{1}{(m-y)/n}\cdot\frac{-1}{n}$$$

$$$\frac{\partial}{\partial y}g=\frac{1}{y}-\frac{n}{m-y}$$$

Now we'll explore the sign of $$$\frac{\partial}{\partial y}g$$$

$$$0 \gt \frac{1}{y}-\frac{n}{m-y}$$$

$$$\frac{n}{m-y} \gt \frac{1}{y}$$$

$$$n\cdot y \gt m-y$$$

$$$y \gt \frac{m}{n+1}$$$

This means that the sign of $$$\frac{\partial}{\partial y}g$$$ is negative when $$$y \gt \frac{m}{n+1}$$$ and positive when $$$y \lt \frac{m}{n+1}$$$ (by a symmetric argument). So $$$g$$$ achieves a local maximum (for fixed $$$n$$$, $$$m$$$) at $$$y=\frac{m}{n+1}$$$. As all the functions used in $$$h$$$ ($$$/$$$, $$$log(\cdot)$$$, $$$+$$$) are continuous for $$$n,m \gt 0$$$ and there aren't any other local optima for $$$0 \lt y \lt m$$$, we conclude that $$$y=\frac{m}{n+1}$$$ is optimal for $$$h(n+1,m)$$$, which is the claim $$$y=m/n$$$ with $$$n+1$$$ in place of $$$n$$$. $$$\blacksquare$$$
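The location of the maximum can also be checked numerically. The sketch below scans $$$g(y)=log(y)+n\cdot log((m-y)/n)$$$ over an evenly spaced grid in $$$(0,m)$$$ and returns the best grid point; `gValue` and `argmaxOnGrid` are hypothetical names of mine, natural logarithms are used, and `steps` is assumed to be at least 2.

```cpp
#include <cmath>

// g(y) = log(y) + n * log((m - y) / n), defined for 0 < y < m.
double gValue(double y, int n, double m) {
    return std::log(y) + n * std::log((m - y) / n);
}

// Scans y = m*k/steps for k = 1..steps-1 and returns the grid point with the
// largest g. The endpoints are skipped because g is not finite there.
double argmaxOnGrid(int n, double m, int steps) {
    double bestY = m / steps;
    double bestG = gValue(bestY, n, m);
    for (int k = 2; k < steps; ++k) {
        const double y = m * k / steps;
        const double g = gValue(y, n, m);
        if (g > bestG) {
            bestG = g;
            bestY = y;
        }
    }
    return bestY;
}
```

With $$$n=4$$$ and $$$m=100$$$, the grid maximum should land next to $$$m/(n+1)=20$$$, in line with the sign analysis of the derivative.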

Corollary 2.1

$$$h(n,m)=n\cdot log(m/n)$$$

Concluding

This idea is not actually new. After looking further on the internet, it appears this problem has been studied in research (this same idea and many others have been discovered); however, I still find it interesting enough to write about in this format because I found the second complexity a bit unexpected, the algorithm is fairly simple, and this version seems not so easy to find.

My submission to JCT is 342818172. I didn't actually find an upper bound on the complexity of my submission; I assume it's $$$O(n\sqrt{n})$$$ and found a case that forces it to reach $$$O(n\sqrt{n})$$$, but it's not proven.

Testing was a fun experience. I found solving JCT interesting. I now know a bit more about the submission I cooked.

osvarp out

Comments (1)


osvarp

3 months ago

The other day I found another problem where derivatives of this idea seem to work. It's [JOIST 2024] Escape Route 2, which I found on luogu: https://www.luogu.com.cn/problem/P10439

I searched on the internet and all descriptions of solutions followed the same framework of building 2 trees, one going from left to right and the other going from right to left; then each query can be solved in $$$O(min(M_l,M_r)T)$$$ for $$$T$$$ being the query time of the structure used.

I built only 1 tree (the one going from left to right) and used a binary lifting LCA. I grouped all queries starting at a certain $$$l$$$ and sorted them by increasing $$$r$$$. I kept track of all distinct paths from the $$$l$$$-th level to the $$$r$$$-th level, identified by the final node on the path. I merged all paths that ended on the same node and used an idea similar to the one in the blog to speed up the binary lifting as I went up the tree. It got AC. Now I don't know if it's because my idea is so weird that there weren't cases that pushed it to TL, or if the amortization gods decided it actually has good complexity.
