K-th smallest edge on a tree's path, a tutorial about it's approaches

#	User	Rating
1	Benq	3792
2	VivaciousAubergine	3647
3	Kevin114514	3611
4	jiangly	3583
5	strapple	3515
6	tourist	3470
7	Radewoosh	3415
8	Um_nik	3376
9	maroonrk	3361
10	XVIII	3345

#	User	Contrib.
1	Qingyu	162
2	adamant	148
3	Um_nik	146
4	Dominater069	143
5	errorgorn	141
6	cry	138
7	Proof_by_QED	136
8	YuukiS	135
9	chromate00	134
10	soullless	133

In this blog I will be talking about a problem that I consider instructive and interesting, because it has several solutions from which some tricks can be learned.

Prerrequisites:

Tree basic theory
Binary Search
Euler Tour
Lowest common ancestor
Persistent Segment Tree.

Note that you don't need all of them for each approach.

Tricks or techniques that you can learn with the discussion of this problem:

Some square root application in descomposition of search space of queries
How to know in $$$O(1)$$$ if a node is an ancestor of some other node in a tree
Parallel Binary Search
The persistent segment tree can be builded over a tree

Story:

I created a version of this problem a while ago (which will be left as bonus challenge at the end of this blog), and I thought to put it in a contest that I am planning (Maybe not in the near future). But then dmga44 (thanks to him for discussing this problem with me, and provide feedback) noticed that it has a data structure painful and not very interesting(for a contest) solution that is the easier to come up with if you are good in data structures, but it has some quite interesting approaches, so I wanted to share it with the community.

The base problem:

You have a tree, and each edge has a weight, the weights of edges are all distinct, and form a permutation of size $$$n - 1$$$. You also have $$$Q$$$ queries of the form $$$(u, v, k)$$$, in the path from node $$$u$$$ to node $$$v$$$, if we add the edges of this path to a vector and sort it in increasing order, which one will be in the k-th position in the sorted vector. $$$1 \leq N, Q \leq 2*10^5$$$.

Maybe you want to think the problem a bit before keep reading.

First insights:

Let the spoilers begin:

We can do exactly what the statement says and get a $$$O(n^2\log{n})$$$ solution, but this solution can't be improved too much, so we need to try a different approach.

We can binary search over the possible results, and for asking if the answer is smaller or equal to some $$$x$$$ we can check if the number of edges in the path that are smaller than $$$x$$$ is greater or equal than $$$k$$$. We can consider all edges smaller or equal than $$$x$$$ as $$$1$$$, and greater as $$$0$$$, then we just need to check if the sum of the path is at most $$$k$$$. This can be done with dfs storing the sum from each node to the root and lowest common ancestor, this solution is also $$$O(n^2\log{n})$$$ but it can be improved to some faster solutions.

Square root approach:

Think about having the sums precomputed for some $$$p$$$, after this when we ask a query, we can easily check if the answer is greater or smaller than $$$p$$$, so we can select some integer $$$r$$$ and precompute the sums from each node to the root if all edges smaller or equal than $$$r$$$ have value $$$1$$$, and the greater ones have value $$$0$$$. Then we will do it for $$$2\cdot r$$$, $$$3\cdot r$$$, ... , $$$\frac{n}{r} \cdot r$$$ and keep the sums. After this when we answer a query we can reduce the space of search of the solution to $$$\frac{n}{r}$$$, finding the largest multiple of $$$r$$$ such that our answer is grater or equal than it.

When the space of search is $$$\frac{n}{r}$$$ we can loop over the edges that have values inside our space of search in increasing order of their values (there are just $$$\frac{n}{r}$$$ of this edges because the weights of the edges form a permutation), then add $$$1$$$ for each edge inside the path, and when we reach $$$k$$$ we are done, to ask in $$$O(1)$$$ if a node $$$a$$$ is ancestor of a node $$$b$$$, we can precompute the euler tour of the tree, doing a dfs and storing the first time that we see each node $$$a$$$ ($$$l_{a}$$$), and the last time that we see each node $$$a$$$ ($$$r_{a}$$$). Then we can say that a node $$$a$$$ is ancestor of $$$b$$$ if and only if $$$l_{a} \lt l_{b}$$$ and $$$r_{a} \gt r_{b}$$$. For asking if a node $$$x$$$ is on a path from $$$u$$$ to $$$v$$$ we can say that $$$x$$$ is on the path from $$$u$$$ to $$$v$$$ if $$$x$$$ is an ancestor of $$$u$$$ or $$$v$$$ and is not an ancestor of $$$lca(u, v)$$$, or $$$x$$$ is equal to $$$u$$$ or $$$v$$$.

With $$$q = O(n)$$$ and choosing $$$r = \sqrt{n}$$$ we can have a $$$O(n\sqrt{n})$$$ online solution which is not too hard to code.

Parallel binary search (with divide and conquer) approach:

Doing an individual binary search for each query is too slow, so we need to come up with some other idea, in this case we can think about answering some queries at once for reusing some data, we already know that to ask if a path has at least $$$k$$$ elements smaller or equal than $$$p$$$ we can assign a value of $$$1$$$ to each edge with weight smaller or equal to $$$p$$$ and 0 otherwise.

Initially, the space of search of each query is the segment $$$[1, n]$$$, after the first query it can become $$$[1,n/2]$$$ or $$$[n/2+1,n]$$$, and this will be for all querys, so we can have a function $$$solve(l, r, Q)$$$ where $$$l$$$ is the lower limit of the space of search, $$$r$$$ is the upper limit and $$$Q$$$ is a vector that contains all queries that currently have this range of search. We can select $$$mid = \frac{l + r}{2}$$$ and ask for each query in $$$Q$$$ if its result is greater or smaller than $$$mid$$$ fast, we will see how to do this later, if its result is smaller or equal, we will add it to other vector $$$Q_l$$$, otherwise we will add it to $$$Q_r$$$. After processing all this queries, we will call $$$solve(l, mid, Q_l)$$$ and $$$solve(mid + 1, r , Q_r)$$$ because now we know that all queries in $$$Q_l$$$ has a space of search of $$$[l, mid]$$$ and all queries in $$$Q_r$$$ have a space of search of $$$[mid+1, r]$$$. Thus we have reduced it's space of search by a factor of $$$2$$$.

Calling this function $$$solve(1, n, Q)$$$ where $$$Q$$$ contains all queries will solve our problem offline, since we need to know all the queries before calling our function, the basis case of this function is when $$$l = r$$$, in this case obviously the answer of all queries will be $$$l$$$. Otherwise we can reduce the space of search as wee see in the previous paragraph.

A tutorial about parallel binary seach(the trick we discussed above) can be found here

Now to answer the queries we need to implement some data structure that allows to change an edge's value from 0 to 1, and viceversa, and ask for the sum of edges' values on a path, the path can decomposed be in 3 paths from some nodes to the root(sum the path from $$$u$$$ to the root, sum the path from $$$v$$$ to the root, substract the path from $$$lca(u,v)$$$ to the root twice, and then adding the value of $$$lca(u, v)$$$), this can be done using the tree's euler tour, changing an edge's value can be seen as adding or substracting 1 to a contiguous subarray of the euler tour, and ask for the value of a node, is a query to some element, this can be done with a fenwick tree or a segment tree, achieving a total complexity of $$$O(n\log^2{n})$$$, since each query can be asked at most $$$O(\log{n})$$$ times(because we reduce it's space of search by a factor of $$$2$$$ each time), and we need $$$O(\log{n})$$$ to answer it each time.

PST approach

We are going to make a Persistent segment tree over the alphabet, that is, the i-th leaf of the segment tree will store the number of times that the number i is on the range, or path that this version of the segment tree consider, a node that represents the range $$$[l, r]$$$ will store the number ocurrence of the numbers from $$$l$$$ to $$$r$$$, doing this, we can do an implicit binary search and answer the queries in $$$O(\log{n})$$$.

An explanation about solving the k-th element in a range of an array with PST can be found here at the section Finding the k -th smallest number in a range.

The first version of the segment tree will consider only the root, then each version will be created adding the the node to the father's version, this way a version will consider the path from the root to the node, then to answer the queries we can do an implicit binary search, adding the values from the versions of $$$u$$$ and $$$v$$$ and substracting the values from the version of $$$lca(u, v)$$$. This way we can solve the problem in $$$O(n\log{n})$$$.

Bonus problem:

You are given a tree, such that each edge can be traversed using two weights, $$$a_i$$$ and $$$b_i$$$, the weight of a path is the maximum weight of an edge on it, and $$$Q$$$ queries in the form $$$(u, v, p)$$$ you'll need to find the minimum possible weight of a path from $$$u$$$ to $$$v$$$ if you can traverse exactly $$$p$$$ edges with it's weight $$$a$$$ and the other edges of the path with it's weight $$$b$$$, it's guaranteed that $$$p$$$ is smaller or equal to the size of the path.

Can you solve it in $$$O(n\log^2{n})$$$?, and in $$$O(n\log{n})$$$?

Rev.	By	When	Δ	Comment
en5	humbertoyusta	2021-04-12 19:12:12	1
en4	humbertoyusta	2021-04-11 23:59:15	2495
en3	humbertoyusta	2021-04-11 20:20:02	4	Tiny change: 'e number ocurrence of the nu' -> 'e number of ocurrences of the nu'
en2	humbertoyusta	2021-04-11 20:15:42	106	(published)
en1	humbertoyusta	2021-04-11 20:02:54	8585	Initial revision (saved to drafts)