
humbertoyusta's blog

By humbertoyusta, 4 years ago, In English

In this blog I will be talking about a problem that I consider instructive and interesting, because it has several solutions from which some tricks can be learned.

Prerequisites:

  • Basic tree theory
  • Binary search
  • Euler tour
  • Lowest common ancestor
  • Persistent segment tree

Note that you don't need all of them for each approach.

Tricks or techniques that you can learn from the discussion of this problem:

  • A square root decomposition of the search space of the queries

  • How to know in $$$O(1)$$$ if a node is an ancestor of some other node in a tree

  • Parallel Binary Search

  • A persistent segment tree can be built over a tree

Story:

I created a version of this problem a while ago (it is left as a bonus challenge at the end of this blog), and I thought of putting it in a contest that I am planning (maybe not in the near future). But then dmga44 (thanks to him for discussing this problem with me and providing feedback) noticed that it has a painful and, for a contest, not very interesting data structure solution, which is the easiest one to come up with if you are good at data structures. Still, the problem has some quite interesting approaches, so I wanted to share it with the community.

The base problem:

You have a tree where each edge has a weight; the weights are all distinct and form a permutation of size $$$n - 1$$$. You also have $$$Q$$$ queries of the form $$$(u, v, k)$$$: considering the path from node $$$u$$$ to node $$$v$$$, if we add the weights of its edges to a vector and sort it in increasing order, report the weight at the $$$k$$$-th position of the sorted vector. $$$1 \leq N, Q \leq 2 \cdot 10^5$$$.

You may want to think about the problem a bit before you keep reading.

First insights:

Let the spoilers begin:

We can do exactly what the statement says and get an $$$O(n^2\log{n})$$$ solution, but this approach can't be improved much, so we need to try something different.
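For concreteness, the brute force might look like this (a minimal sketch with my own naming; par, dep and pw would come from one rooting dfs):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Brute force: climb from the deeper endpoint until u and v meet,
// collecting edge weights; then sort and take the k-th. O(n log n) per query.
int kthBrute(int u, int v, int k,
             const vector<int>& par,   // par[x] = parent of x (root's parent = x itself)
             const vector<int>& dep,   // dep[x] = depth of x
             const vector<int>& pw) {  // pw[x] = weight of the edge (par[x], x)
    vector<int> ws;
    while (u != v) {
        if (dep[u] < dep[v]) swap(u, v);   // always move the deeper node up
        ws.push_back(pw[u]);
        u = par[u];
    }
    sort(ws.begin(), ws.end());
    return ws[k - 1];                      // k is 1-indexed
}
```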

We can binary search over the possible results: to ask whether the answer is smaller than or equal to some $$$x$$$, we check whether the number of edges on the path with weight smaller than or equal to $$$x$$$ is at least $$$k$$$. We can consider all edges with weight smaller than or equal to $$$x$$$ as $$$1$$$, and the greater ones as $$$0$$$; then we just need to check if the sum over the path is at least $$$k$$$. This can be done with a dfs storing the sum from each node to the root, plus the lowest common ancestor. This solution is also $$$O(n^2\log{n})$$$, but it can be improved into some faster solutions.
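For a fixed threshold $$$x$$$, the check could look like this (a sketch; adj and lcaOf are my own names, with lcaOf assumed to be available, e.g. via binary lifting):

```cpp
#include <bits/stdc++.h>
using namespace std;

vector<vector<pair<int,int>>> adj;   // adj[v] = {(neighbor, edge weight), ...}
vector<int> cnt;                     // cnt[v] = #edges with weight <= x on the root->v path

void dfsCount(int v, int p, int x) {
    for (auto [to, w] : adj[v]) if (to != p) {
        cnt[to] = cnt[v] + (w <= x);
        dfsCount(to, v, x);
    }
}

int lcaOf(int u, int v);             // assumed available

// answer(u, v, k) <= x  <=>  at least k path edges have weight <= x
bool answerAtMost(int u, int v, int k, int x) {
    cnt.assign(adj.size(), 0);
    dfsCount(0, -1, x);              // recompute the sums for this x: O(n)
    int onPath = cnt[u] + cnt[v] - 2 * cnt[lcaOf(u, v)];  // values sit on edges,
    return onPath >= k;              // so no extra lca term is needed
}
```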

Square root approach:

Think about having the sums precomputed for some threshold $$$p$$$: when a query arrives, we can then easily check whether its answer is greater or smaller than $$$p$$$. So we select some integer $$$r$$$ and precompute the sums from each node to the root where all edges with weight smaller than or equal to $$$r$$$ have value $$$1$$$, and the greater ones have value $$$0$$$. Then we do the same for $$$2\cdot r$$$, $$$3\cdot r$$$, ..., $$$\frac{n}{r} \cdot r$$$ and keep all the sums. After this, when we answer a query we can reduce its search space to an interval of length $$$r$$$, by finding the largest multiple of $$$r$$$ such that fewer than $$$k$$$ path edges have weight smaller than or equal to it.
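The precomputation might look like this (a sketch; cnt[j][v] is my name for the sum with threshold $$$j \cdot r$$$):

```cpp
#include <bits/stdc++.h>
using namespace std;

int n, r;
vector<vector<pair<int,int>>> adj;   // adj[v] = {(neighbor, edge weight), ...}
vector<vector<int>> cnt;             // cnt[j][v] = #edges with weight <= j*r on root->v path

void dfsThreshold(int v, int p, int j) {
    for (auto [to, w] : adj[v]) if (to != p) {
        cnt[j][to] = cnt[j][v] + (w <= j * r);
        dfsThreshold(to, v, j);
    }
}

void precompute() {
    r = max(1, (int)sqrt((double)n));
    cnt.assign(n / r + 1, vector<int>(n, 0));                 // row 0 stays all zeros
    for (int j = 1; j * r <= n; ++j) dfsThreshold(0, -1, j);  // O(n^2 / r) = O(n sqrt n)
}
```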

When the search space is an interval of length $$$r$$$, we can loop over the edges whose weights lie inside it in increasing order of weight (there are just $$$r$$$ such edges, because the weights of the edges form a permutation), add $$$1$$$ for each edge inside the path, and stop once we reach $$$k$$$. To ask in $$$O(1)$$$ whether a node $$$a$$$ is an ancestor of a node $$$b$$$, we can precompute the euler tour of the tree, doing a dfs and storing the first time we see each node $$$a$$$ ($$$l_{a}$$$) and the last time we see it ($$$r_{a}$$$). Then a node $$$a$$$ is an ancestor of $$$b$$$ if and only if $$$l_{a} < l_{b}$$$ and $$$r_{a} > r_{b}$$$. To ask whether a node $$$x$$$ is on the path from $$$u$$$ to $$$v$$$: $$$x$$$ is on the path if $$$x$$$ is an ancestor of $$$u$$$ or $$$v$$$ and is not an ancestor of $$$lca(u, v)$$$ (with the strict definition above), or $$$x$$$ is equal to $$$u$$$ or $$$v$$$.
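In code, the $$$O(1)$$$ checks can look like this (a sketch; tin/tout are my names for $$$l_{a}$$$ and $$$r_{a}$$$, and this isAncestor is non-strict, i.e. a node counts as its own ancestor, which slightly simplifies the on-path condition):

```cpp
#include <bits/stdc++.h>
using namespace std;

vector<vector<pair<int,int>>> adj;
vector<int> tin, tout;                       // resize both to n before calling eulerDfs
int timer_ = 0;

void eulerDfs(int v, int p) {
    tin[v] = timer_++;                       // first time we see v
    for (auto [to, w] : adj[v]) if (to != p) eulerDfs(to, v);
    tout[v] = timer_++;                      // last time we see v
}

bool isAncestor(int a, int b) {              // is a an ancestor of b (or b itself)?
    return tin[a] <= tin[b] && tout[a] >= tout[b];
}

bool onPath(int x, int u, int v, int w) {    // w = lca(u, v), assumed precomputed
    return (isAncestor(x, u) || isAncestor(x, v)) && isAncestor(w, x);
}
```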

With $$$q = O(n)$$$ and $$$r = \sqrt{n}$$$, both the precomputation ($$$O(n^2/r)$$$) and the queries ($$$O(r)$$$ each) cost $$$O(n\sqrt{n})$$$ in total, giving an online solution that is not too hard to code.
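Putting it together, a query might look like this (a sketch that plugs into the previous two sketches; child_ is my name for the mapping from an edge's weight to its lower endpoint):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Pieces assumed from the previous sketches:
extern int n, r;
extern vector<vector<int>> cnt;      // cnt[j][v], thresholds j*r
extern vector<int> child_;           // child_[w] = lower endpoint of the edge of weight w
bool isAncestor(int a, int b);
int lcaOf(int u, int v);

int query(int u, int v, int k) {
    int w = lcaOf(u, v);
    auto atMost = [&](int j) {       // #path edges with weight <= j*r
        return cnt[j][u] + cnt[j][v] - 2 * cnt[j][w];
    };
    int j = 0;
    while ((j + 1) * r <= n && atMost(j + 1) < k) ++j;  // answer lies in (j*r, (j+1)*r]
    int seen = atMost(j);
    for (int wt = j * r + 1; ; ++wt) {                  // at most r candidate edges
        int c = child_[wt];
        bool onPath = (isAncestor(c, u) || isAncestor(c, v))
                      && isAncestor(w, c) && c != w;    // edge (par[c], c) is on the path
        if (onPath && ++seen == k) return wt;
    }
}
```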

Parallel binary search (with divide and conquer) approach:

Doing an individual binary search for each query is too slow, so we need to come up with some other idea. In this case we can think about answering several queries at once to reuse work: we already know that to ask whether a path has at least $$$k$$$ elements smaller than or equal to $$$p$$$, we can assign value $$$1$$$ to each edge with weight smaller than or equal to $$$p$$$ and $$$0$$$ otherwise.

Initially, the search space of each query is the segment $$$[1, n]$$$; after the first check it becomes $$$[1,n/2]$$$ or $$$[n/2+1,n]$$$, and the same holds for all queries. So we can have a function $$$solve(l, r, Q)$$$ where $$$l$$$ is the lower limit of the search space, $$$r$$$ is the upper limit, and $$$Q$$$ is a vector containing all queries that currently have this search range. We select $$$mid = \frac{l + r}{2}$$$ and ask, for each query in $$$Q$$$, whether its result is at most $$$mid$$$ (we will see how to do this fast later); if it is, we add the query to a vector $$$Q_l$$$, otherwise to $$$Q_r$$$. After processing all these queries, we call $$$solve(l, mid, Q_l)$$$ and $$$solve(mid + 1, r, Q_r)$$$, because now we know that all queries in $$$Q_l$$$ have search space $$$[l, mid]$$$ and all queries in $$$Q_r$$$ have search space $$$[mid+1, r]$$$. Thus we have halved each query's search space.

Calling this function as $$$solve(1, n, Q)$$$ where $$$Q$$$ contains all queries solves our problem offline (we need to know all the queries before calling the function). The base case is $$$l = r$$$: there, the answer of every query in the vector is obviously $$$l$$$. Otherwise we reduce the search space as in the previous paragraph.
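A skeleton of the recursion (a sketch; Query and answer_ are my own names, and setEdge/pathSum belong to the data structure described a couple of paragraphs below):

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Query { int u, v, k, id; };

vector<int> answer_;                 // answer_[id] = final result of query id

// provided by the data structure described below
void setEdge(int w, int val);        // give the edge of weight w value val (0 or 1)
int pathSum(int u, int v);           // sum of edge values on the u-v path

void solve(int l, int r, vector<Query>& Q) {
    if (Q.empty()) return;
    if (l == r) { for (auto& q : Q) answer_[q.id] = l; return; }
    int mid = (l + r) / 2;
    for (int w = l; w <= mid; ++w) setEdge(w, 1);       // activate weights in [l, mid]
    vector<Query> Ql, Qr;
    for (auto& q : Q) {
        int c = pathSum(q.u, q.v);                      // path edges with weight in [l, mid]
        if (c >= q.k) Ql.push_back(q);
        else { q.k -= c; Qr.push_back(q); }             // those edges are now accounted for
    }
    for (int w = l; w <= mid; ++w) setEdge(w, 0);       // clean up before recursing
    solve(l, mid, Ql);
    solve(mid + 1, r, Qr);
}
```

Each weight is activated and deactivated once per level of the recursion, so there are $$$O(n\log{n})$$$ updates in total.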

A tutorial about parallel binary search (the trick we discussed above) can be found here.

Now, to answer the checks we need a data structure that allows changing an edge's value from $$$0$$$ to $$$1$$$ and vice versa, and asking for the sum of the edges' values on a path. The path sum can be decomposed into sums from nodes to the root: add the sum of the path from $$$u$$$ to the root, add the sum of the path from $$$v$$$ to the root, and subtract the sum of the path from $$$lca(u,v)$$$ to the root twice (since the values live on edges, no extra term for $$$lca(u, v)$$$ is needed). This can be done using the tree's euler tour: assigning each edge to its lower endpoint, changing an edge's value becomes adding or subtracting $$$1$$$ on a contiguous subarray of the euler tour (the lower endpoint's subtree), and asking for a root-to-node sum becomes a query for a single element. This can be done with a fenwick tree or a segment tree, achieving a total complexity of $$$O(n\log^2{n})$$$, since each query can be processed at most $$$O(\log{n})$$$ times (because we halve its search space each time), and each check takes $$$O(\log{n})$$$.
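One possible fenwick tree realization of setEdge/pathSum (a sketch; tin/tout and child_ come from the earlier euler tour sketch, lcaOf is assumed, and bit should be sized to the euler tour, e.g. bit.assign(2 * n + 2, 0)):

```cpp
#include <bits/stdc++.h>
using namespace std;

extern vector<int> tin, tout, child_;    // euler tour times and edge -> lower endpoint
int lcaOf(int u, int v);                 // assumed available

vector<long long> bit;                   // fenwick tree over euler tour positions

void bitAdd(int i, int d) { for (++i; i < (int)bit.size(); i += i & -i) bit[i] += d; }
long long bitPrefix(int i) { long long s = 0; for (++i; i > 0; i -= i & -i) s += bit[i]; return s; }

// range add on [l, r] + point query: the classic difference-array trick
void rangeAdd(int l, int r, int d) { bitAdd(l, d); bitAdd(r + 1, -d); }
long long pointQuery(int p) { return bitPrefix(p); }

void setEdge(int w, int val) {           // calls alternate 1/0 as in solve()
    int c = child_[w];
    rangeAdd(tin[c], tout[c], val ? +1 : -1);   // all nodes in c's subtree are affected
}

int pathSum(int u, int v) {              // root sums combine exactly as in the text
    auto rootSum = [&](int x) { return (int)pointQuery(tin[x]); };
    return rootSum(u) + rootSum(v) - 2 * rootSum(lcaOf(u, v));
}
```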

PST approach

We are going to build a persistent segment tree over the alphabet; that is, the $$$i$$$-th leaf of the segment tree will store the number of times the value $$$i$$$ appears in the range (or path) that this version of the segment tree considers, and a node representing the range $$$[l, r]$$$ will store the number of occurrences of the values from $$$l$$$ to $$$r$$$. Doing this, we can do an implicit binary search and answer each query in $$$O(\log{n})$$$.

An explanation of how to find the k-th element in a range of an array with a PST can be found here, in the section "Finding the k-th smallest number in a range".

The first version of the segment tree considers only the root; then the version of each node is created from its father's version by inserting the weight of the edge between them. This way, a node's version considers the path from the root to that node. To answer a query we do an implicit binary search, adding the counts from the versions of $$$u$$$ and $$$v$$$ and subtracting twice the counts from the version of $$$lca(u, v)$$$. This way we can solve the problem in $$$O(n\log{n})$$$, and online.
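A condensed sketch of the whole construction (all names are mine; lcaOf is assumed, e.g. binary lifting, and root_ should be sized to n with root_[root] = 0 before build):

```cpp
#include <bits/stdc++.h>
using namespace std;

int n;                                    // weights form a permutation of [1, n-1]
struct Node { int lc, rc, cnt; };
vector<Node> t = {{0, 0, 0}};             // index 0 doubles as the empty tree

int insertW(int prev, int l, int r, int w) {   // new version = prev + one copy of w
    int cur = t.size();
    t.push_back(t[prev]);
    t[cur].cnt++;
    if (l < r) {
        int mid = (l + r) / 2;
        if (w <= mid) { int x = insertW(t[prev].lc, l, mid, w);     t[cur].lc = x; }
        else          { int x = insertW(t[prev].rc, mid + 1, r, w); t[cur].rc = x; }
    }
    return cur;
}

vector<vector<pair<int,int>>> adj;
vector<int> root_;                        // root_[v] = version for the root->v path

void build(int v, int p) {
    for (auto [to, w] : adj[v]) if (to != p) {
        root_[to] = insertW(root_[v], 1, n - 1, w);   // father's version + edge weight
        build(to, v);
    }
}

int lcaOf(int u, int v);                  // assumed available

int kth(int u, int v, int k) {            // implicit binary search down the tree
    int a = root_[u], b = root_[v], c = root_[lcaOf(u, v)];
    int l = 1, r = n - 1;
    while (l < r) {
        int mid = (l + r) / 2;
        int leftCnt = t[t[a].lc].cnt + t[t[b].lc].cnt - 2 * t[t[c].lc].cnt;
        if (k <= leftCnt) { a = t[a].lc; b = t[b].lc; c = t[c].lc; r = mid; }
        else { k -= leftCnt; a = t[a].rc; b = t[b].rc; c = t[c].rc; l = mid + 1; }
    }
    return l;
}
```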

Bonus problem:

You are given a tree such that each edge can be traversed using one of two weights, $$$a_i$$$ and $$$b_i$$$; the weight of a path is the maximum weight of an edge on it. You also get $$$Q$$$ queries of the form $$$(u, v, p)$$$: you need to find the minimum possible weight of a path from $$$u$$$ to $$$v$$$ if you must traverse exactly $$$p$$$ of its edges with their weight $$$a$$$ and the other edges of the path with their weight $$$b$$$. It is guaranteed that $$$p$$$ is smaller than or equal to the number of edges of the path.

Can you solve it in $$$O(n\log^2{n})$$$? And in $$$O(n\log{n})$$$?

Hint

Solution in $$$O(n\log^2{n})$$$

Hint to $$$O(n\log{n})$$$

Solution in $$$O(n\log{n})$$$


»
3 years ago, # |

You can also use parallel binary search to solve the problem in the same way as with the persistent segment tree in $$$O(n \log n)$$$. It might seem like updates would take $$$O(\log n)$$$, but since in each iteration you only care about one layer of the segment tree, you can update that one layer in $$$O(1)$$$.

  • »
    »
    3 years ago, # ^ |

    Could you please elaborate a bit more on your approach? I can't understand it and it sounds interesting.

    What do you mean by "use parallel binary search to solve the problem in the same way as with persistent segment tree"?

    • »
      »
      »
      3 years ago, # ^ |

      In the persistent segment tree solution, for each query you first ask about some node at distance $$$1$$$ from the root of the segment tree at time $$$t$$$, then about some node at distance $$$2$$$ from the root of the segment tree at time $$$t$$$ and so on until you get to some node at distance $$$\left\lceil \log_2(n) \right\rceil$$$. You do the same thing in this solution.

      You first find for each query the value of that query's first asked node at some time by first updating the segment tree with the value of the root, then answering all queries about time $$$1$$$, then updating it with the value of one of its children, then answering all queries about time $$$2$$$ etc. Since you only care about nodes in the segment tree at distance $$$1$$$ from the root, instead of updating $$$O(\log n)$$$ nodes of the segment tree you only update the one which is at distance $$$1$$$ from the root.

      Repeat the process with the same logic for all the other distances and you have an $$$O(n \log n)$$$ solution. Code.
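      If I understand the trick correctly, the $$$O(1)$$$ per-layer update boils down to indexing like this (my own sketch, not the linked code):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Counts kept for ONE layer of a segment tree over [0, 2^LOG).
// At depth d the layer has 2^d cells; a value x falls into exactly one of them,
// so an update touches a single cell: O(1) instead of O(log n).
const int LOG = 18;
int depthD;                              // current binary search depth
vector<long long> layer;                 // sized 1 << depthD for the current pass

int cellOf(int x) { return x >> (LOG - depthD); }      // which depth-d node covers x

void addValue(int x, int delta) { layer[cellOf(x)] += delta; }
long long cellCount(int cell) { return layer[cell]; }  // count a query reads at its node
```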

»
3 years ago, # |

If you are interested in more (it's not the best or anything but may be interesting or educative for some) — Mo's algorithm on trees comes to mind.

I assume you are familiar with Mo's on trees, so I'll cut to the problem-specific parts. We want to maintain the set of edge weights in the "active segment". Specifically, we need to implement a set that supports the following operations:

  • insert a number in $$$O(1)$$$;
  • delete a number in $$$O(1)$$$;
  • get the $$$k$$$-th smallest number in the set in $$$O(\sqrt{n})$$$.

To do that, maintain an array $$$A$$$ split into blocks of length $$$\sqrt{n}$$$; let $$$A[i] = 1$$$ if $$$i$$$ is present in the set and $$$A[i] = 0$$$ otherwise. We also maintain the sum of each block. Finding the $$$k$$$-th smallest number can be done by iterating over all the blocks.
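A sketch of such a structure (my own naming; here the values are the edge weights):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Blocked "sorted set" over values [0, n): O(1) insert/erase, O(sqrt n) k-th.
struct SqrtSet {
    int n, B;
    vector<int> a;          // a[i] = 1 if i is in the set
    vector<int> blockSum;   // sum of each block of length B

    SqrtSet(int n) : n(n), B(max(1, (int)sqrt((double)n))),
                     a(n, 0), blockSum(n / B + 1, 0) {}

    void insert(int x) { if (!a[x]) { a[x] = 1; blockSum[x / B]++; } }
    void erase(int x)  { if (a[x])  { a[x] = 0; blockSum[x / B]--; } }

    int kth(int k) {                                  // k-th smallest, 1-indexed
        int b = 0;
        while (k > blockSum[b]) k -= blockSum[b++];   // skip whole blocks
        for (int i = b * B; ; ++i)                    // scan inside the block
            if (a[i] && --k == 0) return i;
    }
};
```

insert and erase touch one array cell and one block counter, while kth scans at most $$$O(\sqrt{n})$$$ blocks plus one block's contents.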

When we run Mo's algorithm, we need to do $$$O(n \sqrt{n})$$$ insertions and deletions, but only find the $$$k$$$-th smallest element $$$O(n)$$$ times. This gives us a total complexity of $$$O(n \sqrt{n})$$$.

This algorithm is interesting/noteworthy to me because if I just used a normal set (to support each operation in $$$O(\log n)$$$), Mo's algorithm would have $$$O(n \sqrt{n} \log n)$$$ complexity which would surely TLE. But if you make insertions and deletions cheaper at the expense of the third operation, then it works out nicely.