By shadow9236, history, 2 years ago,

# Introduction

Hello Codeforces,
I came across this problem a while back — https://cses.fi/problemset/task/1139/. Here the task is — "Given a colored, rooted tree, determine the number of distinct colors in each subtree of the tree". It is possible to solve this problem by flattening the tree and then using a binary indexed tree(now that subtrees have converted into subarrays). I will discuss an alternate technique to solve this, which can help in cases where it is useful to have the entire subtree as a set in a DFS call.

# $O(n^2)$ solution

If $n$ were small, we could have stored for each node, the set of elements in its subtree. A DFS can be used to evaluate this, which will take $O(n^2log n)$ time and $O(n^2)$ space.

vector<int> color(n);
vector<set<int>> subtree(n);

void dfs(int i,int parent){
subtree[i].insert(color[i]);
for(int node : graph[i]){
if(node != parent){
dfs(node,i);
subtree[i].insert(subtree[node].begin(),subtree[node].end());
}
}
}


The number of distinct colors in the subtree of the $i^{\text{th}}$ node is just the size of the set corresponding to it.

# An Improvement

The problem with this technique is repetition. Each element occurs in $depth[i]$ different sets which is inefficient. For an improvement, we can consider the same idea used in heavy-light decomposition and union-find optimizations, which is "small-to-large merging".

We will use a DFS as before. To construct the set of all elements in the node's subtree, we will pick the child with the largest subtree set. Instead of making a new set for the current node, we will use the same set as the child with the largest subtree and insert all other children's sets in it. To avoid consuming extra space and time, we will use pointers to accomplish this task.

vector<int> color,distinct;
vector<set<int>*> subtree;

void dfs(int i,int parent = -1){
int largest = -1;
vector<int> children;
for(int node : graph[i]){
if(node != parent){
dfs(node,i);
children.push_back(node);
if(largest == -1 || subtree[largest]->size() < subtree[node]->size()){
largest = node;
}
}
}
if(largest == -1){
subtree[i] = new set<int>; // new set for leaf node
}
else{
subtree[i] = subtree[largest]; // largest sized child
}

for(int child : children){
if(child == largest)continue;
subtree[i]->insert(subtree[child]->begin(),subtree[child]->end());
}
subtree[i]->insert(color[i]);
distinct[i] = subtree[i]->size();
}


# Time complexity

Every time we copy an element over, the set it is now in will be at least 2 times larger than the set it was previously in. Hence the time complexity is $O(n log^2 n)$ (or $O(n log n)$ if we avoid using sets).

Thanks to -is-this-fft- for this

# Conclusion

Here is my solution for the CSES task — https://cses.fi/paste/e75f5efc2e1cd2fc3c8a66/

This technique can be used in various other tree problems. Examples of such problems could be to find the most frequent element in each node's subtree or to find the pair with minimum difference in each node's subtree.

• +78

| Write comment?
 » 2 years ago, # | ← Rev. 3 →   +3 Can you share your Fenwick Tree approach?BTW, the swap function makes it extremely easy to implement small-to-large. Submission
•  » » 2 years ago, # ^ | ← Rev. 3 →   +3 Consider the problem to be querying for all subtrees. Flatten the tree. Now subtree queries can be treated as subarray queries. Sort all queries about their ending point. Now iterate from left to right in the array. In the binary indexed tree, set $i$ to 1 if $i$ is the current rightmost occurence of $a[i]$ and 0 otherwise. The answer for a query ending at $i$ is just the rangesum of it in the binary indexed tree.This works because we are supposed to count exactly one occurence of each element in a range query. We have counted the rightmost occurence of each element and also sorted the queries about their right points.Submission using fenwick tree — https://cses.fi/paste/e5bcaa36945e6b663c8bb1/Your submission is not visible, you will have to go to "share this code to others" option
•  » » » 2 years ago, # ^ |   0 Fixed.
 » 2 years ago, # |   +13 I guess that this is similar to small to large merging on tree . Arpa has a blog about this where he refers to it as Sack on tree.
•  » » 2 years ago, # ^ |   +36 This is precisely small-to-large merging.Personally, I really dislike the Arpa blog because it kinda doesn't explain anything and instead just consists of like 15 different code samples that obscure the idea more than anything.
•  » » » 2 years ago, # ^ |   0 Agreed. But I remember that someone wrote a blog that explained the technique. It must be preset in blog comments.
 » 2 years ago, # | ← Rev. 2 →   +29 First of all, a nice additional problem on this topic: The Cost of Speed Limits from ICPC 2020.As for the blog, the code could be simplified greatly. Firstly, you don't need to compare children by the sizes of the answer, the sizes of their subtrees also works because of the exact same reasons. So, we can precompute largest[i] as the children with the largest subtree. Additionally, you don't need to use raw pointers and similar stuff, std::move works like a charm. And, since you don't store sets explicitly anyway, there is no need in std::move as well, instead, we can rely on RVO. So, the code would look like thisset dfs(int i, int parent = -1) { auto ans = largest[i] == -1 ? set{} : dfs(largest[i], i); for (auto it : graph[i]) if (it != parent && it != largest[i]) { auto tmp = dfs(it, i); ans.insert(tmp.begin(), tmp.end()); } ans.insert(color[i]); distinct[i] = ans.size(): return ans; } UPD. I submitted the code to the system, it works.
 » 2 years ago, # |   +17 Great short Blog. I created a video on the same technique sometime back : https://www.youtube.com/watch?v=92unAh2APJ0Adding your blog to the description :D
 » 14 months ago, # | ← Rev. 3 →   0 "Every time we copy an element over, the set it is now in will be at least 2 times larger than the set it was previously in"This is not true right? Say set_1 = {1,2} and set_2 = {1,2,3}, size of the merged set is < 4. I think merging based on size of the subtree is better
 » 14 months ago, # |   +9 600-E Lomsat gelral also uses this approach.Submission.
 » 12 months ago, # |   0 Can someone explain why this implementation not causing MLE? since space complexity can go O(n^2)
•  » » 6 months ago, # ^ |   0 The space complexity is O(nlogn). It is almost the same as time complexity because we only use memory when we are merging. So check out the above proof for the merging time complexity.
 » 6 months ago, # |   0 Another good problem that requires this technique. Here
•  » » 5 months ago, # ^ |   0 I landed on this blog while solving the same problem :D
•  » » » 5 months ago, # ^ |   +1 same here.lol
 » 5 months ago, # |   0 For upcoming readers, if you cannot comprehend how the time complexity is O(NlogN^2) then this video might help
•  » » 6 weeks ago, # ^ | ← Rev. 2 →   0 For this specific problem, size do not have to double because sets could have common elements. Correct? Edit: I think the size of smaller child is assumed to be the size of elements in small child that are not in the bigger child.
•  » » » 6 weeks ago, # ^ | ← Rev. 2 →   0 Size does not always get doubled, there could be a less increment in size. Double is just the upper bound on size increase, indicating that the increase in size will never exceed than 100%. If you want an example where the size gets doubled, then you can consider a perfect binary tree where all $n$ nodes have $n$ distinct colors.