Hello Codeforces ,today I want to introduce a technique that can solve some problems related to distinct values queries on a subtree .The only way I know to solve array distinct values queries with updates is using MO's algorithm .The approach I will introduce can solve the subtree version of this problem online with updates in $O(Nlog^2(N))$.So let's get started.↵
↵
**Prerequisites:**↵
↵
1.Heavy light decomposition (it's enough to know how it works and how to implement it).↵
↵
2.Lowest common ancestor.↵
↵
3.Euler tours (not necessary for this specific problem but I will use it to prove some fact).↵
↵
**Problem Statement:**↵
↵
You are given a tree rooted at vertex 1. You have to perform two types of queries:↵
↵
1 $i val$ : update the value of node $i$ to $val$.↵
↵
2 $u$ : output the sum of distinct values in the subtree of node $u$.↵
↵
**Solution:**↵
↵
Whenever you encounter such a strange problem try to make a shift in perspective .What is the first thing you thought when you have seen this problem? Euler tour? Persistent Segment trees? Isn't it similar to the array distinct values queries?↵
But wait ,I just said it's hard to solve this problem. Didn't I?↵
↵
Can we change the problem into path queries problem?↵
↵
Wait what are the nodes that are affected by an update?↵
↵
I turns out that the affected nodes are all in a range on the ancestry path of the updated node .Don't it seem like a standard HLD problem?↵
↵
Not yet .Here comes the idea .Consider our new problem:↵
↵
We want to update the answers in a range on the ancestry path of a node .So ,where does this range end?↵
If we can figure out how to do this then the problem becomes a standard path update and queries problem that can be easily solved using HLD .Now, here is the fun part.↵
↵
Isn't it obvious the endpoint is the first node on the ancestry path that doesn't have an occurrence of $val$?↵
↵
How can we find such a node? well this is the same as finding the first node on the path that have an occurrence then going down by one edge.↵
↵
It turns out that this node is the first node that have the closest occurrence of $val$ in its subtree.↵
But what does it mean for a node to be be the closest?↵
↵
Well ,let's consider this node to be the $lca$ of the updated node and the closet node since the $lca$ is the first node on the ancestry path from the updated node to the root .Now the problem boils down to finding the maximum depth $lca$ among all possible $lca$ s ,but how can we do this fast?↵
↵
Well, if we maintain a map of sets that stores for each number the $time in$ s in DFS order for the nodes with a value equal to that number then the node we are searching for is the node with maximum depth among $lca($updated node ,the first node with time in bigger than the $time in$ of the updated node $)$ and↵
$lca($updated node ,the first node with time in smaller than the $time in$ of the updated node $)$ .But why is this true in general?↵
↵
**Proof of the above fact:**↵
↵
Let's consider the case of (the first node with time in bigger than the time in of the updated node(let's call this node $u$)) then the second case will go through an identical reasoning.↵
↵
Consider taking a node with a bigger time in than $u$ (let's call it $v$).↵
↵
Let $x$ be the first node on the ancestry path then:↵
↵
If a node $d$ is the closest node to $i$ then $d$ must be in the subtree of $x$.↵
Because $in[i]<in[u]<in[v]$ if $v$ was in the subtree of $x$ and $i$ was in the subtree of $x$ then also $u$ is in the subtree of $x$ .Thus taking $u$ is at least as optimal as taking node v.↵
↵
**Note** that there is a corner case when there is at least one occurrence of $val$ in the subtree of $i$ we don't update anything.↵
↵
**Also** note that when we update a node we should remove the old value and add the new value but this is can be done in the same way the adding process is done thus I leave it as an exercise to the reader to check this.↵
↵
**A recap for what we did:**↵
↵
You could reflect on how we transformed a subtree queries problem into a path queries problem which turns out to be much easier if we think that way. Then we used a fact that facilitates using the bruteforce approach by finding optimal values to try .↵
↵
**Implementation:**↵
↵
There are many implementation details that I skipped above but if you are curious here is my code:↵
[submission:258971709]↵
↵
I couldn't find any link for a problem that uses this technique so I made one .↵
↵
link: [problem:521454A]↵
↵
https://mirror.codeforces.com/contestInvitation/02c14ae6e9e06b8f0c6a3ce7eb957a1f3ba2f865↵
↵
You can also optimize heavy light decomposition see https://mirror.codeforces.com/blog/entry/104997.↵
↵
Thank you for reading my blog.↵
↵
I spent a lot of time preparing this so I hope you loved it.↵
↵
**P.S. This is my first educational blog .Sorry for long text :) .**
↵
**Prerequisites:**↵
↵
1.Heavy light decomposition (it's enough to know how it works and how to implement it).↵
↵
2.Lowest common ancestor.↵
↵
3.Euler tours (not necessary for this specific problem but I will use it to prove some fact).↵
↵
**Problem Statement:**↵
↵
You are given a tree rooted at vertex 1. You have to perform two types of queries:↵
↵
1 $i val$ : update the value of node $i$ to $val$.↵
↵
2 $u$ : output the sum of distinct values in the subtree of node $u$.↵
↵
**Solution:**↵
↵
Whenever you encounter such a strange problem try to make a shift in perspective .What is the first thing you thought when you have seen this problem? Euler tour? Persistent Segment trees? Isn't it similar to the array distinct values queries?↵
But wait ,I just said it's hard to solve this problem. Didn't I?↵
↵
Can we change the problem into path queries problem?↵
↵
Wait what are the nodes that are affected by an update?↵
↵
I turns out that the affected nodes are all in a range on the ancestry path of the updated node .Don't it seem like a standard HLD problem?↵
↵
Not yet .Here comes the idea .Consider our new problem:↵
↵
We want to update the answers in a range on the ancestry path of a node .So ,where does this range end?↵
If we can figure out how to do this then the problem becomes a standard path update and queries problem that can be easily solved using HLD .Now, here is the fun part.↵
↵
Isn't it obvious the endpoint is the first node on the ancestry path that doesn't have an occurrence of $val$?↵
↵
How can we find such a node? well this is the same as finding the first node on the path that have an occurrence then going down by one edge.↵
↵
It turns out that this node is the first node that have the closest occurrence of $val$ in its subtree.↵
But what does it mean for a node to be be the closest?↵
↵
Well ,let's consider this node to be the $lca$ of the updated node and the closet node since the $lca$ is the first node on the ancestry path from the updated node to the root .Now the problem boils down to finding the maximum depth $lca$ among all possible $lca$ s ,but how can we do this fast?↵
↵
Well, if we maintain a map of sets that stores for each number the $time in$ s in DFS order for the nodes with a value equal to that number then the node we are searching for is the node with maximum depth among $lca($updated node ,the first node with time in bigger than the $time in$ of the updated node $)$ and↵
$lca($updated node ,the first node with time in smaller than the $time in$ of the updated node $)$ .But why is this true in general?↵
↵
**Proof of the above fact:**↵
↵
Let's consider the case of (the first node with time in bigger than the time in of the updated node(let's call this node $u$)) then the second case will go through an identical reasoning.↵
↵
Consider taking a node with a bigger time in than $u$ (let's call it $v$).↵
↵
Let $x$ be the first node on the ancestry path then:↵
↵
If a node $d$ is the closest node to $i$ then $d$ must be in the subtree of $x$.↵
Because $in[i]<in[u]<in[v]$ if $v$ was in the subtree of $x$ and $i$ was in the subtree of $x$ then also $u$ is in the subtree of $x$ .Thus taking $u$ is at least as optimal as taking node v.↵
↵
**Note** that there is a corner case when there is at least one occurrence of $val$ in the subtree of $i$ we don't update anything.↵
↵
**Also** note that when we update a node we should remove the old value and add the new value but this is can be done in the same way the adding process is done thus I leave it as an exercise to the reader to check this.↵
↵
**A recap for what we did:**↵
↵
You could reflect on how we transformed a subtree queries problem into a path queries problem which turns out to be much easier if we think that way. Then we used a fact that facilitates using the bruteforce approach by finding optimal values to try .↵
↵
**Implementation:**↵
↵
There are many implementation details that I skipped above but if you are curious here is my code:↵
[submission:258971709]↵
↵
I couldn't find any link for a problem that uses this technique so I made one .↵
↵
link: [problem:521454A]↵
↵
https://mirror.codeforces.com/contestInvitation/02c14ae6e9e06b8f0c6a3ce7eb957a1f3ba2f865↵
↵
You can also optimize heavy light decomposition see https://mirror.codeforces.com/blog/entry/104997.↵
↵
Thank you for reading my blog.↵
↵
I spent a lot of time preparing this so I hope you loved it.↵
↵
**P.S. This is my first educational blog .Sorry for long text :) .**