A general approach to solve subree distinct values queries!

#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

Hello Codeforces ,today I want to introduce a technique that can solve some problems related to distinct values queries on a subtree .The only way I know to solve array distinct values queries with updates is using MO's algorithm .The approach I will introduce can solve the subtree version of this problem online with updates in $O(Nlog^2(N))$.So let's get started.↵
↵
**Prerequisites:**↵
↵
1.Heavy light decomposition (it's enough to know how it works and how to implement it).↵
↵
2.Lowest common ancestor.↵
↵
3.Euler tours (not necessary for this specific problem but I will use it to prove some fact).↵
↵
**Problem Statement:**↵
↵
You are given a tree rooted at vertex 1. You have to perform two types of queries:↵
↵
1 $i val$ : update the value of node $i$ to $val$.↵
↵
2 $u$ : output the sum of distinct values in the subtree of node $u$.↵
↵
**Solution:**↵
↵
Whenever you encounter such a strange problem try to make a shift in perspective .What is the first thing you thought when you have seen this problem? Euler tour? Persistent Segment trees? Isn't it similar to the array distinct values queries?↵
But wait ,I just said it's hard to solve this problem. Didn't I?↵
↵
Can we change the problem into path queries problem?↵
↵
Wait what are the nodes that are affected by an update?↵
↵
I turns out that the affected nodes are all in a range on the ancestry path of the updated node .Don't it seem like a standard HLD problem?↵
↵
Not yet .Here comes the idea .Consider our new problem:↵
↵
We want to update the answers in a range on the ancestry path of a node .So ,where does this range end?↵
If we can figure out how to do this then the problem  becomes a standard path update and queries problem that can be easily solved using HLD .Now, here is the fun part.↵
↵
Isn't it obvious the endpoint is the first node on the ancestry path that doesn't have an occurrence of $val$?↵
↵
How can we find such a node? well this is the same as finding the first node on the path that have an occurrence then going down by one edge.↵
↵
It turns out that this node is the first node that have the closest occurrence of $val$ in its subtree.↵
But what does it mean for a node to be be the closest?↵
↵
Well ,let's consider this node to be the $lca$ of the updated node and the closet node since the $lca$ is the first node on the ancestry path from the updated node to the root .Now the problem boils down to finding the maximum depth $lca$ among all possible $lca$ s ,but how can we do this fast?↵
↵
Well, if we maintain a map of sets that stores for each number the $time in$ s in DFS order for the nodes with a value equal to that number then the node we are searching for is the node with maximum depth among $lca($updated node ,the first node with time in bigger than  the $time in$ of the updated node$)$ and↵
$lca($updated node ,the first node with time in smaller than the $time in$ of the updated node$)$.But why is this true in general?↵
↵
**Proof of the above fact:**↵
↵
Let's consider the case of (the first node with time in bigger than the time in of the updated node(let's call this node $u$)) then the second case will go through an identical reasoning.↵
↵
Consider taking a node with a bigger time in than $u$ (let's call it $v$).↵
↵
Let $x$ be the first node on the ancestry path then:↵
↵
If a node $d$ is the closest node to $i$ then $d$ must be in the subtree of $x$.↵
Because $in[i]<in[u]<in[v]$ if $v$ was in the subtree of $x$ and $i$ was in the subtree of $x$ then also $u$ is in the subtree of $x$ .Thus taking $u$ is at least as optimal as taking node v.↵
↵
**Note** that there is a corner case when there is at least one occurrence of $val$ in the subtree of $i$ we don't update anything.↵
↵
**Also** note that when we update a node we should remove the old value and add the new value but this is can be done in the same way the adding process is done thus I leave it as an exercise to the reader to check this.↵
  ↵
**A recap for what we did:**↵
↵
You could reflect on how we transformed a subtree queries problem into a path queries problem which turns out to be much easier if we think that way. Then we used a fact that facilitates using the bruteforce approach by finding optimal values to try .↵
↵
**Implementation:**↵
↵
There are many implementation details that I skipped above but if you are curious here is my code:↵
[submission:258971709]↵
↵
I couldn't find any link for a problem that uses this technique so I made one .↵
↵
link: [problem:521454A]↵
↵
You can also optimize heavy light decomposition see https://mirror.codeforces.com/blog/entry/104997.↵
↵
Thank you for reading my blog.↵
↵
I spent a lot of time preparing this so I hope you loved it.↵
↵
**P.S. This is my first educational blog .Sorry for long text :) .**

Rev.	By	When	Δ	Comment
en14	Haidora	2024-05-01 21:49:55	1	Tiny change: ' resources.' -> ' resources .'
en13	Haidora	2024-05-01 16:42:24	2	Tiny change: '\n\nNote that th' -> '\n\nNote : that th'
en12	Haidora	2024-05-01 15:06:42	4
en11	Haidora	2024-05-01 14:26:30	355
en10	Haidora	2024-05-01 12:13:53	6346	Tiny change: '\n...\n```\n#include' -> '\n...\n```c++\n#include'
en9	Haidora	2024-05-01 11:34:08	98
en8	Haidora	2024-05-01 11:27:03	4	Tiny change: 'tes in $O(Nlog^2(N))$' -> 'tes in $O((N+Q)log^2(N))$'
en7	Haidora	2024-05-01 11:21:56	48
en6	Haidora	2024-05-01 11:06:09	85
en5	Haidora	2024-05-01 10:44:27	2
en4	Haidora	2024-05-01 10:43:39	1	Tiny change: 'ed node$)$.But why i' -> 'ed node$)$ .But why i'
en3	Haidora	2024-05-01 10:39:32	40
en2	Haidora	2024-05-01 10:36:24	790	Tiny change: 'bles $lca$s ,but how' -> 'bles $lca$ s ,but how' (published)
en1	Haidora	2024-05-01 10:08:59	4269	Initial revision (saved to drafts)

History