[Tutorial] Persistent DSU made trivial

Introduction:

This blog covers how a DSU can keep all of its previous versions across a sequence of union operations, a topic for which there seem to be few resources. Thanks to MinaRagy06 for helping me write this blog!

A basic DSU can be implemented in the following way:

DSU Code

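int parent[MAX_N], sz[MAX_N];

// No path compression: parent pointers must stay unchanged so that older versions remain valid.
int get_root(int node){
	if(node == parent[node])
		return node;
	return get_root(parent[node]);
}
// Union by size keeps the height of every tree at O(log N).
void union_sets(int u, int v){
	u = get_root(u);
	v = get_root(v);
	if(u == v)
		return;
	if(sz[u] > sz[v])
		swap(u, v);
	sz[v] += sz[u];
	parent[u] = v;
}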

Next, I will explain how to store more information to answer queries that require time travelling.

Main Idea:

Problem 1 (Easy) :

Let's start by solving a simple CSES Problem.

The problem can be reduced to binary searching for the first time at which nodes $$$a$$$ and $$$b$$$ are connected. Now let's learn how to check connectivity at any point in time using a persistent DSU.

We maintain an extra array $$$\text{time_changed}$$$ that stores, for each node, the time (the index of the edge, adding the edges in order) at which it stopped being a root and was attached to a parent. Because there is no path compression, a node's parent never changes after that. Now we can run $$$\text{get_root}$$$ with an extra parameter $$$\text{time}$$$ and stop climbing as soon as the current node was still a root at that time, which gives the root of the node at that time. During the binary search we check whether nodes $$$a$$$ and $$$b$$$ have the same root at the probed time.
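As a minimal sketch of the idea on its own, the arrays can be wrapped in a small struct. The name `TimedDSU` and the helper `connected` are hypothetical and only for illustration; the logic is meant to mirror the description above, assuming edges are added with increasing times starting from 1.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical wrapper (not from the original post) around parent / size / time_changed.
struct TimedDSU {
    std::vector<int> parent, sz, time_changed;

    explicit TimedDSU(int n) : parent(n + 1), sz(n + 1, 1), time_changed(n + 1, 0) {
        for (int i = 0; i <= n; i++) parent[i] = i;
    }

    // Root of `node` as it was right after the edge with index `time` was added.
    int get_root(int node, int time) const {
        // Climb only through links that already existed at `time`.
        while (parent[node] != node && time_changed[node] <= time)
            node = parent[node];
        return node;
    }

    // Adds edge (a, b) as edge number `time`; times must be passed in increasing order.
    void union_sets(int a, int b, int time) {
        a = get_root(a, time);
        b = get_root(b, time);
        if (a == b) return;
        if (sz[a] > sz[b]) std::swap(a, b);
        sz[b] += sz[a];
        parent[a] = b;
        time_changed[a] = time;  // a stops being a root at this time
    }

    bool connected(int a, int b, int time) const {
        return get_root(a, time) == get_root(b, time);
    }
};
```

For example, after adding edges (1,2), (3,4), (2,3) at times 1, 2, 3, nodes 1 and 4 are reported as disconnected at time 2 and connected at time 3.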

Time Complexity: $$$O(N + M \cdot \log N + Q \cdot \log M \cdot \log N)$$$
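
A short justification of the $$$\log N$$$ factor per $$$\text{get_root}$$$ call, since no path compression is used (a standard union-by-size argument, added here as a sketch): the depth of a node grows by one only when its root is attached under another root whose component is at least as large, so the size of the node's component at least doubles each time. For a node at depth $$$d$$$ in a component of size $$$s$$$:

$$$2^d \le s \le N \implies d \le \log_2 N$$$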

Solution Code

#include <bits/stdc++.h>
using namespace std;
const int MAX_N = 2e5+5;
// sz instead of size: a global named size clashes with std::size under "using namespace std" (C++17).
int n, m, q, parent[MAX_N], time_changed[MAX_N], sz[MAX_N];
// Root of node right after the first `time` edges were added.
int get_root(int node, int time){
	// time_changed[node] > time means node was still a root at that time.
	if(parent[node] == node || time_changed[node] > time)
		return node;
	return get_root(parent[node], time);
}
void union_sets(int a, int b, int time){
	a = get_root(a, time);
	b = get_root(b, time);
	if(a == b)
		return;
	if(sz[a] > sz[b])
		swap(a, b);
	sz[b] += sz[a];
	parent[a] = b;
	time_changed[a] = time; // a stops being a root at this time
}
int main(){
	ios::sync_with_stdio(0); cin.tie(0);
	cin >> n >> m >> q;
	for(int i = 1; i <= n; i++){
		parent[i] = i;
		sz[i] = 1;
	}
	for(int i = 1; i <= m; i++){
		int a, b; cin >> a >> b;
		union_sets(a, b, i);
	}
	while (q--) {
		int a, b; cin >> a >> b;
		int l = 0, r = m, mid, ans = -1;
		while (l <= r) {
			mid = (l + r) / 2;
			if (get_root(a, mid) == get_root(b, mid)) {
				ans = mid;
				r = mid - 1;
			} else {
				l = mid + 1;
			}
		}
		cout << ans << "\n";
	}
	return 0;
}

Problem 2 (Medium) :

Consider the following problem: there is a graph with $$$N$$$ nodes and initially no edges. You are given $$$Q$$$ queries, which you have to answer online, of the following types:

Add an edge between node $$$A$$$ and node $$$B$$$.
Check whether node $$$A$$$ and node $$$B$$$ are in the same connected component after the $$$X$$$-th query.
Find the number of nodes in the connected component of node $$$A$$$ after the $$$X$$$-th query.

Constraints:

$$$1 \le N,Q \le 2 \cdot 10^5$$$

$$$1 \le A,B \le N$$$

$$$1 \le X_i \le i$$$

In this problem, we also use the $$$\text{time_changed}$$$ array. In addition, for every node we need to record each moment at which it was the root of a component that just gained an edge, together with the component size at that moment (or any other information we need in other problems).

We keep two vectors for each node, $$$\text{version}$$$ and $$$\text{size}$$$. Whenever an edge merges two components, we push the current time onto $$$\text{version}$$$ of the new root and the new total size onto its $$$\text{size}$$$.

Now whenever we want the size of the component of $$$A$$$ at time $$$T$$$, we find the root $$$R$$$ of $$$A$$$ at time $$$T$$$, binary search for the largest index $$$pos$$$ such that $$$version[R][pos] \le T$$$, and return $$$size[R][pos]$$$ as the answer.
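
The lookup step can be sketched in isolation as follows. The function name `size_at` and its parameters are hypothetical; it assumes the version list is increasing, starts with 0, and that the queried time is at least 0.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: `version` holds the times at which this root's size changed
// (increasing, starting with 0) and `sizes[i]` is the size in effect from version[i] onward.
int size_at(const std::vector<int>& version, const std::vector<int>& sizes, int t) {
    // First entry with version > t; the entry just before it is the one in effect at time t.
    int pos = std::upper_bound(version.begin(), version.end(), t) - version.begin();
    return sizes[pos - 1];
}
```

For instance, with version = {0, 3, 7} and sizes = {1, 2, 5}, a query at time 6 falls between the entries for times 3 and 7, so the size in effect is 2.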

Time complexity: $$$O(N + Q \cdot \log N)$$$

Solution Code

#include <bits/stdc++.h>
using namespace std;
const int MAX_N = 2e5+5;
int n, q, parent[MAX_N], time_changed[MAX_N];
// version[v][i]: time at which root v received its i-th size; sz[v][i]: that size ("size" in the text).
// Named sz because a global named size clashes with std::size under "using namespace std" (C++17).
vector<int>version[MAX_N], sz[MAX_N];
int get_root(int node, int time){
	if(parent[node] == node || time_changed[node] > time)
		return node;
	return get_root(parent[node], time);
}
void union_sets(int a, int b, int time){
	a = get_root(a, time);
	b = get_root(b, time);
	if(a == b)
		return;
	// Compare the current sizes; comparing the vectors themselves would be lexicographic.
	if(sz[a].back() > sz[b].back())
		swap(a, b);
	parent[a] = b;
	time_changed[a] = time;
	version[b].push_back(time);
	sz[b].push_back(sz[a].back() + sz[b].back());
}
int main(){
	ios::sync_with_stdio(0); cin.tie(0);
	cin >> n >> q;
	for(int i = 1; i <= n; i++){
		parent[i] = i;
		version[i].push_back(0);
		sz[i].push_back(1);
	}
	for(int i=1; i <= q;i++){
		int type; cin >> type;
		if(type == 1){
			int a, b; cin >> a >> b;
			union_sets(a, b, i);
		}else if(type == 2){
			int a, b, X; cin >> a >> b >> X;
			cout << (get_root(a, X) == get_root(b, X)? "YES" : "NO") << "\n";
		}else{
			int a, X; cin >> a >> X;
			a = get_root(a, X);
			// Last recorded size of root a with version <= X.
			int pos = upper_bound(version[a].begin(),version[a].end(),X) - version[a].begin();
			cout << sz[a][pos-1] << "\n";
		}
	}
	return 0;
}

Problem 3 (Hard) :

$$$\textbf{Prerequisites: Persistent segment tree}$$$

Consider the following problem: initially you have $$$M = 1$$$ graph with $$$N$$$ nodes and no edges, and you are given $$$Q$$$ queries, which you have to answer online, of the following types:

Copy the $$$k$$$-th graph, label the copy $$$M + 1$$$, set $$$M = M + 1$$$, and then add a new edge connecting nodes $$$A$$$ and $$$B$$$ in the new graph.
Check whether node $$$A$$$ and node $$$B$$$ are in the same connected component in the $$$k$$$-th graph.
Find the number of nodes in the connected component of node $$$A$$$ in the $$$k$$$-th graph.

Constraints:

$$$ 1 \le N,Q \le 2 \cdot 10^5 $$$

$$$ 1 \le k \le M $$$

$$$ 1 \le A,B \le N $$$

We can't reuse the DSU from the previous problem, since it only lets us update the latest version, while here we may have to branch off an older version. To solve this, we store the arrays in a persistent segment tree instead: leaf $$$i$$$ stores the parent and the size for index $$$i$$$, and a non-leaf node stores nothing except its left and right children. To update $$$parent_i$$$ and $$$size_i$$$ for some index $$$i$$$ in DSU version $$$k$$$, we walk from the root of version $$$k$$$ down to leaf $$$i$$$, copying the nodes on that path, change the leaf values in the copy, and label the new root $$$M + 1$$$, which corresponds to DSU version $$$M + 1$$$.
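
To make the "persistent segment tree as an array" idea concrete, here is a minimal sketch of a persistent array with path copying. The struct `PersistentArray` and its methods are hypothetical names, it is 0-indexed, and it stores a single `int` per leaf rather than the parent/size pair used in this blog; unlike the solution below, it never modifies an existing node.

```cpp
#include <vector>

// Hypothetical persistent array over indices [0, n): set() returns the root of a new version
// and leaves every older version untouched.
struct PersistentArray {
    struct Node { int left = -1, right = -1, value = 0; };
    std::vector<Node> t;
    int n;

    explicit PersistentArray(int n_) : n(n_) {}

    // Builds the initial version where position i holds init[i]; returns its root.
    int build(const std::vector<int>& init) { return build(init, 0, n - 1); }

    // Value at position pos in the version identified by root `node`.
    int get(int node, int pos) const {
        int lo = 0, hi = n - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (pos <= mid) { node = t[node].left; hi = mid; }
            else { node = t[node].right; lo = mid + 1; }
        }
        return t[node].value;
    }

    // Copies the O(log n) nodes on the path to pos and returns the new root.
    int set(int root, int pos, int value) { return set(root, 0, n - 1, pos, value); }

private:
    int build(const std::vector<int>& init, int lo, int hi) {
        int id = t.size();
        t.emplace_back();
        if (lo == hi) { t[id].value = init[lo]; return id; }
        int mid = (lo + hi) / 2;
        int l = build(init, lo, mid);
        int r = build(init, mid + 1, hi);
        t[id].left = l;
        t[id].right = r;
        return id;
    }

    int set(int node, int lo, int hi, int pos, int value) {
        Node copy = t[node];  // take the copy before the vector can reallocate
        int id = t.size();
        t.push_back(copy);
        if (lo == hi) { t[id].value = value; return id; }
        int mid = (lo + hi) / 2;
        if (pos <= mid) { int c = set(copy.left, lo, mid, pos, value); t[id].left = c; }
        else { int c = set(copy.right, mid + 1, hi, pos, value); t[id].right = c; }
        return id;
    }
};
```

Keeping one such array for the parents and one for the sizes (or one array of pairs) lets every union produce a new root while all older roots stay readable.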

Time complexity: $$$O(N + Q \cdot \log^2 N)$$$

Solution Code

#include <bits/stdc++.h>
using namespace std;
const int MAX_N = 2e5+5;
struct Node{
	int l = -1, r = -1, sz = -1, par = -1;
};
vector<Node>t;
void build(int i, int l, int r){
	if(l == r){
		t[i].par = l;
		t[i].sz = 1;
		return;
	}
	int mid = (l + r) / 2;
	t[i].l = t.size();
	t.emplace_back();
	build(t[i].l, l, mid);
	t[i].r = t.size();
	t.emplace_back();
	build(t[i].r, mid+1, r);
}
int nw(int j){
	int i = t.size();
	t.emplace_back();
	t[i].l = t[j].l;
	t[i].r = t[j].r;
	t[i].par = t[j].par;
	t[i].sz = t[j].sz;
	return i;
}
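// Sets par (v1) and/or sz (v2) at position p; -1 means "leave unchanged".
// Node i itself is modified in place; every node below it on the path is copied first.
void upd(int i, int l, int r, int p, int v1, int v2){
	if(l == r){
		if(v1 != -1)t[i].par = v1;
		if(v2 != -1)t[i].sz = v2;
		return;
	}
	int mid = (l+r)/2;
	if(p <= mid){
		int c = nw(t[i].l); // nw() may reallocate t, so finish it before writing to t[i]
		t[i].l = c;
		upd(c, l, mid, p, v1, v2);
	}else{
		int c = nw(t[i].r);
		t[i].r = c;
		upd(c, mid+1, r, p, v1, v2);
	}
}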
int get_sz(int i, int l, int r, int p){
	if(l == r)return t[i].sz;
	int mid = (l + r) / 2;
	if(p <= mid)return get_sz(t[i].l, l,mid, p);
	return get_sz(t[i].r, mid+1, r, p);
}
int get_par(int i, int l, int r, int p){
	if(l == r)return t[i].par;
	int mid = (l + r) / 2;
	if(p <= mid)return get_par(t[i].l, l,mid, p);
	return get_par(t[i].r, mid+1, r, p);
}
int n, m, q, root[MAX_N];
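// Path compression inside the version with root i: returns the root of j and points j directly at it.
int compress(int i, int j){
	int nxt = get_par(i, 1, n, j);
	if(nxt == j)return j;
	int r = compress(i, nxt); // named r to avoid shadowing the global root[]
	upd(i, 1, n, j, r, -1);
	return r;
}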
int main(){
	ios::sync_with_stdio(0); cin.tie(0);
	cin >> n >> q;
	t.emplace_back();
	build(0, 1, n);
	while(q--){
		int type; cin >> type;
		if(type==1){
			int k, a, b; cin >> k >> a >> b;
			k--;
			m++;
			root[m]=nw(root[k]);
			compress(root[k], a);
			compress(root[k], b);
			a = get_par(root[k], 1, n, a);
			b = get_par(root[k], 1, n, b);
			if(a == b)continue;
			int szl = get_sz(root[k] ,1 ,n ,a), szr = get_sz(root[k], 1, n, b);
			if(szl > szr)swap(a, b);
			upd(root[m], 1, n, b, -1, szl+szr);
			upd(root[m], 1, n, a, b, -1);
		}else if(type==2){
			int k ,a ,b; cin >> k >> a >> b;
			k--;
			compress(root[k], a);
			compress(root[k], b);
			cout << (get_par(root[k], 1, n, a) == get_par(root[k], 1, n, b)? "YES" : "NO") << "\n";
		}else{
			int k, a; cin >> k >> a;
			k--;
			compress(root[k], a);
			cout << (get_sz(root[k], 1, n, get_par(root[k], 1, n, a))) << "\n";
		}
	}
	return 0;
}

Extra Problems:

https://qoj.ac/problem/1217

https://mirror.codeforces.com/gym/104468/problem/B

https://oj.uz/problem/view/APIO20_swap

Thanks for reading!

Comments (12)

Sacharlemagne

16 months ago

Great read! It never crossed my mind to use a persistent seg-tree for dsu, the solutions here are quite good.
(By the way, I got the fastest code by accident in that first CSES question with a different approach, which I believe works in O((m+q)*log(n)). If you want, I can explain it here.)

dualthread

16 months ago

Yes, please!

The idea is to use a regular DSU without any path compression and to save with each edge the time when it was added.
So to check when 2 nodes were connected, you need to find the maximum value on the path between the 2 in the "tree" of the DSU, because before that time the path wouldn't exist.
Since the height of the tree is bounded by O(logn), you can just "pop up" from both nodes until you get to the LCA, checking the maximum time value on the way.
So you get O(logn) per query, plus O(mlogn) to build the "dsu tree", as I like to call it.

sword060

+26

There's a way to remove the binary search in the CSES problem!

We can instead use two pointers: keep moving nodes A and B up until they are the same, keeping track of the last time_changed value we crossed, which is the first time they are connected.

While they are not the same, if time_changed[a] < time_changed[b] we move a to parent[a] and set ans = time_changed[a]; otherwise we do the same for b. If they never become equal, we return -1.

This is $$$O(N + (M+Q) \log N)$$$!

Sample Code

#include <bits/stdc++.h>
using namespace std;
const int MAX_N = 2e5+5;
int n, m, q, parent[MAX_N], time_changed[MAX_N], sz[MAX_N];
int get_root(int node){
	if(parent[node] == node)
		return node;
	return get_root(parent[node]);
}
void union_sets(int a, int b, int time){
	a = get_root(a);
	b = get_root(b);
	if(a == b)
		return;
	if(sz[a] > sz[b])
		swap(a, b);
	sz[b] += sz[a];
	parent[a] = b;
	time_changed[a] = time;
}
int main(){
	ios::sync_with_stdio(0);cin.tie(0);
	cin >> n >> m >> q;
	for(int i = 1; i<=n; i++){
		parent[i] = i;
		sz[i] = 1;
	}
	for(int i = 1; i<=m; i++){
		int a, b; cin >> a >> b;
		union_sets(a, b, i);
	}
	while (q--) {
		int a, b; cin >> a >> b;
		int ans=0;
		while(a != b){
			if(parent[a] == a && parent[b] == b){
				ans = -1;
				break;
			}
			if(parent[a] != a && (time_changed[a] < time_changed[b] || parent[b] == b)){
				ans = time_changed[a];
				a = parent[a];
			}else{
				ans = time_changed[b];
				b = parent[b];
			}
		}
		cout << ans << "\n";
	}
	return 0;
}

With some fast input/output and some optimizations it gets 0.07 seconds (fastest solution).

-8

Congrats, you've beat me!
My turn to optimize now

tfg

+16

It seems that you forgot that operations on a persistent segment tree cost O(log), resulting in O(log^2) operations for persistent DSU using them. I'm pretty sure you can't even claim O(logN * ackermann_inverse(N)) because amortization doesn't go well with persistent structures.

Thanks for your reply!

I believe that in the worst case the tree has height log(N), resulting in log^2(N) operations including the segtree, but this doesn't happen a lot, for the same reason the DSU is not N*log(N) (so I guess its average is log(N) * ackermann_inverse(N)).

I think another way to look at it would be that it's just a normal DSU with compression, but there are a few segment tree calls for every function.

Please tell me if there's anything wrong or how you would find its complexity.

+21

Well you yourself said what's wrong. You "guessed" that the "average is (LogN * ackermann_inverse(N))". Amortization and persistent data structures don't go well together naturally. We can create a case where you have a tree in the dsu with depth O(log) then create many branches (for example doing operations that don't change that tree with depth O(log)) from that one and in each one you'll have to compress O(log) nodes resulting in O(log^2) cost per operation in average.

About your intuition of it "being just a normal DSU with compression" it's completely wrong. It's more like many normal DSUs with compression. Imagine if you used only path compression. Then, according to your wrong intuition, it should be O(log) seg tree operations in average, since that's the case when using path compression. But it's easy to see that we can create a case similar to what I've described above and make O(N) dsus that have some big path and then make one operation in each one, resulting in O(N^2) seg tree operations.

Yes you are right, I didn't think about this case before and that is why I found a wrong upper-bound. Thanks I will edit it now.

ivan100sic

+22

Continuing on the idea of problem 3, you can make any data structure persistent if you just use a persistent segtree as its underlying memory.

DrSwad

I might be missing something simple, but where is the $$$O(N \cdot \log N)$$$ coming from in the complexity of the last problem? Doesn't the preprocessing cost only $$$O(N)$$$?

Thanks for the nice blog btw. Learned something new.

+13

Oops, my bad, I was probably confused by all the logs in the solution while calculating the time complexity. It is actually $$$O(N)$$$, not $$$O(N \cdot \log N)$$$, preprocessing, as we only build one segment tree. Thanks, I will edit it now.

sword060's blog
