
This blog is about maintaining an MST (or MSF, minimum spanning forest, to be exact) under online edge insertions, using a data structure I came up with that handles an insertion in $$$O(\log n)$$$. Although this problem can be solved quite easily by anyone who knows LCT (link-cut tree), coding an LCT is not really practical and it has really bad constants. The data structure I present has very good constants because, as you'll see, all it does is manipulate arrays. For reasons that will become obvious later, I have called this data structure "weighted DSU".

Warmup problem

The problem is stated as follows: you are given an undirected graph where each edge has a time $$$t_i$$$. You are asked to answer queries of the form "what is the smallest time $$$t$$$ such that you can get between $$$u$$$ and $$$v$$$ using only edges with time $$$\le t$$$". This problem is analogous to building the MSF of the graph, where the weights are the times, and being able to answer maximum-weight queries on the simple path between $$$u$$$ and $$$v$$$. This problem can be solved in a lot of different ways; I will present the DSU solution:
First we sort the edges by time/weight, then we add them in a DSU-like way but without path compression. Additionally, we store on each DSU edge the weight of the graph edge it represents. To be specific, we store 3 arrays: parent, weight and size. When connecting a representative $$$u$$$ under a representative $$$v$$$ we update $$$parent[u] = v, weight[u] = w, size[v] += size[u]$$$. Now we can find the representative of $$$v$$$ at time $$$w$$$. One way to answer the queries is to binary search on the answer, which takes $$$O(\log^2 n)$$$. A faster way is to essentially find the LCA of $$$u$$$ and $$$v$$$ in the DSU tree; this can be done by repeatedly moving up from the vertex whose current out-edge has the smaller weight. This takes $$$O(\log n)$$$.
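To make this concrete, here is a minimal sketch of the warmup structure (my own code; the array names parent, weight and size follow the text, everything else is my choice). Edges must be fed to addEdge in non-decreasing order of weight, and query walks both endpoints up, always advancing the one whose out-edge is lighter:

#include <bits/stdc++.h>

// Warmup sketch: a DSU without path compression whose edges remember their weights.
struct KruskalDSU {
    static constexpr int INF = INT_MAX;
    std::vector<int> parent, weight, size;

    KruskalDSU(int n) : parent(n), weight(n, INF), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);      // every vertex starts as its own root
    }

    // representative of v when only edges of weight <= w are present
    int getRoot(int v, int w) const {
        while (weight[v] <= w) v = parent[v];
        return v;
    }

    // edges must be added in non-decreasing order of w
    void addEdge(int u, int v, int w) {
        u = getRoot(u, w); v = getRoot(v, w);
        if (u == v) return;
        if (size[u] > size[v]) std::swap(u, v);          // attach the smaller tree under the bigger one
        parent[u] = v; weight[u] = w; size[v] += size[u];
    }

    // smallest t such that u and v are connected by edges of weight <= t, or -1 if never
    int query(int u, int v) const {
        int ans = 0;
        while (u != v) {
            if (weight[u] > weight[v]) std::swap(u, v);  // climb from the side with the lighter out-edge
            if (weight[u] == INF) return -1;             // both sides are roots, so u and v are disconnected
            ans = std::max(ans, weight[u]);              // track the heaviest edge on the walk
            u = parent[u];
        }
        return ans;
    }
};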

The data structure

We will use the same idea from the warmup problem and try to turn it into a dynamic one. When adding an edge, we want to think about what would have happened had we added all the edges in increasing order, which leads to the following almost-complete algorithm:
When adding an edge with weight w between u and v, we get the root of u at time w and the root of v at time w, and connect the one with the smaller subtree size under the bigger one. Assume we connect u to v; this means we discarded the out-edge of u. To correct this we recursively add the edge between u and parent[u] with weight weight[u] (re-adding the edge we deleted).
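As a rough illustration, the naive version of this insertion could look like the following method added to the warmup sketch above (my own code; it still unions by size, uses the simple getRoot, and ignores the three problems listed next — in particular, the simple getRoot relies on weights increasing up the tree, which this insertion breaks):

// Naive sketch only, reusing parent/weight/size/getRoot from the warmup struct above.
void addEdgeNaive(int u, int v, int w) {
    u = getRoot(u, w); v = getRoot(v, w);        // the roots "at time w"
    if (u == v) return;                          // already connected: ignored here (problem 3)
    if (size[u] > size[v]) std::swap(u, v);      // union by size: no longer reliable (problem 2)
    int pu = parent[u], wu = weight[u];          // the out-edge of u we are about to discard
    parent[u] = v; weight[u] = w; size[v] += size[u];
    if (wu != INF) addEdgeNaive(u, pu, wu);      // recursively re-add the discarded edge
}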

This algorithm has 3 problems:
1. Our original DS relied on the weights of the edges increasing as we go up the tree; now we may destroy this property.
2. The size array is now dynamic, so we can't rely on it like we did in the warmup problem.
3. What if the vertices are already connected?

The first problem has a straightforward solution: when trying to access the parent of a vertex, jump over all edges whose weight is at most the weight of the edge to the parent, and connect the vertex directly to the new parent. The getRoot function will now look like this:

// returns the representative of v among edges of weight <= w
int getRoot(int v, int w = INF - 1) {
	while (weight[v] <= w) {
		// jump over parents whose out-edge is not heavier than v's out-edge,
		// lazily restoring the "weights increase going up" property
		while (weight[parent[v]] <= weight[v])
			parent[v] = parent[parent[v]];
		v = parent[v];
	}
	return v;
}

The second problem can be solved in an elegant and simple way: instead of connecting by size, connect by a random index! To be specific, we create an array index initialized with random values, and we connect the vertex with the smaller index to the one with the bigger index. (Union by random index is known to have the same expected complexity as union by size, which is optimal.)
To solve the third problem we can use MST properties. When adding an edge {w, u, v} between already-connected vertices, we find the "main" edge along the path between u and v: the edge with maximum weight (or any one of them if there are multiple). If its weight is bigger than w, we delete it and add the new edge with the algorithm described above; otherwise the new edge does not belong to the MSF and we simply discard it. Putting all of this together we get the following DS:

code
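The author's full implementation is in the collapsed block above. For readers who just want the shape of it, here is my own hedged sketch of how the pieces could fit together; the array names parent, weight and index follow the blog, while the function names, the marking walk used to find the maximum edge on the path (used only for clarity, not for constants), and the random-seed choice are my own and may well differ from the actual code:

#include <bits/stdc++.h>

// A sketch of the "weighted DSU", assuming the description above.
struct WeightedDSU {
    static constexpr int INF = INT_MAX;
    std::vector<int> parent, weight, index;     // index holds the random tie-breaking values

    WeightedDSU(int n) : parent(n), weight(n, INF), index(n) {
        std::mt19937 rng(std::random_device{}());
        std::iota(parent.begin(), parent.end(), 0);
        for (int &x : index) x = (int)(rng() % INT_MAX);   // random nonnegative indices
    }

    // representative of v among edges of weight <= w, lazily restoring the
    // "weights increase going up" property by jumping over out-of-order edges
    int getRoot(int v, int w = INF - 1) {
        while (weight[v] <= w) {
            while (weight[parent[v]] <= weight[v])
                parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }

    // insert an edge whose endpoints currently lie in different trees
    void link(int u, int v, int w) {
        u = getRoot(u, w); v = getRoot(v, w);
        if (u == v) return;
        if (index[u] > index[v]) std::swap(u, v);          // the smaller index goes below
        int pu = parent[u], wu = weight[u];                // the out-edge of u we discard
        parent[u] = v; weight[u] = w;
        if (wu != INF) link(u, pu, wu);                    // recursively re-add the discarded edge
    }

    // vertex x such that (x, parent[x]) is the heaviest edge on the tree path u..v;
    // a plain marking walk over the two root paths, assuming u and v share a tree
    int maxEdgeOnPath(int u, int v) {
        std::unordered_set<int> onUPath;
        for (int x = u; ; x = parent[x]) { onUPath.insert(x); if (parent[x] == x) break; }
        int lca = v;
        while (!onUPath.count(lca)) lca = parent[lca];
        int best = -1;
        for (int x = u; x != lca; x = parent[x])
            if (best == -1 || weight[x] > weight[best]) best = x;
        for (int x = v; x != lca; x = parent[x])
            if (best == -1 || weight[x] > weight[best]) best = x;
        return best;                                       // -1 only if u == v
    }

    void addEdge(int u, int v, int w) {
        if (getRoot(u) != getRoot(v)) { link(u, v, w); return; }
        int x = maxEdgeOnPath(u, v);                       // the "main" edge of the created cycle
        if (x != -1 && weight[x] > w) {
            parent[x] = x; weight[x] = INF;                // delete the main edge...
            link(u, v, w);                                 // ...and insert the new edge instead
        }                                                  // otherwise the new edge does not enter the MSF
    }
};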

Time complexity

While I do not have a formal proof for the time complexity, I can provide a rough sketch/intuition for it:
What our DS does is try to replicate the DS from the warmup problem. It does that pretty well, but has minor defects as described in problem 2. My claim is that these defects add at most the depth of the tree from the warmup problem, which still results in a maximum depth of $$$O(\log n)$$$. Testing I have done strongly supports this claim. If anyone has a formal proof, please share it!
Another advantage of this DS is the constants: while in theory LCT also has $$$O(\log n)$$$ complexity, it has really bad constants, and this data structure easily beats it, often being 3-6 times faster.

Additional operations

Deleting maximal edge:
Assume that all weights in the graph are distinct. Notice that the maximal edge in each connected component is necessary: it is a bottleneck edge. Therefore we can delete the maximal edge, and doing so breaks the component into two components. Deleting this edge won't create a problem, because no other edge addition used it to "jump" up. The easiest way to handle non-distinct weights is simply to make them distinct (for example by tie-breaking on edge id).

Saving connected component size:
For some problems we may want to maintain the size of each connected component in the MSF, even after edge deletions. We can do this by maintaining the subtree size of each node of the tree stored in our DS: that is, we maintain an array size such that size[v] equals the number of nodes in the subtree of v. The easiest and most elegant way I found to do this is to update the size array as if the edges on the parent paths from u and v were disconnected, and to add the sizes back when going through an edge.

code

Other Queries:
We want to be able to answer queries of the form "how many nodes can be reached from v using edges with weight $$$\le w$$$". To do so we go to getRoot(v, w) and want the sum of the sizes of all subtrees whose connecting edge has weight $$$\le w$$$. We do this by maintaining in each vertex a treap that stores the subtree sizes of its children keyed by edge weight; to answer a query we split the treap at w and take the sum of the matching suffix of subtree sizes. This has time complexity $$$O(\log^2 n)$$$, but due to the randomness of our DS and the distribution of vertex degrees in the tree, the running time in practice may be better than that. Testing I have done supports this claim. If anyone has a worst-case example whose expected running time really is $$$\Theta(\log^2 n)$$$, please share it.

Problems

Offline Dynamic Connectivity

solution

codeforces 1423 — problem H

solution

BOI 2020 Joker

solution

codeforces 76 — problem A

solution

codeforces 603 — problem E

solution

Comment if you can think of more uses of this DS
