kelzzin2_fan's blog

By kelzzin2_fan, 11 days ago, In English

Hello Codeforces community, this is my first educational blog on a topic, hopefully some will find it useful despite it not being difficult in nature.

Note: This is not meant to be an introduction to Disjoint Sets Union, if needed, please refer to this tutorial before reading the following.

This is a tutorial based on this article. Here, I'll include some new ideas and implementation details of mine as well. Thanks DavidTan and nickyfoo for proof-reading and also kelzzin2 for exisiting.

DSU Recap

For those who need a quick recap, DSU are meant to support 2 main operations:

  • $$$union(a,b)$$$: merge the 2 sets that contain $$$a$$$ and $$$b$$$.
  • $$$find(a)$$$: return some identifier that represents the set that $$$a$$$ is in. Note that $$$find(a) = find(b)$$$ iff $$$a$$$ and $$$b$$$ are in the same set.

How this is done is by representing the sets as forests. If the tree that $$$a$$$ belongs to is the same tree that $$$b$$$ belongs to, they are in the same set. We check this if the root of the tree they belong to are the same.

$$$find$$$ then can simply be done by returning the identifier of the root of the tree. $$$union$$$ is done by connecting one of the root to another root (This will be important later).

An example of DSU:

Union Find Image

$$$union(1,5)$$$:

Union Find Merge Image

For more details regarding the optimisations (compare set size before $$$union$$$, path compression etc.), please read the other tutorials.

Extra Operation: Move

Suppose we now want to support a new operation:

  • $$$move(a,b)$$$: move element $$$a$$$ to the set that contains $$$b$$$

An observation here is that if $$$a$$$ is a leaf, this is simple to do. We can simply just assign $$$t[a] = find(b)$$$. This is because no other elements point to $$$a$$$, therefore no changes are require for all other nodes. If the DSU maintains some other info, we can perform a $$$find(a)$$$ operation and update the required info.

$$$move(3,5)$$$:

Union Find Move Image

However, the harder case is when $$$a$$$ is not a leaf. This is because everything pointing to $$$a$$$ needs to be modified, leading to a high time complexity. Notice, however, we can ensure this will not happen by doing the following:

  1. Initialize our DSU with $$$2N$$$ elements instead.
  2. $$$\forall i = 0,..,N-1$$$, perform $$$union(i,N+i)$$$ with $$$N+i$$$ as the root. In other words, set $$$i$$$ to point to $$$N+i$$$.

Notice, $$$\forall i = 0,..,N-1$$$, $$$i$$$ is a leaf and $$$N+i$$$ is the root. With the previous observation that $$$union(a,b)$$$ only connects one root to another, we ensure that all $$$i = 0,..,N-1$$$ are leaves no matter the operation.

Below is an example (for $$$N$$$ = 4):

Union Find Move Non-Leaf Image

Of course, there are other implementations possible, such as having another vector<int> keep track of $$$move$$$ operations and such, but this implementation requires minimal changes as it only requires a change on how we initialise our DSU.

Here, we have a practice problem for implementation purposes.

Sample Submission

Extra Operation: Erase

Suppose we now want to support another operation:

  • $$$erase(a)$$$: Remove element $$$a$$$ from its current set. (Future operations will not be performed if one of the arguments include an $$$erased$$$ element)

Doing this is fairly simple, as we can have a vector<bool> keep track of deleted nodes, and update the size info accordingly when deleted. However, if we already have to support the $$$move$$$ operation, we can use the same array to represent $$$delete$$$, this is done by setting node $$$i$$$ to point to itself. Notice, with how we initialise the DSU, $$$\forall i = 0,..,N-1$$$, node $$$i$$$ can never point to itself, unless done through the delete operation.

Notice another easy way to implement $$$erase$$$ is just to have $$$2N+1$$$ nodes instead. We can have the $$$2N$$$-th node represent the set of $$$erased$$$ nodes, then move the nodes to the $$$erased$$$ set when they are $$$erased$$$. This also allows $$$erased$$$ elements to be moved back into a set. (Thanks to nickyfoo for pointing this out)

What if we allow an element to be reused for future operations? (i.e. after $$$erase(a)$$$, it will be removed from the current set and become a set that only contains $$$a$$$)

Answer

Implementation Detail

Finally, let's try to also keep track of the size of the set using the same vector. This is a somewhat popular implementation, but for those who do not know, I'll mention it here. In the regular implementation, we have the root point to itself. However, we can signify the root by using negative numbers instead. (i.e $$$t[i] < 0$$$ iff $$$t[i]$$$ is the root). Notice, this lets us keep track of the size of the set by using $$$-size$$$ at the root.

Below is another example (using the new implementation) with the same operations as the previous example: Union Find with -1

Notice there is a possibility of some root having a size of 0 (assuming that we don't keep track of the size of the extra nodes). However, those nodes that may have this issue always range from $$$N$$$ to $$$2N-1$$$, so they would just no longer be reachable anymore (nothing points to it).

TLDR Sample Implementation

struct DSU {
    std::vector<int> _t;
    DSU(int N): _t(2*N,-1) { 
        std::iota(_t.begin(),_t.begin()+N, N);
    }
    int find(int x) { return _t[x] < 0 ? x : _t[x] = find(_t[x]); }
    bool join(int x, int y) {
        auto fx = find(x), fy = find(y);
        if (is_del(x) || is_del(y) || fx == fy) return false;
        if (_t[fx] > _t[fy]) std::swap(fx,fy);
        _t[fx] += _t[fy]; _t[fy] = fx; // Ensure we do not count the extra nodes
        return true;
    }
    bool move(int x, int y) {
        // this uses the point to itself method.
        auto fx = find(x), fy = find(y);
        if (is_del(x) || is_del(y) || fx == fy) return false;
        _t[fx]++; _t[fy]--; // Note fx, fy cannot be from [0..N-1]
        _t[x] = fy;
        return true;
    }
    bool erase(int x) {
        auto fx = find(x);
        if (is_del(x)) return false;
        _t[fx]++; _t[x] = -1;
        return true;
    }
    bool is_del(int x) { return _t[x] == -1; }
    int sz(int x) { return -_t[find(x)]; }
};
  • Vote: I like it
  • +149
  • Vote: I do not like it

»
8 days ago, # |
  Vote: I like it 0 Vote: I do not like it

wow

»
8 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Quick nice

»
8 days ago, # |
  Vote: I like it 0 Vote: I do not like it

learned something new of dsu.I solved a similar problem like move a to b using another vector track but that above trick is nice. Thank You.

»
8 days ago, # |
  Vote: I like it 0 Vote: I do not like it

This is so beautiful, bump

»
7 days ago, # |
  Vote: I like it 0 Vote: I do not like it

we can also do online connectivity by representing edges as nodes right ? I taught we needed a link cut tree for that.

  • »
    »
    7 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I'm not too sure about this, as I haven't studied up on Link Cut Trees. But how would you check the connectivity of 2 nodes in a graph if you represent the edges as elements in the DSU?

    • »
      »
      »
      7 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      like if there is an edge between a and b , we create a new node in the dsu which is connected to both a and b , now a and b are connected , if we want to remove this edge we simply remove the node that we created that connects a and b

      • »
        »
        »
        »
        7 days ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I think it doesn't actually work because of the following:

        1. Simple reason is that we do not support $$$erase$$$ for nodes not numbered $$$0$$$ to $$$N-1$$$. Remember $$$erase$$$ essentially must only be performed for leaf nodes, and that are the nodes from $$$0$$$ to $$$N-1$$$.

        2. Even if we somehow are able to do it, when we create a new node and perform $$$union(node_{new}, a)$$$ and $$$union(node_{new}, b)$$$, there has to be a singular identifier $$$x$$$ representing such a set (i.e. $$$find(a) = find(b) = x$$$).

        3. Note that we define $$$erase(a)$$$ as removing $$$a$$$ from such a set. However, that implies $$$find(a) = find(b) = x$$$ should still hold after the $$$erase$$$.

        Let me know if I missed anything.