bihariforces's blog

By bihariforces, history, 4 months ago, In English

Suppose we're given a graph with $$$N$$$ vertices and $$$M \space (\le \frac{N \times (N - 1)}{2})$$$ undirected edges, vertices are labelled from $$$0$$$ to $$$N - 1$$$.

We've to add $$$K \space (\le \frac{N \times (N - 1)}{2} - M)$$$ random undirected edges to this graph such that no edge is repeated (including the ones we added).

Goal is to deterministically minimize the number of calls made to $$$rand()$$$ function, only $$$K$$$ calls ideally, choice of edges should be provably random, while keeping rest of the algorithm not too inefficient, for $$$N, M, K \le 10^6$$$.

I feel this problem can be called "pick K random elements without repetition from complement set" where $$${ (i, j) \mid 0 \leq i \leq N \text{ and } 0 \leq j \leq N }$$$ is the universal set and we have a set of $$$M$$$ pairs initially.

I am unable to think any provably random and efficient ways to accomplish this, for example I could pick a random vertex, then pick $$$rand(0, N - degree)$$$ and do some mapping for second vertex, which is $$$O(degree)$$$ probably and can become inefficient.

I would like to know any ideas to accomplish this, thanks.

  • Vote: I like it
  • +20
  • Vote: I do not like it

»
4 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Will this work ?

  1. Store the adjacency list of each node in a set
  2. Start from rhe node with the least degree to the most degree (denote this node as $$$u$$$) and do the following :
  3. Iterate $$$v$$$ from $$$1$$$ to $$$N$$$. While edge $$$(u,v)$$$ is not added, add it to the graph (to check it, the complexity is $$$O(\text{log }N)$$$

Agree this approach will produce a less random. But how would you define "choice of edges should be provably random"?

  • »
    »
    4 months ago, # ^ |
      Vote: I like it +18 Vote: I do not like it

    Let $$$X = \frac{N \times (N - 1)}{2} - M$$$, there are $$$\binom{X}{K}$$$ ways to choose K random edges, so chances of choosing a particular set should be $$$\frac{1}{ways}$$$, though I don't know how we could calculate this for some algorithm, will try to implement this method, thanks.

»
4 months ago, # |
Rev. 3   Vote: I like it +18 Vote: I do not like it

You can generate $$$1$$$ edge at a time. To do that generate a random number between $$$1$$$ and $$$\frac{n \cdot (n - 1)}{2} - m$$$, get an edge by index with a binary search and then update everything with some data structures. One operation can be maintained in $$$O(\log n)$$$, however it has a bad constant factor.

Edit: actually, I can only see how to do in $$$O(\log ^2 n)$$$ because we need to get $$$k$$$-th element that is absent in some adjacency list, and I know how to dynamically do that only with ordered sets. There is probably a much better way.

You can achieve $$$O(\log n)$$$ complexity with that method if you use dynamic (implicit) segment trees instead of ordered sets, however I have a feeling that it would work even slower, and it uses $$$O(m \log n)$$$ memory.

  • »
    »
    4 months ago, # ^ |
    Rev. 4   Vote: I like it +26 Vote: I do not like it

    I'm quite sure that you can find the $$$k$$$-th absent element in $$$O(\log N)$$$ by using treaps. The constant factor is bad, but memory is $$$O(M + K)$$$.

»
4 months ago, # |
Rev. 2   Vote: I like it +19 Vote: I do not like it

Compiling what others have said, this is I think the best you can expect:

  1. Generate a random sorted subset of size $$$K$$$ of numbers from $$$1$$$ to $$$N^2-M$$$ in $$$O(K \log K)$$$ with low constant.
  2. Do two pointers to skip the $$$M$$$ edges while adjusting in linear time.

I've also included some code, as I think in this case it's more suggestive. Please note that I haven't tested the code whatsoever and there might be bugs, but it highlights the general idea.

vector<pair<int, int>> Generate(
    int n, int k, 
    vector<pair<int, int>>& edges, // expect all edges (a, b) to have b < a 
    mt19937_64& rng
) {
  // Generate k random indices from 1 to n^2 - m
  // We will afterwards sort them and make sure that they're distinct after
  // sorting by adding the number of smaller elements.
  ll sz = 1LL * n * (n - 1) / 2 - edges.size();
  vector<ll> indices(k);
  for (int i = 0; i < k; ++i)
    indices[i] = rng() % (sz--);
  sort(indices.begin(), indices.end());
  sort(edges.begin(), edges.end());
  
  // We skip existing edges via two pointers technique.
  // This part might be tricky to understand; it's best to think about:
  // 1. We leverage the bijection (a, b) <-> idx by the relationship
  //    idx = a * (a - 1) / 2 + b
  // 2. The `edges` array comes in (a, b) form, and the `indices` 
  //    array comes in idx form
  // Once you understand that, the "weird code" is just converting between
  // the two representations on the fly.
  vector<pair<int, int>> result;
  int ptr = 0, bias = 0, ea = 0;
  for (auto idx : indices) {
    idx += (bias++);
    while (ptr < (int)edges.size()) {
      // Check if (a, b) <= idx; if so, increase bias, otherwise break.
      auto [a, b] = edges[ptr];
      assert(b < a);
      ll jdx = 1LL * a * (a - 1) / 2 + b;
      if (jdx <= idx) ++ptr, ++bias, ++idx;
      else break;
    }
    // (ea, eb) is the edge to add now.
    while (1LL * (ea + 1) * ea / 2 <= idx) ++ea;
    int eb = idx - ea;
    assert(0 <= eb && eb < ea);
    result.emplace_back(ea, eb);
  }

  return result;
}
»
4 months ago, # |
  Vote: I like it 0 Vote: I do not like it

https://gist.github.com/theabbie/0be86edab481af581c450a4cfeef6d69

Implemented this random graph generator based on ideas suggested.