How to Generate Random Connected Graphs

→ Pay attention

Before contest
Codeforces Round 995 (Div. 3)
07:08:13
Register now »

→ Streams

Codeforces Round 995 Solution Discussion

By aryanc403

Before stream 09:28:13

View all →

→ Top rated

#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

View all →

→ Find user

→ Recent actions

Detailed →

upobir's blog

How to Generate Random Connected Graphs

By upobir, history, 5 years ago, In English

As the title says, I wanna generate random graphs that are connected. Now there are some details herr, obviously I want the graph to have $$$n$$$ nodes which is predetermined. Something else that arises in competitive programming is that the graph should have $$$m$$$ edges(also predetermined). Now how does one do that in an intelligent way.

Now I have done a little bit of googling, and it seems there are no 'good' way of getting a perfectly 'random' such graph i.e. pick one uniformly from all connected graphs with n nodes and m edges. But that's okay, approximately random is also good.

One way I have considered is to make a spanning tree first, then add extra edges. But this has horrible bias for certain graphs. For example for $$$n$$$ node $$$n$$$ edge graph, a cycle graph $$$(1-2-3-...-n-1)$$$ is $$$\frac{n}{3}$$$ times more likely than path $$$(1-2-3-4-...-n)$$$ + edge $$$(1-3)$$$. So what process can be a good choice?

#graph, generated test, random

upobir
5 years ago
12

Comments (12)

Write comment?

DeadlyCritic

5 years ago, # |

+11

All you need is here.

→ Reply

upobir

5 years ago, # ^ |

If I understand it correctly, this library uses the exact method that I talked about in the post. Which I am not saying is bad. However I was hoping for something with less bias towards certain graphs.

→ Reply

DeadlyCritic

5 years ago, # ^ |

Please let us know if you found a better way, i am really into the topic.

→ Reply

DeadlyCritic

5 years ago, # |

← Rev. 2 →

-8

One way to construct a random small graph with $$$m$$$ edges and $$$n$$$ vertices is to add all the edges in a vector, then shuffle them and pick the first $$$m$$$, if you want it to be connected, then first pick edges $$$1-2,\;2-3\dots,\;(n-1)-n$$$, then add all the edges except them in a vector, shuffle it and pick first $$$m-n+1$$$ edges. And in the end shuffle the vertices, i.e. choose a random permutation $$$p$$$ of vertices, and map each vertex $$$u$$$ to $$$p_u$$$.

$$$UPD$$$ i meant to first construct a spanning tree, not a path, sorry for the mistake.

→ Reply

upobir

5 years ago, # ^ |

+18

This will generate connected m-edge graph that will alawys have a hamiltonian path. Which I think is worse than the spanning tree first method.

→ Reply

DeadlyCritic

5 years ago, # ^ |

+10

You are right sorry for my mistake :(, i meant to construct a random spanning tree, and so the same idea as you said.

→ Reply

geniucos

5 years ago, # ^ |

+16

Wouldn't this only generate graphs which have a hamiltonian path? (and therefore clearly not uniformly among the set of all possible graphs of m edges)

→ Reply

DeadlyCritic

5 years ago, # ^ |

← Rev. 2 →

Ye sorry for that :(, but it still constructs good random graphs if its not necessary to be connected , yes?

→ Reply

upobir

5 years ago, # |

Auto comment: topic has been updated by upobir (previous revision, new revision, compare).

→ Reply

seiko.iwasawa

3 years ago, # |

I have an algorithm, but it works $$$O(n^4)$$$.

Let's build a complete graph for $$$n$$$ vertices. After that, let's erase random edges one by one such that the graph remains connected. When the number of edges is exactly $$$m$$$, we get our graph. Note that for each connected graph with $$$m$$$ edges, we have $$$(\frac{n \cdot (n-1)}{2}-m)!$$$ options to get it. So, all of them have equal probability.

This algorithm can be improved. If $$$n \le 10^4$$$ and we build random graph with $$$8 \cdot n$$$ edges, it will be connected with a high probability. So, if $$$m \le 8 \cdot n$$$, we can build this random graph instead of the complete graph. Then the algorithm will work $$$O(64 \cdot n^2)$$$. If $$$m \gt 8 \cdot n$$$, we can build just a random graph with $$$m$$$ edges.

Here is my implementation, but is has a problem if $$$m$$$ is very big (then my algorithm will not find a graph with $$$m$$$ edges).

→ Reply

clyring

3 years ago, # ^ |

← Rev. 2 →

+21

I don't believe this works, because although each connected graph gets $$$(\frac{n \cdot (n-1)}{2}-m)!$$$ options for reaching it, these options do not have equal probability. This is because at each step you can only remove edges that will not disconnect the working graph, and the number of such non-bridges is dependent on which edges you have already removed. The smallest counterexample has $$$n = 4$$$, $$$m = 3$$$, where 4 out of the 16 connected labeled graphs are claws, but your algorithm chooses one with probability $$$4/15$$$.

→ Reply

clyring

3 years ago, # |

← Rev. 2 →

The process in seiko.iwasawa's comment is basically all but the last $$$m - n + 1$$$ steps of the reverse-delete algorithm with random edge weights, and so it can be simulated in $$$O(n \cdot \alpha(n) + m)$$$ expected time using Kruskal's algorithm. For large $$$n$$$, the number of edges that need to be tried before Kruskal's algorithm finds a spanning tree will be of order $$$n \log n$$$ with high probability. However, most of this comes from repeatedly trying internal edges within some large component of size nearly $$$n$$$, and it is possible to sample from the other edges in $$$O(1)$$$ since there can only be one such large component.

I've also come up with another idea that, while still not (yet) practical for $$$m$$$ very close to $$$n$$$, seems serviceable for at least part of that realm between "generate a random spanning tree" and "generate a random graph and hope it's connected." It is to allocate more vertices than will be eventually used, to allow a connected component to grow to size $$$n$$$ with fewer than the normal $$$n \log{n}$$$ or so edges. To keep the analysis and proof of correctness as simple as possible, I consider only the connected component containing the vertex $$$0$$$.

Consider adding edges one at a time to an initially empty graph with $$$s$$$ vertices labeled $$$0$$$ through $$$s-1$$$. If at any point, the connected component containing the vertex $$$0$$$ has $$$n$$$ vertices and $$$m$$$ edges, we relabel its vertices as $$$0$$$ through $$$n-1$$$ and stop. The process fails if this never happens. Given an arbitrary connected labeled graph $$$G$$$ on those $$$n$$$ vertices with $$$m$$$ edges, its probability of being reached by this process is exactly the following quantity:

$$$ \displaystyle \frac{ \displaystyle \binom{s-1}{n-1}}{\displaystyle \binom{\binom{n}{2} + n(s-n)}{m}} $$$

The numerator is the number of sets $$$K$$$ which can be the connected component of size $$$n$$$ containing $$$0$$$. The denominator comes from that for any such set $$$K$$$, the graph $$$G$$$ will be reached via $$$K$$$ iff the (exactly one) (uniformly random) configuration of the $$$\binom{n}{2} + n(s-n)$$$ edges incident to at least one vertex in $$$K$$$ with $$$m$$$ included is the set corresponding internally to $$$G$$$ after relabeling and which includes no boundary edge. Although I don't have a great estimate on the actual asymptotic behavior of its expected runtime, by choosing $$$s$$$ to maximize a related quantity, this process seems to succeed at $$$n = 2 \cdot 10^5$$$, $$$m = 3 \cdot 10^5$$$ on average within around 100 attempts, whereas this size would be unassailable to the other exact methods I have considered.

Here's a quick-and-dirty implementation of the idea:

Spoiler

#include <algorithm>
#include <cmath>
#include <cassert>
#include <iostream>
#include <random>
#include <vector>

constexpr bool verboseOutput = false;
constexpr bool printGraph = true;

using LL = long long;
using VI = std::vector<int>;

struct DSU {
  VI par;
  VI sizes;
  VI edges;

  int n;

  DSU(int vertices)
    :par { VI(vertices) },
     sizes { VI(vertices, 1) },
     edges { VI(vertices, 0) },
     n { vertices }
  {
    for (int i = 0; i < n; ++i)
      par[i] = i;
  }

  int find(int x) {
    while (x != par[x])
      x = par[x] = par[par[x]];
    return x;
  }

  bool unite(int x, int y) {
    x = find(x);
    y = find(y);
    if (sizes[x] < sizes[y])
      std::swap(x, y);
    edges[x] += 1;
    if (x == y)
      return false;
    par[y] = x;
    sizes[x] += sizes[y];
    edges[x] += edges[y];
    return true;
  }
};

int main() {
  using namespace std;
  ios::sync_with_stdio(false);
  cin.tie(nullptr);

  int n;
  LL m;
  int maxTries, seed;
  cin >> n >> m >> maxTries >> seed;

  assert(m >= n - 1);
  assert(m <= n * (n - 1LL) / 2);

  auto succRate = [&](int s) -> long double {
    LL relEdges = n * (n - 1LL) / 2 + n * LL(s - n);
    return  lgammal(s-1) + lgammal(relEdges - m + 1)
      - logl(s) - lgammal(s - n + 1) - lgammal(relEdges + 1);
  };

  int s = n;
  while (succRate(s+1) > succRate(s))
    ++s;

  mt19937 src { seed };
  using UI = uniform_int_distribution<int>;
  UI rvertex { 0, s - 1 };

  if (verboseOutput)
    cout << "s = " << s << endl;

  for (int currTry = 1; currTry <= maxTries; ++currTry) {
    if (verboseOutput)
      cout << "try #" << currTry << endl;
    DSU dsu { s };
    vector<VI> adj = vector<VI>(s);
    auto tryInsEdge = [&]() -> void {
      int x = rvertex(src);
      int y = rvertex(src);
      if (x == y)
        return;
      bool edgeExists = false;
      for (int z : adj[x])
        if (y == z)
          edgeExists = true;
      if (edgeExists)
        return;
      adj[x].push_back(y);
      adj[y].push_back(x);
      dsu.unite(x, y);
    };

    int root = dsu.find(0);
    while (dsu.sizes[root] <= n && dsu.edges[root] < m) {
      tryInsEdge();
      root = dsu.find(0);
    }
    if (dsu.sizes[root] != n || dsu.edges[root] != m)
      continue;

    struct Edge {
      int x;
      int y;
    };
    vector<int> ps;
    ps.reserve(n);
    vector<Edge> es;
    es.reserve(m);
    for (int i = 0; i < s; ++i)
      if (dsu.find(i) == root) {
        ps.push_back(i);
        for (int j : adj[i])
          if (j > i)
            es.push_back(Edge { i, j });
      }

    assert(int(ps.size()) == n);
    assert(int(es.size()) == m);

    shuffle(es.begin(), es.end(), src);
    // originally there was some edge-removal stuff as well,
    // but it doesn't seem actually useful, so it's commented out
    //es.resize(m);
    auto compress = [&](int p) -> int {
      return int(lower_bound(ps.begin(), ps.end(), p) - ps.begin());
    };

    //dsu = DSU(n);
    for (Edge& e : es) {
      e.x = compress(e.x);
      e.y = compress(e.y);
      //dsu.unite(e.x, e.y);
    }
    //if (dsu.sizes[dsu.find(0)] == n)
    {
      if (printGraph) {
        cout << n << '\n' << m << '\n';
        for (Edge e : es)
          cout << e.x << ' ' << e.y << '\n';
      }
    }

    return 0;
  }

  cerr << "max tries exceeded\n";

  return 1;
}

(And finally I repeat the well-known observation that large random combinatorial objects are often very statistically similar to each other. That can make them easier as objects of study, but can also be a disadvantage: If the goal is to generate strong test data it is often wise to embrace the different biases of different processes.)

→ Reply