Adding K random edges to a graph

#	User	Rating
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	162
4	TheScrasse	159
5	nor	157
6	maroonrk	156
7	-is-this-fft-	152
8	orz	146
9	Petr	145
9	pajenegod	145

Suppose we're given a graph with $$$N$$$ vertices and $$$M \space (\le \frac{N \times (N - 1)}{2})$$$ undirected edges, vertices are labelled from $$$0$$$ to $$$N - 1$$$.

We've to add $$$K \space (\le \frac{N \times (N - 1)}{2} - M)$$$ random undirected edges to this graph such that no edge is repeated (including the ones we added).

Goal is to deterministically minimize the number of calls made to $$$rand()$$$ function, only $$$K$$$ calls ideally, choice of edges should be provably random, while keeping rest of the algorithm not too inefficient, for $$$N, M, K \le 10^6$$$.

I feel this problem can be called "pick K random elements without repetition from complement set" where $$${ (i, j) \mid 0 \leq i \leq N \text{ and } 0 \leq j \leq N }$$$ is the universal set and we have a set of $$$M$$$ pairs initially.

I am unable to think any provably random and efficient ways to accomplish this, for example I could pick a random vertex, then pick $$$rand(0, N - degree)$$$ and do some mapping for second vertex, which is $$$O(degree)$$$ probably and can become inefficient.

I would like to know any ideas to accomplish this, thanks.

vector<pair<int, int>> Generate( int n, int k, vector<pair<int, int>>& edges, // expect all edges (a, b) to have b < a mt19937_64& rng ) { // Generate k random indices from 1 to n^2 - m // We will afterwards sort them and make sure that they're distinct after // sorting by adding the number of smaller elements. ll sz = 1LL * n * (n - 1) / 2 - edges.size(); vector<ll> indices(k); for (int i = 0; i < k; ++i) indices[i] = rng() % (sz--); sort(indices.begin(), indices.end()); sort(edges.begin(), edges.end()); // We skip existing edges via two pointers technique. // This part might be tricky to understand; it's best to think about: // 1. We leverage the bijection (a, b) <-> idx by the relationship // idx = a * (a - 1) / 2 + b // 2. The `edges` array comes in (a, b) form, and the `indices` // array comes in idx form // Once you understand that, the "weird code" is just converting between // the two representations on the fly. vector<pair<int, int>> result; int ptr = 0, bias = 0, ea = 0; for (auto idx : indices) { idx += (bias++); while (ptr < (int)edges.size()) { // Check if (a, b) <= idx; if so, increase bias, otherwise break. auto [a, b] = edges[ptr]; assert(b < a); ll jdx = 1LL * a * (a - 1) / 2 + b; if (jdx <= idx) ++ptr, ++bias, ++idx; else break; } // (ea, eb) is the edge to add now. while (1LL * (ea + 1) * ea / 2 <= idx) ++ea; int eb = idx - ea; assert(0 <= eb && eb < ea); result.emplace_back(ea, eb); } return result; }

Comments (6)

Write comment?

fonmagnus

4 months ago, # |

Will this work ?

Store the adjacency list of each node in a set
Start from rhe node with the least degree to the most degree (denote this node as $$$u$$$) and do the following :
Iterate $$$v$$$ from $$$1$$$ to $$$N$$$. While edge $$$(u,v)$$$ is not added, add it to the graph (to check it, the complexity is $$$O(\text{log }N)$$$

Agree this approach will produce a less random. But how would you define "choice of edges should be provably random"?

→ Reply

bihariforces

4 months ago, # ^ |

+18

Let $$$X = \frac{N \times (N - 1)}{2} - M$$$, there are $$$\binom{X}{K}$$$ ways to choose K random edges, so chances of choosing a particular set should be $$$\frac{1}{ways}$$$, though I don't know how we could calculate this for some algorithm, will try to implement this method, thanks.

nifeshe

← Rev. 3 →

You can generate $$$1$$$ edge at a time. To do that generate a random number between $$$1$$$ and $$$\frac{n \cdot (n - 1)}{2} - m$$$, get an edge by index with a binary search and then update everything with some data structures. One operation can be maintained in $$$O(\log n)$$$, however it has a bad constant factor.

Edit: actually, I can only see how to do in $$$O(\log ^2 n)$$$ because we need to get $$$k$$$-th element that is absent in some adjacency list, and I know how to dynamically do that only with ordered sets. There is probably a much better way.

You can achieve $$$O(\log n)$$$ complexity with that method if you use dynamic (implicit) segment trees instead of ordered sets, however I have a feeling that it would work even slower, and it uses $$$O(m \log n)$$$ memory.

vgtcross

← Rev. 4 →

+26

I'm quite sure that you can find the $$$k$$$-th absent element in $$$O(\log N)$$$ by using treaps. The constant factor is bad, but memory is $$$O(M + K)$$$.

bicsi

← Rev. 2 →

+19

Compiling what others have said, this is I think the best you can expect:

Generate a random sorted subset of size $$$K$$$ of numbers from $$$1$$$ to $$$N^2-M$$$ in $$$O(K \log K)$$$ with low constant.
Do two pointers to skip the $$$M$$$ edges while adjusting in linear time.

I've also included some code, as I think in this case it's more suggestive. Please note that I haven't tested the code whatsoever and there might be bugs, but it highlights the general idea.

https://gist.github.com/theabbie/0be86edab481af581c450a4cfeef6d69

Implemented this random graph generator based on ideas suggested.

bihariforces's blog