Imagine you need to iterate through many random permutations of a string and perform some quick check for each one. The check is so fast that the bottleneck in your solution becomes the shuffling. For simplicity, in the snippets, I've sped up the check to a no-op and added printing the final string.
Implementation with random_shuffle
#include <bits/stdc++.h>
using namespace std;
#define ITER 10000
int main() {
string s; cin >> s;
for (int i = 0; i < ITER; ++i) {
random_shuffle(s.begin(), s.end());
// do_something(s);
}
cout << s << '\n';
}
We'll benchmark this on a random [a-zA-Z0-9] string of length $$$2 \cdot 10^5$$$.
Compile the code with g++ 14.2.1 using the standard -O2 flag common in most judging systems. On my laptop, the user CPU time was 26.78 seconds.
Implementation with shuffle
Perhaps you've read the well-known article about random or noticed the compiler warning. Indeed, no one uses random_shuffle anymore, as there's a modern replacement. How will it perform in the benchmark?
#include <bits/stdc++.h>
using namespace std;
#define ITER 10000
int main() {
string s; cin >> s;
mt19937 rng(42); // reproducibility
for (int i = 0; i < ITER; ++i) {
shuffle(s.begin(), s.end(), rng);
// do_something(s);
}
cout << s << '\n';
}
The user CPU runtime for this implementation under the same conditions was $$$7.73$$$ seconds. Cool! In this note, we won't dive into implementation details or analyze the reason for this speedup.
Implementation with strfry
Let's turn to C's legacy. The standard library sometimes contains useful functions, like strtok, which have no modern C++ equivalents. And if you look under _GNU_SOURCE, you can find even more surprising things.
Suddenly, [glibc](https://www.gnu.org/software/libc enters the stage. For our task, strfry can help. This function takes a null-terminated string and shuffles it. The global state of the PRNG remains unchanged, and the permutation will vary between runs. Under the hood, the function uses getrandom in the standard linear swap algorithm. By the way, according to the C++11 standard, our string is null-terminated.
#include <bits/stdc++.h>
using namespace std;
#define ITER 10000
int main() {
string s; cin >> s;
for (int i = 0; i < ITER; ++i) {
strfry(s.data());
// do_something(s);
}
cout << s << '\n';
}
The user CPU time under the same conditions was $$$5.54$$$ seconds. Hooray, we've found the winner!




