Gornak40's blog

By Gornak40, history, 9 months ago, translation, In English

Imagine you need to iterate through many random permutations of a string and perform some quick check for each one. The check is so fast that the bottleneck in your solution becomes the shuffling. For simplicity, in the snippets, I've sped up the check to a no-op and added printing the final string.

Implementation with random_shuffle

#include <bits/stdc++.h>
using namespace std;

#define ITER 10000

int main() {
    string s; cin >> s;
    for (int i = 0; i < ITER; ++i) {
        random_shuffle(s.begin(), s.end());
        // do_something(s);
    }
    cout << s << '\n';
}

We'll benchmark this on a random [a-zA-Z0-9] string of length $$$2 \cdot 10^5$$$.

How to generate such a string on Unix?

Compile the code with g++ 14.2.1 using the standard -O2 flag common in most judging systems. On my laptop, the user CPU time was $$$26.78$$$ seconds.

Implementation with shuffle

Perhaps you've read the well-known article about random or noticed the compiler warning. Indeed, no one uses random_shuffle anymore, as there's a modern replacement. How will it perform in the benchmark?

#include <bits/stdc++.h>
using namespace std;

#define ITER 10000

int main() {
    string s; cin >> s;
    mt19937 rng(42); // reproducibility
    for (int i = 0; i < ITER; ++i) {
        shuffle(s.begin(), s.end(), rng);
        // do_something(s);
    }
    cout << s << '\n';
}

The user CPU time for this implementation under the same conditions was $$$7.73$$$ seconds. Cool! In this note, we won't dive into implementation details or analyze the reason for this speedup.

Implementation with strfry

Let's turn to C's legacy. The standard library contains some useful functions, like strtok, which have no modern C++ equivalents. And if you look under _GNU_SOURCE, you can find even more surprising things.

Suddenly, glibc enters the stage. For our task, strfry can help. This function takes a null-terminated string and shuffles it. The global state of the PRNG remains unchanged, and the permutation will vary between runs. Under the hood, the function uses getrandom in the standard linear swap algorithm. By the way, according to the C++11 standard, our string is null-terminated.

#include <bits/stdc++.h>
using namespace std;

#define ITER 10000

int main() {
    string s; cin >> s;
    for (int i = 0; i < ITER; ++i) {
        strfry(s.data());
        // do_something(s);
    }
    cout << s << '\n';
}

The user CPU time under the same conditions was $$$5.54$$$ seconds. Congratulations, we've found the winner!

Full text and comments »

  • Vote: I like it
  • +36
  • Vote: I do not like it

By Gornak40, history, 4 years ago, translation, In English

Compiler GCC provides the ability to use assembler inserts. This can be useful, for example, for multiplying two 64-bit numbers by a 64-bit module.

The fact is that multiplying two 64-bit registers, the processor stores the result in a pair of registers rdx (upper part) and rax (lower part). Division works in a similar way: the divisible is taken from the registers rdx and rax, after which the quotient is stored in rax, and the remainder is stored in rdx.

Using this knowledge, you can implement an analog of the following function:

inline long long mul(long long a, long long b) {
	return (__int128)a * b % 1000000014018503;
}

In this way:

inline long long mul(long long a, long long b) {
	long long res;
	asm(
		"mov %1, %%rax\n"
		"mov %2, %%rbx\n"
		"imul %%rbx\n"
		"mov $1000000014018503, %%rbx\n"
		"idiv %%rbx\n"
		"mov %%rdx, %0\n"
		:"=res"(res)
		:"a"(a), "b"(b)
	);
	return res;
}

We indicate the use of variables res for writing, a and b for reading. They accordingly receive designations %0, %1, %2. Operations are written using the standard AT&T syntax.

Now you can write hashes using a 64-bit module, which is equivalent to using a pair using a 32-bit module, without using __int128.

Full text and comments »

  • Vote: I like it
  • +42
  • Vote: I do not like it