Блог пользователя Sugar_fan

Автор Sugar_fan, 3 недели назад, По-английски

Let us start with 2014H - Robin Hood Archery. A common solution assigns each value a random integer in $$$[0,L)$$$ and XORs the values in the interval. If every value appears an even number of times, the result is $$$0$$$. Otherwise, the result is $$$0$$$ with probability at most $$$1/L$$$.

Inspired by the hack used on 282284449 with xorshift, I found that many other submissions can also be hacked. The pitfalls of XOR hashing with mt19937_64 do not seem to be widely known, so I wrote this blog.

If we view both the internal state and the outputs as vectors over $$$\mathbb{F}_2$$$, then the $$$i$$$-th output of mt19937_64 with initial state $$$x$$$ can be written as $$$o_i = BA^i x$$$, where $$$A$$$ is the state transition matrix and $$$B$$$ is the output matrix over $$$\mathbb{F}_2$$$.

Thus, to make the XOR zero for every seed, it is enough to find a nonzero polynomial $$$p(t)=\sum_i a_i t^i$$$ such that $$$p(A)=0$$$. Then

$$$ \bigoplus_i a_i o_i = \bigoplus_i a_i BA^i x = B\left(\bigoplus_i a_i A^i\right)x = Bp(A)x = 0. $$$

Such a polynomial can be the minimal polynomial of $$$A$$$, whose degree is $$$19937$$$ for mt19937_64. By observing the least significant bit of the first $$$2 \times 19937$$$ outputs, we can recover the recurrence of this bit sequence using Berlekamp–Massey. This recurrence divides the minimal polynomial. Since mt19937_64 is designed to have maximal period, the corresponding polynomial is primitive, hence irreducible. Thus any nonzero recovered recurrence must be the full minimal polynomial.

The following is the test case generator.

Test Case Generator

Using uniform_int_distribution(0, L) incorrectly can still cause the same issue. For example, the implementation is unsafe on common standard-library implementations when $$$L = 2^n-1$$$, because the distribution may degenerate into taking the low $$$n$$$ bits of rng(), which is still linear over $$$\mathbb{F}_2$$$.

Обсуждение Codeforces Round 974 (Div. 3)
  • Проголосовать: нравится
  • +395
  • Проголосовать: не нравится

»
3 недели назад, скрыть # |
 
Проголосовать: нравится +27 Проголосовать: не нравится

Does it means we can hack a fixed seeded solution or even randomized seed? I can't understand the math part ><

»
3 недели назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

So is this an issue only for XOR hashing or for most randomized algorithms which use the Mersenne Twister? I guess the solution would be to use other PRNGs like splitmix for most purposes? I'm not that good at math :/

»
3 недели назад, скрыть # |
 
Проголосовать: нравится +40 Проголосовать: не нравится

Time to switch to splitmix64

»
3 недели назад, скрыть # |
 
Проголосовать: нравится +11 Проголосовать: не нравится

Just use aesctr. fast and (mostly) secure https://github.com/lemire/testingRNG/tree/master#visual-summary

simplified version (1937 ms)

(In comparison, std::mt19937_64 uses 2906 ms).

»
3 недели назад, скрыть # |
 
Проголосовать: нравится +5 Проголосовать: не нравится

i just learned xor hashing bro why :(

  • »
    »
    3 недели назад, скрыть # ^ |
     
    Проголосовать: нравится +2 Проголосовать: не нравится

    It is easy to avoid being hacked, just make some non-bitwise operation on rng() randomly.

    Like this:

    std::mt19937_64 rng(std::chrono::steady_clock::now().time_since_epoch().count());
    for (int i = 0; i < n; i++) {
        a[i] = rng() / 3;
    }
    

    Do not just divide or multiply it by $$$2^k$$$ or it will just degenerate into taking a subset of rng(). (which is still hackable)

»
3 недели назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Will shuffling the indexes in the beginning solve it?

»
3 недели назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

So what is the “correct” xor hashing?