Blog entries - Codeforces

#	User	Rating
1	Benq	3792
2	VivaciousAubergine	3647
3	jiangly	3631
4	Kevin114514	3574
5	maroonrk	3521
6	strapple	3515
7	Radewoosh	3461
8	tourist	3428
9	turmax	3378
10	Um_nik	3376

#	User	Contrib.
1	Qingyu	162
2	adamant	148
3	Um_nik	146
4	Dominater069	143
5	errorgorn	140
6	cry	138
7	Proof_by_QED	136
8	YuukiS	135
9	chromate00	134
10	soullless	133

The first part is: [PART1].

Tl;DR; is we're trying to hack solutions which use doubles to store integer values. In this part we will not hack the solution from the Part 1, but hopefully will get closer to it.

Instead hacking the initial solution to the original problem, let's try to formulate another, similar but simpler to hack problem.

Problem: "Very Annoying". Given two sorted arrays A, B of sizes $$$n_1, n_2$$$ where each element is integer and the following condition holds: $$$1 \leq \min(A), \min(B)$$$ and $$$\max(A), \max(B) \leq n_1 + n_2$$$.

The task: if $$$\prod_{a\in A} a \leq \prod_{b\in B} b$$$ print <=, otherwise print >.

Restriction: $$$n_1, n_2 \leq 10^6$$$.

Before going further, I invite you to try to solve this problem. If your solution is very short, please try this test (it actually contains many tests, so the first line contains the overall tests number). The correct answer is <= for all cases, so if you see >, your solution is WA. If you're curious how to generate such test, there will be a secion on adversarial test generation.

If you used big integers in python, please use this generator to generate a test. If your solution hanged on this, you get TL.

Generator

# run: python3 gen.py > test2.in
from random import randint

UB = 1000000

def gen_test(n):
    return sorted([randint(1, UB) for _ in range(n)])

if __name__ == "__main__":
    print(1)
    print(UB, UB)
    print(*gen_test(UB))
    print(*gen_test(UB))

As promised, the problem is very annoying, but if you managed to solve it by easy means, please share, I would be happy to know the trick (and maybe try to hack it as well)!

The annoying solution to the annoying problem is probably like that:

Annoying Solution

But of course I can be even more annoying and give a sequence generator, which depends on the prefix products, say

Very Very Annoying Problem

Let

$$$\text{PRNG(X)} = (b\cdot X + c)\bmod m$$$

be a pseudo-random number generator (b, c, m are in the input). Let $a_1$ and $$$b_1$$$ are given too, and let's define

$$$ \begin{cases} a_n = \text{PRNG}(a_{n-1} + [\prod_{i = 1}^{n - 1} a_i \leq \prod b_i]) \\ b_n = \text{PRNG}(b_{n-1} + [\prod_{i = 1}^{n - 1} a_i \leq \prod b_i]). \end{cases} $$$

Input: $$$b, c, n, m, a_1, b_1 \leq 10^6$$$,

Output: <= if $$$a_n \leq b_n$$$, and $$$a_n \gt b_n$$$ otherwise.

I am very curious if it's possible to solve this problem in reasonable time.

Adversarial test generation

Here I will discuss how to generate the first test set.

First let's set up the goal: we want to find two factored numbers $$$A = \prod a_i$$$, $$$B = \prod b_i$$$ so that the requirements satisfied:

Req1. all factors are bounded $$$|a_i|, |b_i| \leq n$$$

Req2. $$$A \lt B$$$

Req3. $$$\sum \log^{\text{c++}}a_i \gt \log^{\text{c++}}b_i$$$, where $$$\log^{\text{c++}}(\cdot)$$$ is c/c++/python (ultimately glibc) implementation of the $$$\log$$$

We have quite contradictory requirements. The Req3 implies that the numbers $$$A, B$$$ should be big enough, that we lose enough precision to output the wrong sign.

However, Req1 says all factors should be small. Today I learned, the numbers with largest prime factor not exceeding $$$B$$$ are called B-smooth numbers It turs out, they get increasingly rare. If I understood Wikipedia correctly, the number of $$$B$$$-smooth numbers below $$$x$$$ can be estimated as,

$$$ \leq \frac{x}{\left(\frac{\log x}{\log B}\right)!} + O\left(\frac{x}{\log B}\right), $$$

(which is not too practical as the big-O term dominates here, but from this we can say that the density of $$$B$$$-smooth numbers tend to zero as $$$x$$$ approaches the infinity)

Let's turn our attention to the Req3., we should treat $$$\log^{\text{c++}}$$$ as a black box, which does random rounding. To answer the question, which order of magnitude error we should expect, I chose a random numbers $$$u \ll v$$$ and evaluated $$$\log^{\text{c++}}(u) + \log^{\text{c++}}(u\cdot v^2) - 2\cdot \log^{\text{c++}}(u^2v^2)$$$

    long long u = 3, v = 321; 
    double one_way = std::log((double)(u)) + std::log((double)(u * v * v));
    double other_way = std::log((double)(u * v)) +  std::log((double)(u * v));
    std::cout << one_way - other_way << std::endl;

which gave me -1.77636e-15, aha, so we can expect $$$\approx 10^{-15} \approx 2^{-50}$$$ error at least (which should accumulate additively when we sum the logarithms).

So in order the error to play significant role, we need to ensure that $$$|\log(B) - \log(A)| \leq 2^{-50}$$$, then we can count on the fact that rounding and quanization is quite random, thus we will get a "bad" test with good probability! Further, for simplicity I assumed that $$$B = A + 1$$$, therefore we want

$$$ |\log\left(\frac{A + 1}{A}\right)| \leq 2^{-50} $$$

Using taylor approximation $$$\log(1 + x) = x + O(x^2)$$$ we estimate that

$$$ |\log\left(\frac{A + 1}{A}\right)| = \frac{1}{A} \leq 2^{-50}, $$$

or in other words the product should be at least $$$50$$$ bits long. It fits long long, yay!

After we have this, it was a matter of coding to

generate random 50 bit long numbers $$$A, A+1$$$
try to factor it. Factoring $$$2^{50}$$$ is a hard task, but there is a nice trick here. We are interested in numbers which have all factors not exceeding $$$n$$$. So first we can precompute all primes up until $$$n$$$, then for each random $$$A$$$ we only need to test these factors. If nothing works, then discard and try next. The class I used for factoring is here (and the full code will be at the very end):

Smooth Factorization

using ull = __uint128_t;
struct smooth_factor {
    ull ub; 
    std::vector<ull> factors;
    
    smooth_factor(ull ub): ub(ub) {
        std::vector<char> is_prime(ub + 1, 1);
        is_prime[0] = is_prime[1] = 0;
        for (ull p = 2; p * p <= ub; p++) {
            if (!is_prime[p]) continue;
            for (ull q = p * 2; q <= ub; q += p) is_prime[q] = 0;
        }
        for (ull p = 2; p <= ub; p++) {
            if (is_prime[p]) factors.push_back(p);
        }
    }
    
    std::variant<std::vector<ull>, std::monostate>
    factorize(ull n) {
        std::vector<ull> res;
        for (ull p: factors) {
            if (p * p > n) break;
            while (n % p == 0) {
                res.push_back(p);
                n /= p;
            }
        }
        if (n > 1) {
            if (n <= ub) {
                res.push_back(n);
            } else {
                return std::monostate{};
            }
        }
        return res;
    }
};

Finally, when I have two candidates $$$A, A+1$$$ ready (having factors less that $$$n$$$), we can just sum up their log and check that we found a "bad" example. It was exciting to get the first bad example!

FOUND!
1872579231250172
2 2 103 397 431 4217 6299 
3 3 3 79 83 617 3167 5413

I kept playing with it, and managed to reduce the $$$n$$$ to $$$10^4$$$, so even for the factors as small as $$$10^4$$$ the logarithms can give the wrong inequality sign!

Here is the full code to generate many such instances, which I ultimatley used in the first test suit.

Full Code

#include <iostream>
#include <variant>
#include <vector>
#include <random>
#include <iomanip>
#include <algorithm>

using ull = __uint128_t;

#define TESTGEN
#define DEBUG


std::ostream& operator<<(std::ostream& os, ull n) {
    const uint64_t cut = 10000000000000000000ULL; 
    if (n == 0) return os << "0";
    if (n < cut) {
        return os << (uint64_t)n;
    }
    std::string s;
    while (n > 0) {
        uint64_t chunk = (uint64_t)(n % cut);
        n /= cut;        
        std::string chunk_str = std::to_string(chunk);
        if (n > 0) {
            std::string padding(19 - chunk_str.length(), '0');
            s = padding + chunk_str + s;
        } else {
            s = chunk_str + s;
        }
    }    
    return os << s;
}


using ull = __uint128_t;
struct smooth_factor {
    ull ub; 
    std::vector<ull> factors;
    
    smooth_factor(ull ub): ub(ub) {
        std::vector<char> is_prime(ub + 1, 1);
        is_prime[0] = is_prime[1] = 0;
        for (ull p = 2; p * p <= ub; p++) {
            if (!is_prime[p]) continue;
            for (ull q = p * 2; q <= ub; q += p) is_prime[q] = 0;
        }
        for (ull p = 2; p <= ub; p++) {
            if (is_prime[p]) factors.push_back(p);
        }
    }
    
    std::variant<std::vector<ull>, std::monostate>
    factorize(ull n) {
        std::vector<ull> res;
        for (ull p: factors) {
            if (p * p > n) break;
            while (n % p == 0) {
                res.push_back(p);
                n /= p;
            }
        }
        if (n > 1) {
            if (n <= ub) {
                res.push_back(n);
            } else {
                return std::monostate{};
            }
        }
        return res;
    }
};


std::random_device rd;
std::mt19937_64 gen(rd());
ull rand128() {
    uint64_t high = gen();
    uint64_t low = gen();
    return ((ull)high << 64) | low;
}

ull random(int k) {
    ull min_val = (ull)1 << k;
    ull max_val = (ull)1 << (k + 1);
    ull range = max_val - min_val;
    ull offset = rand128() % (range + 1);    
    return min_val + offset;
}


int main() {

    long long u = 3, v = 321; 
    double one_way = std::log((double)(u)) + std::log((double)(u * v * v));
    double other_way = std::log((double)(u * v)) +  std::log((double)(u * v));
    std::cout << one_way - other_way << std::endl;



    std::cout 
        << std::setprecision(1000) 
        << std::log((double)6.0) + std::log((double)7.0) - std::log((double)(42)) 
        << std::endl;

    const int UB = 10'000 - 1;
    const int RANGE_K = 50;

    smooth_factor sf(UB);

    int iter = 0; 
    int print_each = 1000'000;

    
    
    while (true) {
        iter++;
        if (iter % print_each == 0) {
            std::cerr << "Iteration: " << iter << "\n";
        }

        ull n = random(RANGE_K); 
        auto res1 = sf.factorize(n);
        auto res2 = sf.factorize(n + 1);

        auto res1_good = std::holds_alternative<std::vector<ull>>(res1);
        auto res2_good = std::holds_alternative<std::vector<ull>>(res2);
        if (!res1_good || !res2_good) {
            continue;
        }



        // std::cout << "pair found: " << n << " and " << (n + 1) << "\n";

        auto factors1 = std::get<std::vector<ull>>(res1);
        auto factors2 = std::get<std::vector<ull>>(res2);


        double log_prod1 = 0.0;
        for (ull p: factors1) log_prod1 += std::log((double)p);
            
        double log_prod2 = 0.0;
        for (ull p: factors2) log_prod2 += std::log((double)p);


        if (log_prod1 <= log_prod2) {
            continue;
        }


        std::cout << factors1.size() << " " << factors2.size() << "\n";
        for (ull f: factors1) {
            std::cout << f << " ";
        }        
        std::cout << "\n";
        for (ull f: factors2) {
            std::cout << f << " ";
        }
        std::cout << "\n";

        #ifdef DEBUG        

        std::cout << "FOUND!" << std::endl;

        std::cout << n << "\n";
        for (ull f: factors1) {
            std::cout << f << " ";
        }
        std::cout << "\n";
        for (ull f: factors2) {
            std::cout << f << " ";
        }
        std::cout << "\n";
    
        #endif      
    }


}

I am not sure if I will be able to leverage these to build an actual hack of the solution from part 1, but this already seems enough to share!

Full text and comments »

IAmTiredAndSleepy's blog

Adversarial test generation