Блог пользователя IAmTiredAndSleepy

Автор IAmTiredAndSleepy, история, 2 месяца назад, По-английски

The first part is: [PART1].

Tl;DR; is we're trying to hack solutions which use doubles to store integer values. In this part we will not hack the solution from the Part 1, but hopefully will get closer to it.

Instead hacking the initial solution to the original problem, let's try to formulate another, similar but simpler to hack problem.

Problem: "Very Annoying". Given two sorted arrays A, B of sizes $$$n_1, n_2$$$ where each element is integer and the following condition holds: $$$1 \leq \min(A), \min(B)$$$ and $$$\max(A), \max(B) \leq n_1 + n_2$$$.

The task: if $$$\prod_{a\in A} a \leq \prod_{b\in B} b$$$ print <=, otherwise print >.

Restriction: $$$n_1, n_2 \leq 10^6$$$.

Before going further, I invite you to try to solve this problem. If your solution is very short, please try this test (it actually contains many tests, so the first line contains the overall tests number). The correct answer is <= for all cases, so if you see >, your solution is WA. If you're curious how to generate such test, there will be a secion on adversarial test generation.

If you used big integers in python, please use this generator to generate a test. If your solution hanged on this, you get TL.

Generator

As promised, the problem is very annoying, but if you managed to solve it by easy means, please share, I would be happy to know the trick (and maybe try to hack it as well)!

The annoying solution to the annoying problem is probably like that:

Annoying Solution

But of course I can be even more annoying and give a sequence generator, which depends on the prefix products, say

Very Very Annoying Problem

Let

$$$\text{PRNG(X)} = (b\cdot X + c)\bmod m$$$

be a pseudo-random number generator (b, c, m are in the input). Let $a_1$ and $$$b_1$$$ are given too, and let's define

$$$ \begin{cases} a_n = \text{PRNG}(a_{n-1} + [\prod_{i = 1}^{n - 1} a_i \leq \prod b_i]) \\ b_n = \text{PRNG}(b_{n-1} + [\prod_{i = 1}^{n - 1} a_i \leq \prod b_i]). \end{cases} $$$

Input: $$$b, c, n, m, a_1, b_1 \leq 10^6$$$,

Output: <= if $$$a_n \leq b_n$$$, and $$$a_n \gt b_n$$$ otherwise.

I am very curious if it's possible to solve this problem in reasonable time.

Adversarial test generation

Here I will discuss how to generate the first test set.

First let's set up the goal: we want to find two factored numbers $$$A = \prod a_i$$$, $$$B = \prod b_i$$$ so that the requirements satisfied:

Req1. all factors are bounded $$$|a_i|, |b_i| \leq n$$$

Req2. $$$A \lt B$$$

Req3. $$$\sum \log^{\text{c++}}a_i \gt \log^{\text{c++}}b_i$$$, where $$$\log^{\text{c++}}(\cdot)$$$ is c/c++/python (ultimately glibc) implementation of the $$$\log$$$

We have quite contradictory requirements. The Req3 implies that the numbers $$$A, B$$$ should be big enough, that we lose enough precision to output the wrong sign.

However, Req1 says all factors should be small. Today I learned, the numbers with largest prime factor not exceeding $$$B$$$ are called B-smooth numbers It turs out, they get increasingly rare. If I understood Wikipedia correctly, the number of $$$B$$$-smooth numbers below $$$x$$$ can be estimated as,

$$$ \leq \frac{x}{\left(\frac{\log x}{\log B}\right)!} + O\left(\frac{x}{\log B}\right), $$$

(which is not too practical as the big-O term dominates here, but from this we can say that the density of $$$B$$$-smooth numbers tend to zero as $$$x$$$ approaches the infinity)

Let's turn our attention to the Req3., we should treat $$$\log^{\text{c++}}$$$ as a black box, which does random rounding. To answer the question, which order of magnitude error we should expect, I chose a random numbers $$$u \ll v$$$ and evaluated $$$\log^{\text{c++}}(u) + \log^{\text{c++}}(u\cdot v^2) - 2\cdot \log^{\text{c++}}(u^2v^2)$$$

    long long u = 3, v = 321; 
    double one_way = std::log((double)(u)) + std::log((double)(u * v * v));
    double other_way = std::log((double)(u * v)) +  std::log((double)(u * v));
    std::cout << one_way - other_way << std::endl;

which gave me -1.77636e-15, aha, so we can expect $$$\approx 10^{-15} \approx 2^{-50}$$$ error at least (which should accumulate additively when we sum the logarithms).

So in order the error to play significant role, we need to ensure that $$$|\log(B) - \log(A)| \leq 2^{-50}$$$, then we can count on the fact that rounding and quanization is quite random, thus we will get a "bad" test with good probability! Further, for simplicity I assumed that $$$B = A + 1$$$, therefore we want

$$$ |\log\left(\frac{A + 1}{A}\right)| \leq 2^{-50} $$$

Using taylor approximation $$$\log(1 + x) = x + O(x^2)$$$ we estimate that

$$$ |\log\left(\frac{A + 1}{A}\right)| = \frac{1}{A} \leq 2^{-50}, $$$

or in other words the product should be at least $$$50$$$ bits long. It fits long long, yay!

After we have this, it was a matter of coding to

  1. generate random 50 bit long numbers $$$A, A+1$$$

  2. try to factor it. Factoring $$$2^{50}$$$ is a hard task, but there is a nice trick here. We are interested in numbers which have all factors not exceeding $$$n$$$. So first we can precompute all primes up until $$$n$$$, then for each random $$$A$$$ we only need to test these factors. If nothing works, then discard and try next. The class I used for factoring is here (and the full code will be at the very end):

Smooth Factorization

Finally, when I have two candidates $$$A, A+1$$$ ready (having factors less that $$$n$$$), we can just sum up their log and check that we found a "bad" example. It was exciting to get the first bad example!

FOUND!
1872579231250172
2 2 103 397 431 4217 6299 
3 3 3 79 83 617 3167 5413 

I kept playing with it, and managed to reduce the $$$n$$$ to $$$10^4$$$, so even for the factors as small as $$$10^4$$$ the logarithms can give the wrong inequality sign!

Here is the full code to generate many such instances, which I ultimatley used in the first test suit.

Full Code

I am not sure if I will be able to leverage these to build an actual hack of the solution from part 1, but this already seems enough to share!

Полный текст и комментарии »

  • Проголосовать: нравится
  • +77
  • Проголосовать: не нравится

Автор IAmTiredAndSleepy, история, 2 месяца назад, По-английски

Sorry for the semi-click bait title. Today I spent solving this problem 2189D2 - Little String (Hard Version), the summary of the problem (after simplifications) as follows:

There is a reference strnig $$$w$$$ of length $$$n$$$ consisting of $$$0, 1, ?$$$. Consider a set of possible products

$$$S = \left\{\prod\limits_{i = 1}^n a_i \text{ s.t. }\begin{cases}w_i = 1\rightarrow a_i = 2 \\ w_i = 0\rightarrow a_i = i \\ w_i = ?\rightarrow a_i \in \{2, i\}\end{cases} \right\}$$$

The task is to find the minimum $$$x \in S$$$ such that $$$c \not\mid x$$$.

My solution is slightly diviates from the tutorial. The solution sketch is: the condition $$$c \not\mid x$$$ is equivalent $$$\text{gcd}(x, c) \mod c \neq 0$$$, or in other words $$$\gcd(x, c)$$$ is a proper divisor of $$$c$$$.

Therefore the idea is to have a dynamic programming with the state space consisting of $$$(i, d)$$$, where i is a prefix of the string, and $$$d \mid c$$$. The dp computes the optimal answer $$$x$$$ which

  • corresponds to the prefix w[:i]

  • $$$\gcd(x, c) = d$$$

The number of dp states is the number of divisors $$$d(c)$$$ times $$$n$$$ which seems fit the constraints.

The dp transition is as follow: suppose we encountered $$$w_i = ?$$$, then we iterate over $$$d \mid c$$$, multiplying the stored optimum $$$x$$$ with either $$$2$$$ or $$$i$$$ and update the dp states $$$(\gcd(x \cdot 2, c) \bmod c, i + 1)$$$ and $$$(\gcd(x \cdot i, c) \bmod c, i + 1)$$$

The problem start here: in order to keep each dp state $$$O(1)$$$ we cannot store the optimum $$$x$$$ for each state explicitley, and the original problem asked to produce the answer modulo some $$$\text{MOD}$$$. However, we cannot store $$$x \bmod \text{MOD}$$$ as well, as for the dp transition we need to have the < comparison between the numbers.

Therefore my approach was to represent each number $$$x$$$ as a triplet:

  • $$$\gcd(x, c) \bmod c$$$

  • $$$x \bmod \text{MOD}$$$

  • $$$\log(x)$$$

After many skill issues I got an AC 364853210 (sorry I didn't clean the solution, it contains the maspy's template codes for fast IO and number theory, my code is at the bottom).

But I believe this can be hacked. The problem that the property $$$\log(a) \cdot \log(b) = \log(a \cdot b)$$$ holds only approximately, so to hack the solution I would need to find two sets of numbers: $$$a_1, a_2, \dots, a_n$$$ and $$$b_1, \dots, b_n$$$ such that

  • $$$\prod a_i \lt \prod b_i$$$

  • $$$\sum(\log^{\text{c++}}a_i) \gt \sum(\log^{\text{c++}}b_i)$$$, where $$$\log^{\text{c++}}(\cdot)$$$ is the C++ implementation of logarithm.

  • the program is forced at some execution path to compare these two.

The preliminary estimate, that worst case I store up untill $$$\log(n!) \sim n \log n $$$, therefore the doubles worst case are in the $$$10^5$$$ range. I am not sure, is it sufficient to lose precision sufficiently to have the comparison wrong.

If the answer is yes, then I would be happy to know if it's possible store the values as 2 or 3 doubles. For the case of regular integer numbers, it's well establised to use bigint, so I'm wondering is there are "big doubles". And how the operations over these "big doubles" should be implemented. Thank you!

Полный текст и комментарии »

  • Проголосовать: нравится
  • +12
  • Проголосовать: не нравится