Can anyone provide a proper explanation of the A2 problem of Round 2 Facebook Hackercup editorial provided here: https://www.facebook.com/codingcompetitions/hacker-cup/2022/round-2/problems/A2/solution
I could not understand the Video editorial as well.
If you have other solution, kindly explain it. I want to learn and upsolve it. It is indeed a good question. Providing code/implementation the explanation is highly appreciated. Thanks!
Let me give it a try. I'll assume you have solved A1.
Let's say you want to check if
A[l..r]
is perfectly balanced. In A1 there were only $$$26$$$ different kinds of element, so we could just compare each element's frequency inA[l..m]
andA[m+1..r]
. But in A2 there are $$$10^6$$$ different kinds of element, so we can no longer use this method.Instead of comparing each element's frequency, one might come up with this idea: "what about comparing the sum of elements in
A[l..m]
andA[m+1..r]
?". For example, ifA[l..r]
was[1, 2, 3, 2, 1, 3]
, left sum is $$$1+2+3=6$$$ and right sum is $$$2+1+3=6$$$. Both sums are equal, and[1, 2, 3, 2, 1, 3]
is indeed perfectly balanced! But if you are given[1, 3, 2, 2]
, both sums are equal to $$$4$$$, but it's not perfectly balanced. Here you can't distinguish between $$$\{1, 3\}$$$ and $$$\{2, 2\}$$$.This situation is very analogous to string hashing. If we have a string $$$S=s_1s_2\cdots s_n$$$ and use hash function as $$$h(S) = s_1+s_2+\cdots+s_n \mod P$$$, same problem will arise. Instead we can use hash function like $$$h(S) = s_1r^{n-1}+s_2r^{n-2}+\cdots+s_nr^0 \mod P$$$ and lower chances of collisions.
We can apply this hashing method to our problem(obviously you can use different hash functions). If we have some subarray
A[l..r]
and number of occurrences of $$$1$$$ in this subarray is $$$a_1$$$, number of occurrences of $$$2$$$ is $$$a_2$$$, ..., number of occurrences of $$$10^6$$$ is $$$a_{10^6}$$$, let's hash this subarray into $$$h(A[l..r]) = a_{10^6}r^{10^6-1}+\cdots+a_2r^1+a_1r^0 \mod P$$$.(You can choose $$$r$$$ and $$$P$$$ as you like, maybe two big prime numbers)Here, basically we are compressing all frequency information in a subarray into one single number. So to check if a subarray is perfectly balanced, we can just compare left hash value and right hash value.
You might ask: "how to calculate the hash value fast?". Let's replace each $$$1$$$s in $$$A[]$$$ with $$$r^0\mod P$$$, $$$2$$$s with $$$r^1 \mod P$$$, ..., $$$10^6$$$s with $$$r^{10^6-1} \mod P$$$. Let's call this new array $$$B$$$. Then $$$h(A[l..r]) = B_l+B_{l+1}+\cdots+B_r \mod P$$$. Thus, if we use segment tree or fenwick tree, we can calculate the hash value in $$$O(log\hphantom{\text{_}}n)$$$.
Now, to the main problem: how to check if
A[l..r]
is almost perfectly balanced? Let's sayA[l..r]
was almost perfectly balanced, and one extra element(say, $$$k$$$) belonged to the left half of the subarray. If we calculate $$$h(A[l..m])$$$ and $$$h(A[m+1 ..r])$$$, we can see $$$h(A[l..m])=h(A[m+1 ..r])+r^{k-1} \mod P$$$, and we get $$$h(A[l..m])-h(A[m+1 ..r]) \mod P=r^{k-1} \mod P$$$.So, we can calculate $$$h(A[l..m])-h(A[m+1 ..r]) \mod P$$$ and check if this value exists in $$$\{r^0\mod P, r^1 \mod P, \cdots, r^{10^6-1} \mod P\}$$$. If it exists, then
A[l..r]
is almost perfectly balanced. If it doesn't, do the same check for $$$h(A[m ..r])-h(A[l..m-1]) \mod P$$$.Hope this helps!
I implemented it as you said, but one problem i am facing is that for different r and P prime values i am getting different answers. If for one set it is passing 10 test cases, it is failing for others. What would be your suggestion? https://pastebin.com/ZUP9cjcD Basically i am facing hashing collision
I see some mistakes in your code. Here are what I found:
Line 71
: you should run the for loop up to $$$10^6-1$$$Line 88, 89
: instead ofval
, you should usepow(mod, val-1)
Line 101, 102
: you should do subtraction just like you did inLine 103
To avoid hash collisions, you can have multiple $$$(r, P)$$$ pairs and check if each pair gives the same result.
I solved $$$A2$$$ with double multiplicative hashing.
Single hashing gave me FST (due to collisions in hash) which leads to disqualify for Round 3 :(
Here is my submission.
Java solution and editorial to https://www.facebook.com/codingcompetitions/hacker-cup/2022/round-2/problems/A2/solution
Finally solved it and got AC. Thanks to taran_1407 for explaining his approach (I used his approach to solve this) and clearing all my doubts with patience. Thanks to hihihizaru for explaining the editorial approach and an example on creating our own hash function. The editorial approach/one given by you does not work in java because there would be hash collisions while calculating the right_sum-left_sum of the given range and also there would be collisions while using Long in Segment/BIT.
Note/Knowlege share: "Random().nextLong() is almost equivalent to mt19937 as I know" — Taranpreet Singh
Approach: Same as explained below by hihihizaru but instead of creating hash like that we can use Random.nextInt() method. Please see the submission here: https://pastebin.com/FSg9Yq83 . "The Implementation of Random is pretty well distributed. I used higher number of random number mapping as instead of going with 64-bit range, My numbers were distributed in range of int, 32 bit numbers, leading to collision probability across 10^6 queries to be 1-(1-10^(-3))^(10^6) ~ , which could realistically fail. So I used 10 randomizations to achieve better success probability." — Taranpreet Singh
Note: Here Yes, by hash collision, I mean the same one as mentioned in the editorial "garbage value mapping back to h(A)".
If I had used 64 bit range, I probably would have needed only 2 hash functions, or even 1 might have sufficed. If i used Random().nextLong() instead of randInt(), it's almost equivalent to mt19937 as I know.
My idea behind using 32 bit range was that I didn't want values in segment tree/BIT to overflow 64 bit range and potentially cause collision. If you notice, any value in my segment tree wouldnt exceed N*2^32 < 10^15, so overflow won't happen.
Ofcourse, you can simply let overflow happen and use 64 bit numbers as used in editorial, but I preferred mine.
Finally, Thanks to SecondThread. Although he could not help me in providing code editorial or help me resolve my issue in solving this in Java. He explained about Random numbers and using seed to get same set of random numbers every time and how people use seed to not make others hack their code, see the discussion here: https://mirror.codeforces.com/blog/entry/107359
Note: I am sharing this journey to help people who could not solve this/ inspire that we shouldn't leave a problem unsolved instead learn and solved it. It teaches a lot. You may be a Newbie but if you are still a newbie after couple of years then it's your mistake
Great to see you post learnings.
A couple of relevant points
1. The approach mentioned in editorial can (and I believe it would) work for java as well. It was my decision to choose to work with 32 bit numbers instead of 64 bit ones. Then editorial solution could work as well (probably requiring handling negative hashes unlike C++ which have unsigned long long int). Anyone feel free to correct me if 2^64 bit is usable as mod in in java without added complexity.
2. "Random().nextLong() is almost equivalent to mt19937 as I know" This may or may not be true from a theoretical conext, but I mean it from point of usage of these two approaches of generating random numbers in competitions. Both of these generate sufficiently well distributed numbers, not facing issues like C++ rand(). A lot of research on RNGs exists, especially on mt19937, which can be referred if interested in theory.
3. 10 is not a magic number here, but only my guess. I may or may not have needed less/high number of hash functions if I did do probability analysis during round. It's a tradeoff between time taken to run your solution vs success probability, considering my solution had to apply each step 10 times whenever has was involved.