Hello, Codeforces!
"The Struggle" (Codeforces Gym 103329F) is a problem I authored that appeared in the HDU Multi-university Training, the Ptz Summer Camp and the Open Cup. Although these contests had a total of ~1300 three-person teams, I know of few people (possibly no more than 5) who have learned and independently implemented the solution.
The problem is a lot of fun, and the solution is quite easy to implement (the actual implementation is under 2 KB). From this blog you will learn how the algorithm works and how to implement it, so that you can solve an Open Cup problem that few people have solved to this day!
The problem statement is very simple: Given an ellipse $$$E$$$ that is contained in $$$(0,4 \times 10^6) \times (0,4 \times 10^6)$$$, calculate the value $$$\sum_{(x, y) \in E}(x \oplus y)^{33} x^{-2} y^{-1} \mod 10^9+7$$$ over all integer points $$$(x,y)$$$.
Since the solution is not obvious, let us first consider an easier case: how should we compute $$$\sum_{x = 0}^{2^n-1}\sum_{y = 0}^{2^n-1} (x \oplus y)^{33} x^{-2} y^{-1} \mod 10^9+7$$$? That is, if the area is the square $$$[0,2^n-1] \times [0,2^n-1]$$$, how do we calculate the value? (For our purposes we define $$$0^{-2} = 0^{-1} \equiv 0 \mod 10^9+7$$$, matching the exponents in the problem statement.)
This is quite simple! It can be done in $$$O(n 2^n)$$$ time (that is, $$$O(N \log N)$$$ for arrays of length $$$N = 2^n$$$), using an algorithm called the "Fast Walsh-Hadamard Transform", also known as FWHT, FWT or fast xor convolution. The convolution calculates $$$c_i = \sum_{x = 0}^{2^n-1}\sum_{y = 0}^{2^n-1} [x \oplus y = i]a_xb_y$$$. If we set $$$a_i = i^{-2}$$$ and $$$b_i = i^{-1}$$$ (the exponents of $$$x$$$ and $$$y$$$ in the problem statement), then $$$\sum_{i = 0}^{2^n-1} c_i \times i^{33}$$$ is the answer for our easier case.
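To make the easier case concrete, here is a minimal Python sketch of the xor convolution, checked against a brute-force double loop. The names `fwht`, `xor_convolution`, `inv_pow`, `easy_case` and `brute_case` are made up for this illustration, and the exponents are parameters (`ex`, `ey`, `ep`) whose defaults follow the problem statement.

```python
MOD = 10**9 + 7

def fwht(a):
    """In-place Walsh-Hadamard transform modulo MOD; len(a) must be a power of two."""
    h = 1
    while h < len(a):
        for s in range(0, len(a), 2 * h):
            for i in range(s, s + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = (u + v) % MOD, (u - v) % MOD
        h *= 2

def xor_convolution(a, b):
    """Return c with c[i] = sum of a[x] * b[y] over all x ^ y == i, modulo MOD."""
    fa, fb = a[:], b[:]
    fwht(fa)
    fwht(fb)
    fc = [x * y % MOD for x, y in zip(fa, fb)]
    fwht(fc)  # the transform is its own inverse up to a factor of len(a)
    inv_len = pow(len(a), MOD - 2, MOD)
    return [x * inv_len % MOD for x in fc]

def inv_pow(v, e):
    """v^(-e) modulo MOD, with the convention that 0^(-e) is 0."""
    return pow(pow(v, MOD - 2, MOD), e, MOD) if v else 0

def easy_case(n, ex=2, ey=1, ep=33):
    """Sum of (x ^ y)^ep * x^(-ex) * y^(-ey) over the square [0, 2^n - 1]^2."""
    size = 1 << n
    a = [inv_pow(i, ex) for i in range(size)]
    b = [inv_pow(i, ey) for i in range(size)]
    c = xor_convolution(a, b)
    return sum(ci * pow(i, ep, MOD) for i, ci in enumerate(c)) % MOD

def brute_case(n, ex=2, ey=1, ep=33):
    """The same sum by direct enumeration, for checking small n only."""
    size = 1 << n
    return sum(pow(x ^ y, ep, MOD) * inv_pow(x, ex) * inv_pow(y, ey)
               for x in range(size) for y in range(size)) % MOD
```

For small `n` the two functions should agree, which is a convenient sanity check before moving on to the harder cases.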
Next, consider: what if the square is not $$$[0,2^n-1] \times [0,2^n-1]$$$? What if the square we want to sum over is $$$[x\times2^n,x\times2^n+2^n-1] \times [y\times2^n,y\times2^n+2^n-1]$$$?
This case turns out to be just as simple! Inside such a square only the last $$$n$$$ bits of each coordinate vary, while the higher bits stay fixed, so for $$$0 \le i, j < 2^n$$$ we have $$$(x\times 2^n+i) \oplus (y\times 2^n+j) = 2^n(x \oplus y)+(i \oplus j)$$$. Based on this observation we can simply set $$$a_i = (i+x\times 2^n)^{-2}$$$, $$$b_i = (i+y\times 2^n)^{-1}$$$ (again using the exponents from the problem statement) and calculate $$$c_i = \sum_{j = 0}^{2^n-1}\sum_{k = 0}^{2^n-1} [j \oplus k = i]a_jb_k$$$. Then $$$\sum_{i = 0}^{2^n-1} c_i \times (i+(x \oplus y)2^n)^{33}$$$ is the answer for this square.
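The shifted square can be sketched the same way. In the snippet below `bx` and `by` stand for the block indices written as $$$x$$$ and $$$y$$$ above; `block_sum` and `block_brute` are hypothetical names, and the exponents are again parameters defaulting to those in the problem statement.

```python
MOD = 10**9 + 7

def xor_convolution(a, b):
    """c[i] = sum of a[j] * b[k] over all j ^ k == i, modulo MOD (power-of-two length)."""
    def fwht(v):
        h = 1
        while h < len(v):
            for s in range(0, len(v), 2 * h):
                for i in range(s, s + h):
                    v[i], v[i + h] = (v[i] + v[i + h]) % MOD, (v[i] - v[i + h]) % MOD
            h *= 2
        return v
    fa, fb = fwht(a[:]), fwht(b[:])
    fc = fwht([x * y % MOD for x, y in zip(fa, fb)])
    inv_len = pow(len(a), MOD - 2, MOD)
    return [x * inv_len % MOD for x in fc]

def inv_pow(v, e):
    """v^(-e) modulo MOD, with the convention that 0^(-e) is 0."""
    return pow(pow(v, MOD - 2, MOD), e, MOD) if v else 0

def block_sum(bx, by, n, ex=2, ey=1, ep=33):
    """Sum over the square [bx*2^n, bx*2^n + 2^n - 1] x [by*2^n, by*2^n + 2^n - 1]."""
    size = 1 << n
    a = [inv_pow(bx * size + i, ex) for i in range(size)]
    b = [inv_pow(by * size + j, ey) for j in range(size)]
    c = xor_convolution(a, b)
    high = (bx ^ by) * size  # the fixed high bits of every xor value in this square
    return sum(ci * pow(high + i, ep, MOD) for i, ci in enumerate(c)) % MOD

def block_brute(bx, by, n, ex=2, ey=1, ep=33):
    """The same sum by direct enumeration, for checking small n only."""
    size = 1 << n
    return sum(pow(x ^ y, ep, MOD) * inv_pow(x, ex) * inv_pow(y, ey)
               for x in range(bx * size, (bx + 1) * size)
               for y in range(by * size, (by + 1) * size)) % MOD
```

Note that the only change from the unshifted case is the offset inside `a`, `b` and the final power; the convolution itself is untouched.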
The target complexity for this problem is $$$O(n \log n)$$$, where $$$n = \max_{(x,y) \in E} \max(x,y)$$$. We will use FWT, but a single xor convolution can only process one aligned square of the form $$$[x\times2^n,x\times2^n+2^n-1] \times [y\times2^n,y\times2^n+2^n-1]$$$ at a time. So we first process all the largest squares that lie entirely inside the ellipse, then the next largest squares, and so on...
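The decomposition into aligned squares can be sketched as a quadtree walk. This is only an illustration under simplifying assumptions: the ellipse is axis-aligned with an integer centre `(cx, cy)` and integer semi-axes `ra`, `rb` (the real input format may differ), and `make_ellipse` and `aligned_squares` are names invented here.

```python
def make_ellipse(cx, cy, ra, rb):
    """Predicates for the ellipse ((x-cx)/ra)^2 + ((y-cy)/rb)^2 <= 1, in exact integers."""
    def within(dx, dy):
        return rb * rb * dx * dx + ra * ra * dy * dy <= ra * ra * rb * rb
    def inside(x, y):
        return within(x - cx, y - cy)
    def box_inside(x1, y1, x2, y2):
        # The corner farthest from the centre decides whether the whole box is inside.
        return within(max(abs(x1 - cx), abs(x2 - cx)), max(abs(y1 - cy), abs(y2 - cy)))
    def box_touches(x1, y1, x2, y2):
        # The box point nearest to the centre decides whether any point is inside.
        return within(min(max(cx, x1), x2) - cx, min(max(cy, y1), y2) - cy)
    return inside, box_inside, box_touches

def aligned_squares(K, box_inside, box_touches):
    """Split the part of [0, 2^K)^2 inside the region into maximal aligned squares.

    Returns levels, where levels[k] lists block indices (X, Y) of squares with
    side 2^k, i.e. [X*2^k, X*2^k + 2^k - 1] x [Y*2^k, Y*2^k + 2^k - 1].
    """
    levels = [[] for _ in range(K + 1)]
    stack = [(0, 0, K)]
    while stack:
        X, Y, k = stack.pop()
        side = 1 << k
        x1, y1 = X * side, Y * side
        x2, y2 = x1 + side - 1, y1 + side - 1
        if not box_touches(x1, y1, x2, y2):
            continue  # no point of this square lies inside the region
        if box_inside(x1, y1, x2, y2):
            levels[k].append((X, Y))  # keep the square whole
        elif k > 0:
            for dx in (0, 1):  # otherwise split into four children
                for dy in (0, 1):
                    stack.append((2 * X + dx, 2 * Y + dy, k - 1))
    return levels
```

Each returned square can then be handed to one xor convolution. Because a square is only split when it is not fully inside, every square in `levels` is as large as its alignment allows.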
But the complexity of this algorithm is $$$O(n \log^2 n)$$$, which is not fast enough, so we need to optimize it. The method is to perform the FWT from the bottom up and, at each layer, handle the squares whose side length matches that layer. After calculating the pointwise product of the two FWT arrays for a square, we should not calculate the inverse FWT right away, but should "accumulate" the product onto the result array. (See the author's solution for a better understanding.)
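Here is one possible way to realize the accumulation idea, as a minimal sketch that is not necessarily identical to the reference solution. It relies on the fact that the butterfly stages of the transform are linear and act on different bits, so a product accumulated at layer $$$k$$$ can simply ride along through the remaining stages, and a single inverse transform at the end suffices. The input `levels` is assumed to come from a separate decomposition step (`levels[k]` lists the block indices `(X, Y)` of squares with side $$$2^k$$$); `stage`, `layered_sum` and `layered_brute` are hypothetical names, and the exponents default to those in the problem statement.

```python
MOD = 10**9 + 7

def stage(arr, h):
    """Apply the single butterfly stage of half-width h to arr, in place."""
    for s in range(0, len(arr), 2 * h):
        for i in range(s, s + h):
            u, v = arr[i], arr[i + h]
            arr[i], arr[i + h] = (u + v) % MOD, (u - v) % MOD

def inv_pow(v, e):
    """v^(-e) modulo MOD, with the convention that 0^(-e) is 0."""
    return pow(pow(v, MOD - 2, MOD), e, MOD) if v else 0

def layered_sum(K, levels, ex=2, ey=1, ep=33):
    """Sum of (x ^ y)^ep * x^(-ex) * y^(-ey) over all squares listed in levels."""
    size = 1 << K
    A = [inv_pow(x, ex) for x in range(size)]
    B = [inv_pow(y, ey) for y in range(size)]
    T = [0] * size  # accumulated result, kept in the transformed domain
    for k in range(K + 1):
        side = 1 << k
        # Here every aligned block of length 2^k in A and B is already transformed.
        for X, Y in levels[k]:
            pa, pb, pt = X * side, Y * side, (X ^ Y) * side
            for i in range(side):
                T[pt + i] = (T[pt + i] + A[pa + i] * B[pb + i]) % MOD
        if k < K:
            stage(A, side)
            stage(B, side)
            stage(T, side)  # earlier products move up one layer with A and B
    h = 1
    while h < size:  # one inverse transform for everything: all stages, then scale
        stage(T, h)
        h *= 2
    inv_size = pow(size, MOD - 2, MOD)
    return sum(T[v] * inv_size % MOD * pow(v, ep, MOD) for v in range(size)) % MOD

def layered_brute(levels, ex=2, ey=1, ep=33):
    """The same sum by enumerating every point of every square, for small tests."""
    total = 0
    for k, squares in enumerate(levels):
        for X, Y in squares:
            for x in range(X << k, (X + 1) << k):
                for y in range(Y << k, (Y + 1) << k):
                    total += pow(x ^ y, ep, MOD) * inv_pow(x, ex) * inv_pow(y, ey)
    return total % MOD
```

In this sketch the work splits into the butterfly stages on three arrays plus one pass over each square's side length, which is why the total side length of all squares is the quantity that matters for the complexity.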
One issue in the complexity analysis of this problem is to prove that the sum of the side lengths of all squares is $$$O(n \log n)$$$. This fact can be proved on the condition that the border function is a monotone function, and the boundary of the ellipse can be split into four monotone functions. The idea of the proof is to see that the y-intervals corresponding to each x-interval must be a constant plus some "extra" intervals, and that for x-coordinate intervals of the same size, the total length of the "extra y-intervals" cannot exceed $$$n$$$. Since there are only $$$\log n$$$ sizes of x-intervals, the proof is done.