Introduction
Hi everyone, because we noticed there was no good tutorial in English explaining this idea, KrowSavcik and I have decided to write a tutorial blog on it!
This idea finds the values of each element from a binary array using $$$O(N / log(N))$$$ sum queries, more specifically it solves problems of the type:
Given a binary array $$$b$$$ (an array consisting only of zeroes and ones) of $$$n$$$ elements, you can ask queries of the following format:
What is the sum of the subsequence of the values on positions: $$$p_1, p_2, ..., p_k$$$. In other words you will give the subsequence $$$p_1, p_2, ..., p_k$$$ to the interaction and it will return you the value of ($$$b_{p_1} + b_{p_2} + ... + b_{p_k}$$$).
And after asking $$$O(N / log(N))$$$ queries we should be able to tell the value of each index.
The technique was mentioned here too but it was not as detailed.
Concept
The idea is to use a divide and conquer-like approach. Because it's not an intuitive concept, firstly I will have to add some simple notations, so it will be easier to understand it.
- $$$x_i$$$ — refers to the position $$$i$$$
- $$$A$$$ — any capital letter (except S) refers to a set of points $$$x_i$$$
- $$$v_{A} = \sum b_{x_i}$$$ for $$$x_i \in A$$$
- $$$(A)$$$ — a set in brackets is a query
- $$$k$$$ — the layer we are currently considering
- $$$S_i$$$ — Set of queries
Also you should keep in mind that indices are enumerated from 0.
Also keep in mind that even though it's said the queries are used, we actually only query at the end, and we build up our set of queries first for multiple layers first.
As it is a divide and conquer-like approach we will work with layers. Let's say that for layer $$$k$$$ we need $$$2^k$$$ queries and that from it we can get the value of $$$f_k$$$ elements. Then $$$f_{k+1} = 2 \cdot f_k + 2^k - 1$$$.
First of all, let's set our base case, which is $$$k = 0$$$. So, for $$$k = 0, f_0 = 1$$$ and the query set will only be {$$$0$$$}, so we know a single element using a single query.
The block $$$f_{k+1}$$$ is formed from $$$2$$$ blocks of size $$$f_k$$$ and $$$2^k -1$$$ additional elements in this order. Let's say $$$k_1$$$ represents the first block of $$$f_k$$$ elements and the $$$k_2$$$ the second such block.
The first query is used to get the sum on $$$[f_k, 2 \cdot f_k)$$$ — the sum of the second block. Then we add two new queries for each non last query in $$$S_{k_1}$$$ and $$$S_{k_2}$$$. First one is $$$S_{k_1}[i] \cup S_{k_2}[i]$$$. Second one is $$$S_{k_1}[i] \cup ([f_k, 2 \cdot f_k) / S_{k_2}[i]) \cup x_{(2 \cdot f_k + i)}$$$.
The last query is for entire range $$$[0, f_{k+1})$$$.
I think it's easy to see why we now have $$$2^{k+1}$$$ queries. The harder part is to understand why we don't lose any value in this process, and how we can solve it recursively. In fact, having answered all the $$$S_{k+1}$$$ queries, we can calculate all the $$$v_{S_{k_1}[i]}$$$ and $$$v_{S_{k_2}[i]}$$$.
Now, when we reach a $$$k$$$ with a value of $$$f_k >= n$$$ we can stop there. Let's assume $$$n = f_k$$$ since it will be easier to work with it (when $$$n$$$ is smaller than $$$f_k$$$ we can just think of it as appending $$$f_k - n$$$ zeroes at the end since they won't influence the sum at all). There is a more elegant way to form $$$n$$$ from blocks of different sizes, but it doesn't enter in the scope of this blog and it's not that hard to implement using offsets. Using the set of queries responsible for the $$$k$$$-th layer we can in fact now reconstruct the whole sequence, now recursively going from the $$$k$$$-th layer to the ($$$k - 1$$$)-th one (but consider each layer can have multiple blocks).
First of all, the only things we care about when we are in the $$$k$$$-th layer are:
- The query set for the corresponding block of the corresponding layer.
- The block we are currently at (can be dealt with using an offset value in the recursion).
So we will store them when going recursively.
Firstly, let's again set our base case: $$$k = 0$$$. We are now sure that only one element is responsible for this block from this layer, so we can just set the value of the $$$b_x$$$-th bit (where $$$x$$$ is some offset value we use to keep track of the block) to $$$v_{S_0}$$$ (since that's the sum for a single element which we've seen at the build-up).
Now, since the base case is already dealt with, here's how we will go to the ($$$k - 1$$$)-th layer:
We will need to reconstruct the previous query sets for the first block and the second block of size $$$f_{k - 1}$$$, and set the values for the other $$$2^{(k - 1)} - 1$$$ values respectively (since they aren't part of any of the blocks they shouldn't be part of the recursion either).
Let's denote the numbers of $$$1$$$-s in $$$[f_k, 2 \cdot f_k)$$$ with $$$c$$$. It's obvious that $$$c = S[0]$$$. We know will go through every pair of queries, starting from 1. That means we will be analyzing queries $$$S_1[i] \cup S_2[i]$$$ and $$$S_1[i] \cup ([f_{k-1}, 2 \cdot f_{k-1}) / S_2[i]) \cup x_{(2 \cdot f_{k-1} + i)}$$$.
- $$$v_{S[2 \cdot i + 1]} = v_{S_1[i]} + v_{S_2[i]}$$$
- $$$v_{S[2 \cdot i + 2]} = v_{S_1[i]} + c - v_{S_2[i]} + b_{2 \cdot f_{k-1} + i}$$$
In this case we have to calculate 3 values: $$$v_{S_1[i]} , v_{S_2[i]} , b_{2 \cdot f_{k-1} + i}$$$.
- $$$v_{S_1[i]} = \lfloor\frac{v_{S[2 \cdot i + 1]} + v_{S[2 \cdot i + 2]} - c}{2}\rfloor$$$
- $$$v_{S_2[i]} = \lfloor\frac{v_{S[2 \cdot i + 1]} - v_{S[2 \cdot i + 2]} + c}{2}\rfloor$$$
- $$$b_{2 \cdot f_{k-1} + i} = (v_{S[2 \cdot i + 1]} + v_{S[2 \cdot i + 2]} - c) \land 1$$$
The only remaining queries to answer are $$$v_{S_1[2^{k-1}]}$$$ and $$$v_{S_2[2^{k-1}]}$$$ as we didn't add them in $$$S$$$.
- $$$v_{S_2[2^{k-1}]} = c = v_{S[0]}$$$
- $$$v_{S_1[2^{k-1}]} = v{S[2^k]} - c - \sum_{i = 0}^{2^{k-1}-1} b_{2 \cdot f_{k-1} + i} $$$