Background
Square root decomposition / Mo's algorithm is a known method used to solve a large variety of problems in O(N sqrt(N)). However, it is not used all that often by more experience programmers because we can use a BIT or segment tree instead and attain logarithmic time. However, I've recently stumbled upon problem which I only know how to solve using sqrt decomposition.
The problem
You are given an array A with N elements (indexing starts from 1). You need to answer M queries of the type: "Given three indices l r k, find the number of elements which appear exactly k times in the subarray A[l], A[l+1], ..., A[r].
Input:
11 3
1 2 4 3 2 5 6 4 5 2 1
1 6 2
2 7 3
4 11 1
Output:
1
0
4
I've tried implementing online/offline solutions using persistent segments and binary indexed trees, but to no avail. Is there a O(N logN) or O(N log^2 N) solution for this specific problem?
Can you explain the sqrt decomposition approach? I can't think of one without having a factor of k in the time complexity.
You divide the queries into blocks of sqrt(n) and sort them. The comparision function would be
You keep two variables left and right which represent the indices of the current query. You go through all the queries and you increment/decrement the left and right variables until you reach the indices of the current query.
Btw, u want the solution to be (n logn) for each query (so m(n logn) or for all together?
O(m*n log n) would be worse than brute force. I meant O(log n) per query
Yeah sorry for the stupid question lol, I didn't understand your solution well, I'm not really versed in this sqrt thing, will have to think about it a bit more.
You can do it in O((N + Q) * log(N)) using persistent segment trees (also some other segment trees also work). For every a[i] find the position of (k + 1)-th occurrence of a[i] in [i, n] range, let's denote it x. If such occurrence doesn't exist, set x to be infinity or such. Now, build a segment tree over x values. The answer to the query [l, r] is simply the number of x values that satisfy x <= r. You can do this part via persistent segment trees in O(log(n)) (or merge sort tree in O(log^2(n)) or whatever else fits in TL). This is a well-known problem to be solved via persistent segment trees. Just query tree(r) and tree(l — 1), and etc.
k is not fixed
Let's make the array have size $$$2n$$$ for convenience. (This won't affect the asymptotic time complexity). We can adapt the reduction from here to show the problem is at least as hard as multiplying two $$$\sqrt{n}$$$ by $$$\sqrt{n}$$$ boolean matrices. (This isn't a proof of impossibility because it is theoretically possible in $$$o(n^{1.5})$$$, but you'll probably never use it in practice).
Multiplication of two $$$\sqrt{n}$$$ by $$$\sqrt{n}$$$ boolean matrices can be restated as follows:
Let $$$S=\{1,2,\ldots,\sqrt{n}\}$$$. Given $$$\sqrt{n}$$$ sets $$$A_1,A_2,\ldots, A_\sqrt{n} \subseteq S$$$ and $$$\sqrt{n}$$$ sets $$$B_1,B_2,\ldots, B_\sqrt{n} \subseteq S$$$, compute $$$|A_i\cap B_j|$$$ for each $$$1\leq i,j\leq \sqrt{n}$$$.
We can reduce this to your problem by dividing the array into $$$2\sqrt{n}$$$ blocks of size $$$\sqrt{n}$$$. Every block will contain each element of $$$\{1,2,\ldots,\sqrt{n}\}$$$ exactly once. For $$$1\leq i \leq \sqrt{n}$$$, the $$$i$$$th block from the beginning will contain the elements of $$$S-A_i$$$ followed by elements of $$$A_i$$$. For $$$1\leq j \leq \sqrt{n}$$$, the $$$j$$$th block from the end will contain the elements of $$$B_j$$$ followed by elements of $$$S-B_j$$$. For each $$$1\leq i,j\leq \sqrt{n}$$$, we can choose an interval containing $$$A_i$$$, $$$B_j$$$, and some full blocks. Then, we query how many numbers occur two more times than the number of full blocks. This will be exactly $$$|A_i\cap B_j|$$$, as desired.