Prof. hardstone gives you an integer array $$$A$$$. The length of $$$A$$$ is $$$n$$$ and there are $$$m$$$ distinct numbers in $$$A$$$. Count the number of tuples $$$(l, r)$$$, $$$1 \leq l \leq r \leq n$$$, such that:
Numbers that appear in the interval $$$a[l...r]$$$ appear the same number of times.
For example, if $$$A=[1,2,1,2]$$$, then there are $$$8$$$ legal tuples: $$$(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (2, 3), (3, 4), (1, 4)$$$.
This is an open problem for brainstorming. An $$$O(n^2m)$$$ brute force using prefix sums and an $$$O(n2^m)$$$ brute force using bitmasks and a hash table are easy to come up with. I am looking for an $$$O(nm\log^k n)$$$ solution. Are there any smart data structures?
Note that when $$$A = [1,2,3]$$$, all intervals are legal. For example, $$$[1, 2]$$$ is legal, as both $$$1$$$ and $$$2$$$ appear once. We do not care about $$$3$$$ because $$$3$$$ does not appear.
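For checking small cases such as the two examples above, a direct reference counter is handy. The following is only a minimal sketch, and the function name `countLegalBrute` is hypothetical. It relies on the observation that an interval is legal exactly when (largest count) times (number of distinct values) equals the interval length, since every count is at most the largest one and the counts sum to the length.

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Hypothetical reference checker (not from the original post).
// Counts pairs (l, r) such that every value present in a[l..r]
// occurs the same number of times. Roughly O(n^2) hash-map operations.
long long countLegalBrute(const std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    long long total = 0;
    for (int l = 0; l < n; ++l) {
        std::unordered_map<int, int> cnt;  // occurrences inside a[l..r]
        int maxCnt = 0;                    // largest occurrence count so far
        for (int r = l; r < n; ++r) {
            maxCnt = std::max(maxCnt, ++cnt[a[r]]);
            const long long distinct = static_cast<long long>(cnt.size());
            // All counts are equal iff maxCnt * distinct == interval length.
            if (maxCnt * distinct == static_cast<long long>(r - l + 1)) {
                ++total;
            }
        }
    }
    return total;
}
```

Calling `countLegalBrute` on `{1, 2, 1, 2}` counts the eight tuples listed above, and on `{1, 2, 3}` it counts all six intervals.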
amenotiomoi proposes a genius randomized idea that makes my idea from yesterday work. Similar to Zobrist hashing, we assign a random value to each distinct integer and record the prefix sums of the hash values in a hash table (let $$$h_r$$$ be the sum of the hash values of $$$a[1...r]$$$). Then we fix $$$l$$$ and count the number of valid $$$r$$$ for this $$$l$$$.
For each $$$l$$$, let $$$p(l, j)$$$ be the first position at or after $$$l$$$ where $$$j$$$ appears, somewhat like `std::string::find(j, l)`. If $$$j$$$ never appears at or after $$$l$$$, we let $$$p(l, j) = \infty$$$. For example, if $$$A=[4,1,2,3]$$$, then $$$p(2, 1)=2, p(2,2)=3, p(2,3)=4, p(2,4) = \infty$$$. The values $$$p(l, \cdot)$$$ can be found via binary search in $$$O(m\log n)$$$. Note that $$$p(l, j) \neq p(l, k)$$$ if $$$j \neq k$$$ and both are finite. Then we sort the pairs $$$(p(l, j), j)$$$ in ascending order of $$$p(l, j)$$$ and let $$$q$$$ be the sorted list. Sorting costs $$$O(m\log m)$$$.
For all $$$r$$$ between two adjacent elements of $$$q$$$, the sets of present and absent numbers are uniquely determined. For example, if $$$A=[1,2,2,2,3]$$$, $$$l=1$$$, and $$$2 \leq r \leq 4$$$, then $$$1$$$ and $$$2$$$ appear and $$$3$$$ is absent. Therefore we need to find the number of $$$r$$$, $$$2 \leq r \leq 4$$$, such that $$$1$$$ and $$$2$$$ appear the same number of times within $$$a[l...r]$$$. Yesterday I was stuck here. But with the genius hash table, we only need to count the $$$r$$$ such that $$$h_r - h_{l-1} = c \cdot (hashvalue(1) + hashvalue(2))$$$ for some positive integer $$$c$$$, i.e. $$$(hashvalue(1) + hashvalue(2)) \mid h_r - h_{l-1}$$$.
By the pigeonhole principle, when $$$i$$$ distinct numbers are present, the number that appears the fewest times appears at most $$$\frac{n}{i}$$$ times, so we only need to enumerate $$$\frac{n}{i}$$$ candidates for the $$$i$$$-th adjacent pair of $$$q$$$. There are $$$m$$$ adjacent pairs, and querying the hash table is $$$O(\log n)$$$ (using `std::map`) or amortized $$$O(1)$$$ (using `std::unordered_map`), so the overall complexity is reduced to $$$O(n(m\log n + m\log m + \sum\limits_{i=1}^m\frac{n}{i}\log n)) = O(n(m\log n + m\log m+n\log m\log n))$$$. However, this is not deterministic, and the error probability is hard to estimate and depends heavily on the implementation.
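The steps above can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not amenotiomoi's actual implementation: the function name `countLegalHashed`, the fixed default seed, the choice of `std::mt19937_64` with 64-bit wraparound arithmetic, and the slightly tighter bound on the common count $$$c$$$ (using the segment length instead of $$$n$$$) are all choices made here for illustration. As with any hashing approach, the result can be wrong with small probability.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): counts pairs (l, r)
// in which every present value occurs equally often, using random hashing.
long long countLegalHashed(const std::vector<int>& a, std::uint64_t seed = 12345) {
    const int n = static_cast<int>(a.size());

    // Compress values to ids 0..m-1; occ[j] = sorted 1-indexed positions of id j.
    std::map<int, int> id;
    std::vector<std::vector<int>> occ;
    std::vector<int> b(n + 1, 0);
    for (int i = 1; i <= n; ++i) {
        auto it = id.find(a[i - 1]);
        if (it == id.end()) {
            it = id.emplace(a[i - 1], static_cast<int>(occ.size())).first;
            occ.emplace_back();
        }
        b[i] = it->second;
        occ[b[i]].push_back(i);
    }
    const int m = static_cast<int>(occ.size());

    // One random 64-bit weight per distinct value (sums wrap modulo 2^64).
    std::mt19937_64 rng(seed);
    std::vector<std::uint64_t> w(m);
    for (auto& x : w) x = rng();

    // h[r] = prefix hash of a[1..r]; where[v] = sorted positions r with h[r] == v.
    std::vector<std::uint64_t> h(n + 1, 0);
    std::unordered_map<std::uint64_t, std::vector<int>> where;
    for (int r = 1; r <= n; ++r) {
        h[r] = h[r - 1] + w[b[r]];
        where[h[r]].push_back(r);
    }

    long long total = 0;
    std::vector<std::pair<int, int>> q;  // (p(l, j), j) for values with finite p
    for (int l = 1; l <= n; ++l) {
        q.clear();
        for (int j = 0; j < m; ++j) {
            // Binary search for the first occurrence of j at or after l.
            auto it = std::lower_bound(occ[j].begin(), occ[j].end(), l);
            if (it != occ[j].end()) q.emplace_back(*it, j);
        }
        std::sort(q.begin(), q.end());

        std::uint64_t s = 0;  // sum of weights of the values present so far
        for (std::size_t i = 0; i < q.size(); ++i) {
            s += w[q[i].second];
            const int lo = q[i].first;  // r ranges over [lo, hi] for this set
            const int hi = (i + 1 < q.size()) ? q[i + 1].first - 1 : n;
            const int present = static_cast<int>(i) + 1;
            // c = common count; c * present elements must fit inside a[l..hi].
            for (int c = 1; c * present <= hi - l + 1; ++c) {
                auto it = where.find(h[l - 1] + static_cast<std::uint64_t>(c) * s);
                if (it == where.end()) continue;
                const std::vector<int>& pos = it->second;
                total += std::upper_bound(pos.begin(), pos.end(), hi) -
                         std::lower_bound(pos.begin(), pos.end(), lo);
            }
        }
    }
    return total;
}
```

On small inputs such as `{1, 2, 1, 2}` the result can be compared against a direct brute-force count; a mismatch would indicate either a bug or a hash collision, and rerunning with a different `seed` helps tell the two apart.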