Hardstone's hard problem

Revision en15, by dfsof, 2023-08-17 06:26:34

Prof. hardstone gives you an integer array $$$A$$$. The length of $$$A$$$ is $$$n$$$ and there are $$$m$$$ distinct numbers in $$$A$$$. Count the number of tuple $$$(l, r)$$$, $$$1 \leq l \leq r \leq n$$$, such that:

Numbers that appear in the interval $$$a[l...r]$$$ appear the same number of times.

For example, $$$A=[1,2,1,2]$$$, then there are $$$8$$$ legal tuples: $$$(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (2, 3), (3, 4), (1, 4)$$$.

This is an open problem with brain storm. $$$O(n^2m)$$$ brute force using the prefix sum and $$$O(n2^m)$$$ brute force using bitmasks and hashtable are easy to come up with. I am looking for a $$$O(nmlog^k)$$$ solution. Are there any smart data structures?

Note that when $$$A = [1,2,3]$$$, all intervals are legal. For example, $$$[1, 2]$$$ is legal, as both $$$1$$$ and $$$2$$$ appear once. We do not care about $$$3$$$ because $$$3$$$ does not appear.

amenotiomoi proposes a genius randomized idea, which could make my yesterday's idea work: Similar to the Zobrist hashing, we assign a random value to each distinct integer. We record the prefix sum of the hash values in a hashtable (let $$$h_r$$$ be the prefix sum of hash values of $$$a[1...r]$$$). Then, we fix $$$l$$$ and count the number of $$$r$$$ with respect to this $$$l$$$. For each $$$l$$$, we denote $$$p(l, j)$$$ be the first place $$$j$$$ appears after $$$l$$$ (inclusive), somehow like std::string.find(j, l). If $$$j$$$ never appears after $$$l$$$, we let $$$p(l, j) = \infty$$$. For example, if $$$A=[4,1,2,3]$$$, then $$$p(2, 1)=2, p(2,2)=3, p(2,3)=4, p(2,4) = \infty$$$. The array $$$p$$$ could be fould via binary search in $$$O(mlogn)$$$. Note that $$$p(l, j) \neq p(l, k)$$$ if $$$j \neq k$$$. Then, we sort the pair $$${p(l, j), j}$$$ in the ascending order of $$$p(l, j)$$$, and let $$$q$$$ be the sorted list. The complexity of sorting is $$$O(mlogm)$$$. For two adjacent elements of $$$q$$$, the present and absent numbers could be uniquely determined. For example, $$$A=[1,2,2,2,3]$$$, $$$r=1$$$, $$$2 \leq r \leq 4$$$, then $$$1, 2$$$ appear and $$$3$$$ is absent. Therefore we need to find the number of $$$r$$$, $$$2 \leq r \leq 4$$$, such that $$$1$$$ and $$$2$$$ appear the same number of times with in $$$a[l...r]$$$. Yesterday I was stuck here. But with the genius hashtable, we only need to count $$$r$$$ that $$$(hashvalue(1) + hashvalue(2)) \mid h_r - h_{l-1}$$$. By the pigeon hole principle, the number appear the least number of times appear at most $$$\frac{n}{i}$$$ times, then we only need to enumerate $$$\frac{n}{i}$$$ items for each adjacent pair of $$$q$$$, there are $$$m$$$ adjacent pairs, and querying the hash table is $$$O(logn)$$$ (using std::map) or amortized $$$O(1)$$$ (using std::unordered_map), therefore the overall complexity could be reduced to $$$O(n(mlogn + mlogm + \sum\limits_{i=1}^m\frac{n}{i}logn)) = O(n(mlogn + mlogm+nlogmlogn))$$$. But this is not deterministic, and the error probability is hard to estimate, heavily depending on implementation.

Tags data structures, string, array

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en18 English dfsof 2023-08-17 10:52:07 7 Tiny change: 'fter $l$, we let $p(l, j) ' -> 'fter $l$, $p(l, j) '
en17 English dfsof 2023-08-17 06:48:21 10
en16 English dfsof 2023-08-17 06:44:25 2 Tiny change: '2,2,3]$, $r=1$, $2 \l' -> '2,2,3]$, $l=1$, $2 \l'
en15 English dfsof 2023-08-17 06:26:34 5 Tiny change: '= O(n(mlogm + mlogn +nlogmlogn' -> '= O(n(mlogn + mlogm+nlogmlogn'
en14 English dfsof 2023-08-17 06:23:56 0 (published)
en13 English dfsof 2023-08-17 06:23:51 5 Tiny change: 'could be find via bina' -> 'could be fould via bina' (saved to drafts)
en12 English dfsof 2023-08-17 06:22:37 0 (published)
en11 English dfsof 2023-08-17 06:22:30 36 (saved to drafts)
en10 English dfsof 2023-08-17 06:19:31 15 Tiny change: 'r$ be the hash value of $a[1..' -> 'r$ be the prefix sum of hash values of $a[1..'
en9 English dfsof 2023-08-17 06:18:58 0 (published)
en8 English dfsof 2023-08-17 06:18:53 1 Tiny change: ' $O(mlogm). For two ' -> ' $O(mlogm)$. For two ' (saved to drafts)
en7 English dfsof 2023-08-17 06:17:52 0 (published)
en6 English dfsof 2023-08-17 06:17:45 691
en5 English dfsof 2023-08-17 06:10:29 739
en4 English dfsof 2023-08-17 06:02:41 86
en3 English dfsof 2023-08-17 06:02:04 542 (saved to drafts)
en2 English dfsof 2023-08-17 05:34:18 185
en1 English dfsof 2023-08-17 05:13:45 695 Initial revision (published)