dfsof's blog

By dfsof, history, 16 months ago, In English

hardstone.Orz gives you an integer array $$$A$$$. The length of $$$A$$$ is $$$n$$$ and there are $$$m$$$ distinct numbers in $$$A$$$. Count the number of tuples $$$(l, r)$$$, $$$1 \leq l \leq r \leq n$$$, such that:

Every number that appears in the interval $$$a[l...r]$$$ appears the same number of times.

For example, $$$A=[1,2,1,2]$$$, then there are $$$8$$$ legal tuples: $$$(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (2, 3), (3, 4), (1, 4)$$$.

This is an open problem for brainstorming. An $$$O(n^2m)$$$ brute force using prefix sums and an $$$O(n2^m)$$$ brute force using bitmasks and a hashtable are easy to come up with. I am looking for an $$$O(nm\log^k)$$$ solution. Are there any smart data structures?

Note that when $$$A = [1,2,3]$$$, all intervals are legal. For example, $$$[1, 2]$$$ is legal, as both $$$1$$$ and $$$2$$$ appear once. We do not care about $$$3$$$ because $$$3$$$ does not appear.

amenotiomoi proposes a genius randomized idea, which could make the idea I had yesterday work. Similar to Zobrist hashing, we assign a random value to each distinct integer and record the prefix sums of the hash values in a hashtable (let $$$h_r$$$ be the prefix sum of the hash values of $$$a[1...r]$$$).

Then we fix $$$l$$$ and count the number of valid $$$r$$$ for this $$$l$$$. For each $$$l$$$, let $$$p(l, j)$$$ be the first position at or after $$$l$$$ where $$$j$$$ appears, somewhat like std::string::find(j, l). If $$$j$$$ never appears at or after $$$l$$$, then $$$p(l, j) = \infty$$$. For example, if $$$A=[4,1,2,3]$$$, then $$$p(2, 1)=2, p(2,2)=3, p(2,3)=4, p(2,4) = \infty$$$. The array $$$p$$$ can be found via binary search in $$$O(m\log n)$$$. Note that $$$p(l, j) \neq p(l, k)$$$ if $$$j \neq k$$$. Then we sort the pairs $$$(p(l, j), j)$$$ in ascending order of $$$p(l, j)$$$ and let $$$q$$$ be the sorted list; sorting costs $$$O(m\log m)$$$.

Between two adjacent elements of $$$q$$$, the set of present and absent numbers is uniquely determined. For example, with $$$A=[1,2,2,2,3]$$$, $$$l=1$$$ and $$$2 \leq r \leq 4$$$, the numbers $$$1, 2$$$ appear and $$$3$$$ is absent. Therefore we need to find the number of $$$r$$$, $$$2 \leq r \leq 4$$$, such that $$$1$$$ and $$$2$$$ appear the same number of times within $$$a[l...r]$$$. Yesterday I was stuck here. But with the genius hashtable, we only need to count the $$$r$$$ such that $$$(hashvalue(1) + hashvalue(2)) \mid (h_r - h_{l-1})$$$, i.e. $$$h_r - h_{l-1} = k \cdot (hashvalue(1) + hashvalue(2))$$$ where $$$k$$$ is the common number of occurrences. By the pigeonhole principle, when $$$i$$$ numbers are present, the least frequent one appears at most $$$\frac{n}{i}$$$ times, so we only need to enumerate $$$\frac{n}{i}$$$ candidate multiples for that adjacent pair of $$$q$$$; there are $$$m$$$ adjacent pairs, and querying the hashtable is $$$O(\log n)$$$ (using std::map) or amortized $$$O(1)$$$ (using std::unordered_map). Therefore the overall complexity can be reduced to $$$O(n(m\log n + m\log m + \sum\limits_{i=1}^m\frac{n}{i}\log n)) = O(n(m\log n + m\log m + n\log m\log n))$$$. But this is not deterministic, and the error probability is hard to estimate; it depends heavily on the implementation.
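
Here is a minimal sketch of that fixed-$$$l$$$ enumeration, assuming the values lie in $$$[0, m)$$$ and using plain unsigned 64-bit overflow as the hash arithmetic; the names solve, occ and where below are mine, and the per-value counting binary-searches the index list stored for each prefix-hash value:

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long u64;

long long solve(const vector<int>& a) {            // values assumed to lie in [0, m)
    int n = a.size();
    int m = *max_element(a.begin(), a.end()) + 1;
    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    vector<u64> hv(m);
    for (auto& x : hv) x = rng();                  // Zobrist-style random value per number

    vector<u64> pref(n + 1, 0);                    // pref[r] = hash of a[1..r]
    for (int i = 0; i < n; i++) pref[i + 1] = pref[i] + hv[a[i]];

    unordered_map<u64, vector<int>> where;         // prefix-hash value -> sorted indices r
    for (int r = 0; r <= n; r++) where[pref[r]].push_back(r);

    vector<vector<int>> occ(m);                    // 1-based occurrence positions of each value
    for (int i = 0; i < n; i++) occ[a[i]].push_back(i + 1);

    long long ans = 0;
    for (int l = 1; l <= n; l++) {
        vector<pair<int, int>> q;                  // (p(l, j), j), only the finite ones
        for (int j = 0; j < m; j++) {
            auto it = lower_bound(occ[j].begin(), occ[j].end(), l);
            if (it != occ[j].end()) q.push_back({*it, j});
        }
        sort(q.begin(), q.end());
        u64 sum = 0;                               // sum of hashes of the currently present values
        for (int i = 0; i < (int)q.size(); i++) {
            sum += hv[q[i].second];
            int lo = q[i].first;                   // for r in [lo, hi] exactly these i+1 values appear
            int hi = (i + 1 < (int)q.size() ? q[i + 1].first - 1 : n);
            // each present value must occur k times, so k*(i+1) <= hi - l + 1
            for (int k = 1; (long long)k * (i + 1) <= hi - l + 1; k++) {
                auto it = where.find(pref[l - 1] + (u64)k * sum);
                if (it == where.end()) continue;
                auto& v = it->second;              // count prefix indices r in [lo, hi] with this hash
                ans += upper_bound(v.begin(), v.end(), hi) - lower_bound(v.begin(), v.end(), lo);
            }
        }
    }
    return ans;
}

int main() {
    cout << solve({1, 2, 1, 2}) << "\n";           // prints 8 for the example above
}
```

The inner loop over $$$k$$$ enumerates the candidate common counts for the current adjacent pair of $$$q$$$, which is exactly the $$$\sum_{i=1}^m\frac{n}{i}$$$ term in the complexity.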


»
16 months ago, # |

I think $$$O(n m)$$$ is also possible.

Let's assign a random $$$64$$$-bit number $$$\rho[i]$$$ to each distinct integer $$$i = 1, \ldots, m$$$. We'll define the hash of a range as $$$H[l..r) = \rho[a[l]] + \ldots + \rho[a[r - 1]]$$$ (a simple sum modulo $$$2^{64}$$$).

Of course, in reality $$$H[l..r) = H[1..r) - H[1..l)$$$, so we can use prefix sums for this, as you mentioned. Let's then denote $$$\sigma[i] = \rho[a[1]] + ... + \rho[a[i - 1]]$$$. Consequently, $$$H[l..r) = \sigma[r] - \sigma[l]$$$.

Then, a sequence $$$[l..r)$$$ is good iff

$$$H[l..r) = \frac{r - l}{numUnique[l..r)} \cdot sumUnique[l..r)$$$

where $$$sumUnique[l..r)$$$ is the sum of hashes of the unique elements inside $$$[l..r)$$$ and $$$numUnique[l..r)$$$ is the number of such elements. The above equation is equivalent to:

$$$H[l..r) \cdot numUnique[l..r) = (r - l) \cdot sumUnique[l..r)$$$
$$$(\sigma[r] - \sigma[l]) \cdot numUnique[l..r) = (r - l) \cdot sumUnique[l..r)$$$
$$$\sigma[r] \cdot numUnique[l..r) - r \cdot sumUnique[l..r) = \sigma[l] \cdot numUnique[l..r) - l \cdot sumUnique[l..r)$$$

Ok, that was a little algebraic manipulation. Now, let's sweepline over $$$r$$$ and keep some data structure over the possible $$$l$$$'s. Note that for any $$$r$$$, there are evidently at most $$$m$$$ distinct values of $$$sumUnique[l..r)$$$ and $$$numUnique[l..r)$$$. So let's keep $$$m$$$ hashmaps, one for each value of $$$numUnique[l..r)$$$, each being a simple frequency map of the values $$$\sigma[l] \cdot numUnique[l..r) - l \cdot sumUnique[l..r)$$$. Then computing the contribution of $$$r$$$ is just summing up $$$m$$$ hashmap queries.

When we transition $$$r \rightarrow r + 1$$$, some $$$l$$$'s will get "promoted" from the $$$i$$$-th hashtable to the $$$(i+1)$$$-th. But that's okay, because at most $$$m$$$ such "promotions" can occur for each $$$l$$$, so the total cost of recomputing everything that changes from scratch amortizes to $$$O(nm)$$$.
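
A rough sketch of this sweepline, assuming values in $$$[0, m)$$$; I switched to 1-based closed intervals $$$[l, r]$$$ with prefix hashes $$$P$$$ (so the key becomes $$$P[l-1] \cdot cnt - (l-1) \cdot sumU$$$), and freq, groupSum and last are my own names:

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long u64;

long long countGood(const vector<int>& a) {        // values assumed to lie in [0, m)
    int n = a.size();
    int m = *max_element(a.begin(), a.end()) + 1;
    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    vector<u64> rho(m);
    for (auto& x : rho) x = rng();                 // random 64-bit hash per value

    vector<u64> P(n + 1, 0);                       // P[i] = rho[a[1]] + ... + rho[a[i]]
    for (int i = 0; i < n; i++) P[i + 1] = P[i] + rho[a[i]];

    vector<int> last(m, 0);                        // last occurrence of each value seen so far
    vector<int> cnt(n + 1, 0);                     // cnt[l]  = #distinct values in a[l..r]
    vector<u64> sumU(n + 1, 0);                    // sumU[l] = sum of their hashes
    vector<u64> key(n + 1, 0);                     // key[l]  = P[l-1]*cnt[l] - (l-1)*sumU[l]
    vector<unordered_map<u64, int>> freq(m + 1);   // freq[c]: keys of all l with cnt[l] == c
    vector<u64> groupSum(m + 1, 0);                // sumU shared by the l's currently in freq[c]

    long long ans = 0;
    for (int r = 1; r <= n; r++) {
        int v = a[r - 1];
        // every l in (last[v], r] gains the value v ("promotion"); amortized O(nm) in total
        for (int l = last[v] + 1; l <= r; l++) {
            if (cnt[l] > 0) {                      // drop the stale key of l
                if (--freq[cnt[l]][key[l]] == 0) freq[cnt[l]].erase(key[l]);
            }
            cnt[l]++;
            sumU[l] += rho[v];
            key[l] = P[l - 1] * (u64)cnt[l] - (u64)(l - 1) * sumU[l];
            freq[cnt[l]][key[l]]++;
            groupSum[cnt[l]] = sumU[l];
        }
        last[v] = r;
        // [l, r] is good iff P[r]*c - r*sumU == P[l-1]*c - (l-1)*sumU with c = cnt[l]
        for (int c = 1; c <= m; c++) {
            if (freq[c].empty()) continue;
            u64 want = P[r] * (u64)c - (u64)r * groupSum[c];
            auto it = freq[c].find(want);
            if (it != freq[c].end()) ans += it->second;
        }
    }
    return ans;
}

int main() {
    cout << countGood({1, 2, 1, 2}) << "\n";       // prints 8 for the example in the post
}
```

A single groupSum[c] per bucket is enough because, for a fixed $$$r$$$, the sets of present values are nested in $$$l$$$, so all $$$l$$$ with the same $$$numUnique$$$ see exactly the same set (and hence the same $$$sumUnique$$$).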

»
16 months ago, # |

I've found an O(n^2 * K) solution, where the failure probability is 1/2^K (or something like that; we can choose K = 50 to be safe). Like in this problem https://mirror.codeforces.com/problemset/problem/1746/F, we can build K masks of numbers and compute prefix sums. Then for each (l, r), we can find the number of different numbers and check, for each mask, that (pref[r] - pref[l-1]) % x == 0, where x is the number of different numbers.
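
A minimal sketch along these lines, assuming the divisibility test is taken against the common per-value count c = (r - l + 1) / x as in the linked 1746F technique; the names w, pref and countGoodIntervals are placeholders:

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long u64;

long long countGoodIntervals(const vector<int>& a, int K = 50) {
    int n = a.size();
    int m = *max_element(a.begin(), a.end()) + 1;  // values assumed to lie in [0, m)
    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());

    // K independent "masks": a random 64-bit weight per value, plus its prefix sums
    vector<vector<u64>> pref(K, vector<u64>(n + 1, 0));
    for (int t = 0; t < K; t++) {
        vector<u64> w(m);
        for (auto& x : w) x = rng();
        for (int i = 0; i < n; i++) pref[t][i + 1] = pref[t][i] + w[a[i]];
    }

    long long ans = 0;
    vector<int> freq(m);
    for (int l = 1; l <= n; l++) {
        fill(freq.begin(), freq.end(), 0);
        int distinct = 0;
        for (int r = l; r <= n; r++) {
            if (freq[a[r - 1]]++ == 0) distinct++; // x = number of different numbers in [l, r]
            int len = r - l + 1;
            if (len % distinct != 0) continue;     // counts cannot all be equal
            int c = len / distinct;                // the common count every present value must have
            bool ok = true;
            for (int t = 0; t < K && ok; t++)
                ok = ((pref[t][r] - pref[t][l - 1]) % c == 0);
            if (ok) ans++;                         // false positive with probability about 2^-K
        }
    }
    return ans;
}

int main() {
    cout << countGoodIntervals({1, 2, 1, 2}) << "\n";   // prints 8 for the blog's example
}
```

With K = 50 the per-interval failure probability is about 2^-50, so even a union bound over the O(n^2) pairs keeps the overall error negligible.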

  • »
    »
    16 months ago, # ^ |

    I also found a simpler O(n^2) solution. Iterating l from 1 to n, we iterate r from l to n. We keep a frequency table of the frequency values of the numbers; then we only need to check the last changed number to ensure that (l, r) satisfies the condition.
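
A minimal sketch of this, where cntFreq (a frequency table of the frequency values) and the O(1) test against the last changed number are my reading of the comment:

```cpp
#include <bits/stdc++.h>
using namespace std;

long long countGoodBrute(const vector<int>& a) {
    int n = a.size();
    int m = *max_element(a.begin(), a.end()) + 1;  // values assumed to lie in [0, m)
    long long ans = 0;
    for (int l = 0; l < n; l++) {
        vector<int> freq(m, 0);                    // freq[v]    = occurrences of v in a[l..r]
        vector<int> cntFreq(n + 1, 0);             // cntFreq[k] = how many values occur exactly k times
        int distinct = 0;
        for (int r = l; r < n; r++) {
            int v = a[r];
            if (freq[v] == 0) distinct++;
            if (freq[v] > 0) cntFreq[freq[v]]--;   // v leaves the bucket of its old count
            freq[v]++;
            cntFreq[freq[v]]++;                    // and enters the bucket of its new count
            // all present values occur equally often iff every one of the `distinct`
            // present values has the same count as the last changed value v
            if (cntFreq[freq[v]] == distinct) ans++;
        }
    }
    return ans;
}

int main() {
    cout << countGoodBrute({1, 2, 1, 2}) << "\n";  // prints 8 for the blog's example
}
```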

»
16 months ago, # |

https://ejoi2017.eolymp.io/problems/1

I might be wrong, but isn't this the same problem (for $$$M = 52$$$)? I had an $$$O(NM\log)$$$ solution which passed.

»
16 months ago, # |

Btw, a deterministic $$$nm^2$$$ solution is in the editorial of USACO Open 14 Fair Photography.