whatthemomooofun1729's blog

By whatthemomooofun1729, history, 15 months ago, In English

Hi, I'm trying to find an algorithm to answer queries to find the most frequent element in a subsegment of an array. I've read this post on StackOverflow, which mentions a method to get $$$O(\sqrt{N})$$$ per query: https://stackoverflow.com/questions/40302407/how-to-find-the-most-frequent-number-and-its-frequency-in-an-array-in-range-l-r.

Basically, the author of the answer says that we can choose a value $$$B = C \cdot \sqrt{N}$$$, and then handle the queries on the subsegment $$$(L, R)$$$ with casework: Case 1. If $$$R - L + 1 < 2 \cdot B$$$, then just loop through every value between $$$L$$$ and $$$R$$$ and take maximum of all frequencies. Case 2. If $$$R - L + 1 \geq 2 \cdot B$$$, then loop through all elements of the array that are "heavy" (aka they appear more than $$$B$$$ times in the entire array) and take the maximum of their frequencies over the interval.

I can see why this approach will work, but I tried using it on this problem and it got WA. I follow basically the same idea in the editorial. I find the rightmost k and the leftmost k occurrences, and then find the mode on the subsegment of the array in between (using the algorithm described). Here is my code.

Not only does my code get WA, but if I change the value of $$$B$$$ I actually pass different amounts of test cases. With the value of $$$B = 3 \sqrt {N}$$$, I can pass the first 14. If I change this to $$$2 \sqrt{N}$$$, then I can only pass the first 7.

Is there some problem with the approach described in the StackOverflow post or is it because of some other error in my code? Your help is greatly appreciated!

»
15 months ago, # |
  Vote: I like it 0 Vote: I do not like it

The algorithm is right. Maybe the WA is because you've implemented something wrong in one case and right in the other, so if you change the length its verdict will change too.

»
15 months ago, # |
  Vote: I like it +8 Vote: I do not like it

Anyway, you can solve the problem faster. Read the editorial of 1514D - Cut and Stick for more efficient solutions.

»
15 months ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

Ok, I think I found out the issue.

The StackOverflow link (and the problem I've linked in the previous comment) can't find the most frequent element in a subsegment of an array in general: they can find it only if its frequency is at least half of the length of the interval.

In your problem, the optimal element in the middle subsegment may have a frequency which is much less than half the length of the interval (for example, if the subsegment contains $$$1$$$ twice and all the other values once).

I think the easiest way to answer these queries in general is Mo's algorithm. However, in this problem it's more efficient to just iterate over each value from $$$1$$$ to $$$200$$$ and check its frequency.

  • »
    »
    15 months ago, # ^ |
      Vote: I like it +9 Vote: I do not like it

    Actually, in China it's just this problem.

    • »
      »
      »
      15 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      How to solve it online?

      • »
        »
        »
        »
        15 months ago, # ^ |
          Vote: I like it +18 Vote: I do not like it

        We can solve it in $$$O(n \cdot \sqrt{q \cdot \log n})$$$, which should comfortably pass as $$$1 \leq n,q \leq 50000$$$.

        Let us coordinate compress the array to have $$$1 \leq a_i \leq n$$$.

        Now, for this array, we can find the most frequent for all prefixes in a single traversal from left to right, i.e., in $$$O(n)$$$.

        Consider some constant $$$b$$$, and we will decide the value of $$$b$$$ later. Now consider all the suffixes $$$a_i, a_{i+1}, \dots a_n$$$ such that $$$i$$$ is of the form $$$b \cdot k$$$. Note that we can have atmost $$$O(\frac{n}{b})$$$ such suffixes, and we can solve for all those suffixes in $$$O(n)$$$. Here by solving I mean finding the most frequent element of subarray $$$a_i, a_{i+1}, \ldots a_j$$$ for all $$$i \leq j \leq n$$$. Hence, we can solve for all valid suffixes in $$$O(\frac{n^2}{b})$$$. This is our pre-computation part.

        Now, let us see how to answer the queries online. If $$$r-l < b$$$, we can find the most frequent element in $$$O(b)$$$. Let us look at the case when $$$r - l > b$$$. Find the smallest index $$$h$$$ such that $$$h \geq l$$$ and $$$h$$$ is of the of the form $$$b \cdot k$$$. Now we know the most frequent element of subarray $$$a_h, a_{h+1}, \ldots a_r$$$(thanks to our pre-computation). So we have dealt with subarray $$$a_h, a_{h+1}, \ldots a_r$$$. Now it is possible that the most frequent element can be from the subarray $$$a_l, a_{h+1}, \ldots a_{h-1}$$$. But now $$$h-l<b$$$. So we can traverse over all the elements in the subarray $$$a_l, a_{h+1}, \ldots a_{h-1}$$$ and find their frequency in subarray $$$a_l, a_{l+1}, \ldots a_r$$$ in $$$O(logn)$$$ and update the most frequent element. As we had done coordinate compression, we can retrieve the original answer and print it. Hence, we can answer all queries in $$$O(b \cdot logn)$$$.

        So our time complexity is $$$O(\frac{n^2}{b}+q \cdot b \log n)$$$ On choosing, $$$b = \frac{n}{\sqrt{q \log n}}$$$, we can achieve the complexity $$$O(n \cdot \sqrt{q \cdot \log n})$$$.

        • »
          »
          »
          »
          »
          15 months ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Oh, it's actually much cleaner than I expected. Thanks!

      • »
        »
        »
        »
        15 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Aha, I found my code!

        Code with 1<=n<=40000,1<=m<=50000