Finding the most frequent element in a subsegment

Hi, I'm trying to find an algorithm to answer queries to find the most frequent element in a subsegment of an array. I've read this post on StackOverflow, which mentions a method to get $$$O(\sqrt{N})$$$ per query: https://stackoverflow.com/questions/40302407/how-to-find-the-most-frequent-number-and-its-frequency-in-an-array-in-range-l-r.

Basically, the author of the answer says that we can choose a value $$$B = C \cdot \sqrt{N}$$$, and then handle the queries on the subsegment $$$(L, R)$$$ with casework:

Case 1. If $$$R - L + 1 < 2 \cdot B$$$, then just loop through every value between $$$L$$$ and $$$R$$$ and take the maximum of all frequencies.

Case 2. If $$$R - L + 1 \geq 2 \cdot B$$$, then loop through all elements of the array that are "heavy" (that is, they appear more than $$$B$$$ times in the entire array) and take the maximum of their frequencies over the interval.

I can see why this approach will work, but I tried using it on this problem and it got WA. I follow basically the same idea as the editorial: I find the rightmost k and the leftmost k occurrences, and then find the mode on the subsegment of the array in between (using the algorithm described). Here is my code.

Not only does my code get WA, but if I change the value of $$$B$$$ I actually pass a different number of test cases. With $$$B = 3 \sqrt{N}$$$, I pass the first 14. If I change this to $$$2 \sqrt{N}$$$, I only pass the first 7.

Is there some problem with the approach described in the StackOverflow post or is it because of some other error in my code? Your help is greatly appreciated!

// Author: XZC(L_Wave)
// Language: Cpp/G++14
// Problem: P4168 [Violet]蒲公英
// Contest: Luogu
// URL: https://www.luogu.com.cn/problem/P4168
// Memory Limit: 512 MB
// Time Limit: 2000 ms
// Create Time: 2023-01-13 19:31:13
//
// Powered by CP Editor (https://cpeditor.org)

//#pragma GCC optimize("Ofast", "inline")
#include<bits/stdc++.h>
#define Rep(i, n) for(int i=0; i< (int)(n); i++)
#define Rpp(i, n) for(int i=1; i<=(int)(n); i++)
#define Dpp(i, n) for(int i=(int)n; i; i--)
#define Frr(i, s, e) for(int i=(int)(s); i<=(int)(e); i++)
#define Tc int T; cin >> T; while(T--)
#define Eps 1e-7
#define Pinf 0x3f3f3f3f3f3f3f3fLL
#define Ninf (long long)0xcfcfcfcfcfcfcfcfLL
#define Mem0(Cont) memset(Cont, 0, sizeof(Cont))
#define MemP(Cont) memset(Cont, 0x3f, sizeof(Cont))
#define MemN(Cont) memset(Cont, 0xcf, sizeof(Cont))
#define endl '\n'
#define int long long
#define YES cout << "YES\n"
#define NO cout << "NO\n"
#define Yes cout << "Yes\n"
#define No cout << "No\n"
#define yes cout << "yes\n"
#define no cout << "no\n"
//#define Files
using namespace std;

template <typename T> inline void Print(T x, char ed = '\n') { cout << x << ed; }
template <typename T> inline void Exit(T x, int cd = 0) { cout << x << endl; exit(cd); }
template <typename T> inline bool CheckMax(T& x, T y) { if(x < y) { x = y; return 1; } else return 0; }
template <typename T> inline bool CheckMin(T& x, T y) { if(y < x) { x = y; return 1; } else return 0; }
inline void Print_if(bool sth, string s1 = "Yes", string s2 = "No") { if(sth) cout << s1 << endl; else cout << s2 << endl; }

constexpr int N = 300010, B = sqrt(N / log(N));
int n, m, b, a[N], rl[N], h[N];
vector <int> occc[N];
struct Block { int L, R; } bk[N / B];
int occ[N / B][N / B], num[N / B][N / B], cnt[N], res, last;

template <typename IT, typename T> void Discrete(IT bg, IT ed, IT nw, T dt) {
    IT u = nw;
    for(IT k = bg; k != ed; k++, u++) *u = *k;
    sort(nw, u);
    u = unique(nw, u);
    for(IT k = bg; k != ed; k++) {
        *k = lower_bound(nw, u, *k) - nw + dt;
    }
}

int Occur(int x, int l, int r) {
    return upper_bound(occc[x].begin(), occc[x].end(), r) - lower_bound(occc[x].begin(), occc[x].end(), l);
}

signed main() {
#ifdef Files
    freopen(".in", "r", stdin);
    freopen(".out", "w",stdout);
#endif
    ios_base :: sync_with_stdio(0), cin.tie(0), cout.tie(0);
    cin >> n >> m;
    Rpp(i, n) cin >> a[i];
    Discrete(a+1, a+n+1, rl+1, 1);
    Rpp(i, n) occc[a[i]].push_back(i);
    b = sqrt(log(2) / log(n) * n);
    Rpp(i, n) { h[i] = (i+b-1)/b; }
    Rpp(i, (n+b-1)/b) { bk[i].L = (i-1) * b + 1; bk[i].R = min(i * b, n); }
    Rpp(i, (n+b-1)/b) {
        res = 0;
        Frr(j, bk[i].L, n) {
            ++cnt[a[j]];
            if(cnt[a[j]] > cnt[res]) { res = a[j]; }
            else if(cnt[a[j]] == cnt[res] && a[j] < res) res = a[j];
            if(bk[h[j]].R == j) { occ[i][h[j]] = cnt[res]; num[i][h[j]] = res; }
        }
        Frr(j, bk[i].L, n) --cnt[a[j]];
    }
    // cout << bk[2].L << ' ' << bk[2].R << endl;
    // cout << bk[4].L << ' ' << bk[4].R << endl;
    // cout << occ[2][4] << ' ' << rl[num[2][4]] << endl;
    // Rpp(i, n/b) {
    //     Frr(j, i, n/b) {
    //         cout << bk[i].L << ' ' << bk[j].R << ' ' << occ[i][j] << ' ' << num[i][j] << endl;
    //     }
    // }
    // Rpp(i, n/b) Frr(j, i, n/b) {
    //     int ll = bk[i].L, rr = bk[j].R;
    //     Frr(k, ll, rr) if(Occur(k, ll, rr) > occ[i][j]) { cout << k << ' ' << i << ' ' << j << ' ' << ll << ' ' << rr << endl; }
    // }
    while(m--) {
        int l, r;
        cin >> l >> r;
        l = (l+last-1) % n + 1;
        r = (r+last-1) % n + 1;
        if(l > r) swap(l, r);
        int ghl = h[l]+1, lhr = h[r]-1, oc = 0, nm = 0;
        if(ghl > lhr) {
            res = 0;
            Frr(j, l, r) {
                ++cnt[a[j]];
                if(cnt[a[j]] > cnt[res]) { res = a[j]; }
                else if(cnt[a[j]] == cnt[res] && a[j] < res) res = a[j];
                oc = cnt[res];
                nm = res;
            }
            Frr(j, l, r) --cnt[a[j]];
            // assert(!accumulate(begin(cnt), end(cnt), 0ll));
        } else {
            oc = occ[ghl][lhr], nm = num[ghl][lhr];
            // cout << oc << ' ' << nm << endl;
            Frr(i, l, bk[h[l]].R) {
                int OC = Occur(a[i], l, r), NM = a[i];
                if(OC > oc) { oc = OC; nm = NM; }
                else if(OC == oc && nm > NM) { nm = NM; }
            }
            // cout << oc << ' ' << nm << endl;
            Frr(i, bk[h[r]].L, r) {
                int OC = Occur(a[i], l, r), NM = a[i];
                if(OC > oc) { oc = OC; nm = NM; }
                else if(OC == oc && nm > NM) { nm = NM; }
            }
            // cout << oc << ' ' << nm << endl;
        }
        if(0) cout << oc << endl;
        cout << rl[nm] << endl;
        last = rl[nm];
    }
    return 0;
}

Comments (8)

L_Wave

16 months ago

The algorithm is right. Maybe the WA is because you've implemented something wrong in one case and right in the other, so if you change the length its verdict will change too.

TheScrasse

Anyway, you can solve the problem faster. Read the editorial of 1514D - Cut and Stick for more efficient solutions.

Ok, I think I found out the issue.

The StackOverflow link (and the problem I've linked in the previous comment) can't find the most frequent element in a subsegment of an array in general: they can find it only if its frequency is at least half of the length of the interval.

In your problem, the optimal element in the middle subsegment may have a frequency which is much less than half the length of the interval (for example, if the subsegment contains $$$1$$$ twice and all the other values once).

I think the easiest way to answer these queries in general is Mo's algorithm. However, in this problem it's more efficient to just iterate over each value from $$$1$$$ to $$$200$$$ and check its frequency.
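For readers unfamiliar with it, here is a minimal sketch of how Mo's algorithm can report the frequency of the mode for offline queries. It is an illustration under stated assumptions, not the code of anyone in this thread: the names `Query` and `modeFrequencies` are made up for the example, positions are 0-indexed, values are assumed to be non-negative and small enough to index an array (compress them first otherwise), and only the frequency is returned, not the element itself. Deletions are handled by tracking how many values currently have each count.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Query { int l, r; };  // inclusive bounds, 0-indexed (hypothetical helper type)

// Returns, for every query, the number of occurrences of the most frequent value.
std::vector<int> modeFrequencies(const std::vector<int>& a, const std::vector<Query>& qs) {
    int n = (int)a.size(), q = (int)qs.size();
    std::vector<int> ans(q, 0);
    if (n == 0 || q == 0) return ans;
    int maxV = *std::max_element(a.begin(), a.end());
    assert(*std::min_element(a.begin(), a.end()) >= 0);

    // Sort queries by block of the left end, then by right end.
    int block = std::max(1, (int)std::sqrt((double)n));
    std::vector<int> order(q);
    for (int i = 0; i < q; ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](int x, int y) {
        int bx = qs[x].l / block, by = qs[y].l / block;
        if (bx != by) return bx < by;
        return qs[x].r < qs[y].r;
    });

    std::vector<int> cnt(maxV + 1, 0);   // cnt[v]: occurrences of v in the window
    std::vector<int> withCount(n + 1, 0); // withCount[c]: values with count c (c >= 1 is meaningful)
    int curMax = 0;
    auto add = [&](int i) {
        int v = a[i];
        --withCount[cnt[v]];
        ++cnt[v];
        ++withCount[cnt[v]];
        curMax = std::max(curMax, cnt[v]);
    };
    auto drop = [&](int i) {
        int v = a[i];
        --withCount[cnt[v]];
        // If v was the only value at the maximum, the maximum falls by exactly one.
        if (cnt[v] == curMax && withCount[cnt[v]] == 0) --curMax;
        --cnt[v];
        ++withCount[cnt[v]];
    };

    int curL = 0, curR = -1;  // current window is empty
    for (int id : order) {
        int l = qs[id].l, r = qs[id].r;
        assert(0 <= l && l <= r && r < n);
        while (curL > l) add(--curL);
        while (curR < r) add(++curR);
        while (curL < l) drop(curL++);
        while (curR > r) drop(curR--);
        ans[id] = curMax;
    }
    return ans;
}
```

Because the queries are reordered, this only applies when all queries are known in advance; it does not work when each query depends on the previous answer.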

16 months ago

Actually, in China it's just this problem.

How to solve it online?

satyam343

We can solve it in $$$O(n \cdot \sqrt{q \cdot \log n})$$$, which should comfortably pass as $$$1 \leq n,q \leq 50000$$$.

Let us coordinate compress the array to have $$$1 \leq a_i \leq n$$$.
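As an illustration, coordinate compression could look like the sketch below. The function name `compress` is an assumption made for this example; it maps every value to its 1-based rank among the distinct values and returns the sorted distinct values so the original value can be recovered later.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Replaces each a[i] by its 1-based rank among the distinct values of a.
// Returns the sorted distinct values, so vals[a[i] - 1] is the original value.
std::vector<long long> compress(std::vector<long long>& a) {
    std::vector<long long> vals(a);
    std::sort(vals.begin(), vals.end());
    vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
    for (long long& x : a) {
        x = std::lower_bound(vals.begin(), vals.end(), x) - vals.begin() + 1;
    }
    assert(vals.size() <= a.size());  // there are never more ranks than elements
    return vals;
}
```

After this step every $$$a_i$$$ lies in $$$[1, n]$$$, so arrays indexed by value need only $$$n + 1$$$ entries.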

Now, for this array, we can find the most frequent element for all prefixes in a single traversal from left to right, i.e., in $$$O(n)$$$.

Consider some constant $$$b$$$, whose value we will decide later. Now consider all the suffixes $$$a_i, a_{i+1}, \dots, a_n$$$ such that $$$i$$$ is of the form $$$b \cdot k$$$. Note that there are at most $$$O(\frac{n}{b})$$$ such suffixes, and we can solve each of them in $$$O(n)$$$. Here, by solving I mean finding the most frequent element of the subarray $$$a_i, a_{i+1}, \ldots, a_j$$$ for all $$$i \leq j \leq n$$$. Hence, we can solve all valid suffixes in $$$O(\frac{n^2}{b})$$$. This is our pre-computation part.

Now, let us see how to answer the queries online. If $$$r - l < b$$$, we can find the most frequent element directly in $$$O(b)$$$. Otherwise, $$$r - l \geq b$$$. Find the smallest index $$$h$$$ such that $$$h \geq l$$$ and $$$h$$$ is of the form $$$b \cdot k$$$. We already know the most frequent element of the subarray $$$a_h, a_{h+1}, \ldots, a_r$$$ (thanks to our pre-computation), so that subarray is dealt with. The most frequent element may instead come from the subarray $$$a_l, a_{l+1}, \ldots, a_{h-1}$$$, but $$$h - l < b$$$. So we can traverse all the elements of $$$a_l, a_{l+1}, \ldots, a_{h-1}$$$, find the frequency of each in the subarray $$$a_l, a_{l+1}, \ldots, a_r$$$ in $$$O(\log n)$$$ (for example, by binary search on a sorted list of positions for each value), and update the most frequent element. As we had done coordinate compression, we can retrieve the original answer and print it. Hence, we can answer each query in $$$O(b \cdot \log n)$$$.

So our time complexity is $$$O(\frac{n^2}{b} + q \cdot b \log n)$$$. Choosing $$$b = \frac{n}{\sqrt{q \log n}}$$$, we achieve the complexity $$$O(n \cdot \sqrt{q \cdot \log n})$$$.
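The precomputation and query procedure described above can be sketched roughly as follows. This is one possible reading of the idea, not the commenter's own code: the names `RangeMode`, `modeFrom`, `pos`, `count` and `query` are invented for the example, positions are 0-indexed (so the special starts are $$$0, b, 2b, \ldots$$$), values are assumed to be already compressed into $$$[1, n]$$$, and ties are broken toward the smaller value. Short queries are not counted in a separate loop here; they go through the same candidate check, which keeps the $$$O(b \log n)$$$ bound.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Online range-mode queries. Positions are 0-indexed; values must lie in [1, n].
struct RangeMode {
    int n, b;
    std::vector<int> a;
    std::vector<std::vector<int>> pos;       // pos[v]: sorted positions i with a[i] == v
    std::vector<std::vector<int>> modeFrom;  // modeFrom[k][j - k*b]: mode of a[k*b..j]

    RangeMode(const std::vector<int>& arr, int blockSize)
        : n((int)arr.size()), b(blockSize), a(arr), pos(n + 1) {
        assert(b >= 1);
        for (int i = 0; i < n; ++i) {
            assert(1 <= a[i] && a[i] <= n);
            pos[a[i]].push_back(i);
        }
        // For every start s = 0, b, 2b, ... sweep to the right once: O(n^2 / b) in total.
        std::vector<int> cnt(n + 1, 0);
        for (int s = 0; s < n; s += b) {
            std::vector<int> row(n - s);
            int best = 0;  // cnt[0] stays 0, so the first real value always replaces it
            for (int j = s; j < n; ++j) {
                int v = a[j];
                ++cnt[v];
                if (cnt[v] > cnt[best] || (cnt[v] == cnt[best] && v < best)) best = v;
                row[j - s] = best;
            }
            for (int j = s; j < n; ++j) --cnt[a[j]];
            modeFrom.push_back(std::move(row));
        }
    }

    // Number of occurrences of v in a[l..r], by binary search on pos[v].
    int count(int v, int l, int r) const {
        const std::vector<int>& p = pos[v];
        return (int)(std::upper_bound(p.begin(), p.end(), r) -
                     std::lower_bound(p.begin(), p.end(), l));
    }

    // Most frequent value in a[l..r]; ties go to the smaller value.
    int query(int l, int r) const {
        assert(0 <= l && l <= r && r < n);
        int h = (l + b - 1) / b * b;  // smallest multiple of b that is >= l
        int best = 0, bestCnt = 0;
        auto consider = [&](int v) {
            int c = count(v, l, r);  // frequency over the whole query range
            if (c > bestCnt || (c == bestCnt && v < best)) { best = v; bestCnt = c; }
        };
        if (h <= r) consider(modeFrom[h / b][r - h]);                  // precomputed part
        for (int i = l; i < std::min(h, r + 1); ++i) consider(a[i]);  // fewer than b elements
        return best;
    }
};
```

A caller would build `RangeMode rm(a, b)` once with $$$b \approx \frac{n}{\sqrt{q \log n}}$$$ (at least 1) and then answer each query with `rm.query(l, r)`, mapping the result back through the compression table. Note that the table takes $$$O(\frac{n^2}{b})$$$ memory as well as time.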

Oh, it's actually much cleaner than I expected. Thanks!

Aha, I found my code!

Code with 1<=n<=40000,1<=m<=50000

whatthemomooofun1729's blog