Explanation needed for Boyer-Moore Majority Element, and one of its extensions.


By prac64, history, 7 years ago, In English

Hey CodeForces, recently I came across a problem: finding an element in an array which occurs more than n/3 times, using constant space and linear time. I admit this is a well-known interview problem and plenty of articles cover it, but I have struggled to find one that offers a proof of correctness or any explanation.

Please help me understand the correctness.

Article: https://www.geeksforgeeks.org/given-an-array-of-of-size-n-finds-all-the-elements-that-appear-more-than-nk-times/

Basic Idea: As in the Boyer-Moore algorithm, but instead of keeping one element and one counter, keep 2 of each, then update them similarly to the original algorithm.
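To make the basic idea concrete, here is a minimal Python sketch of the two-candidate variant. The function name `majority_n3` and the exact order of the branches are my own choices for illustration, not taken from the linked article. The second pass is needed because the counters only produce candidates.

```python
def majority_n3(nums):
    """Return all values that occur more than len(nums)/3 times."""
    cand1 = cand2 = None
    cnt1 = cnt2 = 0
    for x in nums:
        if cnt1 > 0 and x == cand1:
            cnt1 += 1
        elif cnt2 > 0 and x == cand2:
            cnt2 += 1
        elif cnt1 == 0:
            cand1, cnt1 = x, 1          # free slot: adopt x as a candidate
        elif cnt2 == 0:
            cand2, cnt2 = x, 1
        else:
            # x differs from both live candidates: discard three distinct values
            cnt1 -= 1
            cnt2 -= 1
    # Second pass: the surviving candidates may be garbage, so count them.
    result = []
    for cand, cnt in ((cand1, cnt1), (cand2, cnt2)):
        if cnt > 0 and nums.count(cand) * 3 > len(nums):
            result.append(cand)
    return result
```

For example, on `[1, 1, 1, 3, 3, 2, 2, 2]` both 1 and 2 occur 3 times out of 8, so both should be reported. Apart from the input itself, the sketch uses only two candidates and two counters.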

Thank you CodeForces!


Comments (12)


Rezwan.Arefin01

7 years ago, # |


You can use a randomized algorithm. That is:
Take a random number from the array, and check if it appears more than $n/k$ times. If you repeat this x times, then you have at least $1 - (1 - 1/k)^x$ probability of hitting a number with frequency more than $n/k$, provided one exists.
If n = 10⁵ and k = 3, then checking only 40 random numbers gives you more than 99% probability of hitting such a number.
It may have complexity O(nx) or O(n + x) depending on the bounds of the numbers in the array, and linear space. However, this may not always be feasible if x needs to be high.
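A rough Python sketch of the random-probe idea, assuming the O(nx) variant where each probe is verified by a full count. The name `random_probe` and the `tries` and `rng` parameters are hypothetical, introduced only for this example.

```python
import random

def random_probe(nums, k, tries=40, rng=None):
    """Return some value occurring more than len(nums)/k times, or None.

    A None result means no such value was found within `tries` probes;
    it does not prove that none exists.
    """
    rng = rng or random.Random()
    n = len(nums)
    if n == 0:
        return None
    for _ in range(tries):
        cand = nums[rng.randrange(n)]     # uniform random position
        if nums.count(cand) * k > n:      # O(n) verification of this probe
            return cand
    return None
```

Passing a seeded `random.Random` makes runs reproducible. Any value returned is always correct because it has been counted; only the `None` case is probabilistic.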

This idea can be used to solve this problem — 840D - Destiny


LanceTheDragonTrainer

7 years ago, # ^ |

Do you have a proof of the probability?


Rezwan.Arefin01

7 years ago, # ^ |

Hint: Try finding the probability that you don't hit such number in all (x - 1) guesses.
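Following the hint, the bound can be worked out as below. This is a sketch assuming each guess is an independent, uniformly random position, and that some value occurs $m > n/k$ times; the symbol $m$ is introduced here only for the derivation.

```latex
\begin{align*}
\Pr[\text{one guess misses}] &= 1 - \frac{m}{n} < 1 - \frac{1}{k} \\
\Pr[\text{all } x \text{ guesses miss}] &= \left(1 - \frac{m}{n}\right)^{x} < \left(1 - \frac{1}{k}\right)^{x} \\
\Pr[\text{at least one hit}] &> 1 - \left(1 - \frac{1}{k}\right)^{x} \\
k = 3,\ x = 40: \quad \left(\tfrac{2}{3}\right)^{40} &\approx 9 \times 10^{-8}
\end{align*}
```

So with k = 3 and 40 guesses the failure probability is roughly $10^{-7}$, comfortably below the 1% mentioned above.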


prac64

7 years ago, # ^ |

Is it not important that you hit 40 distinct random numbers? In that case you will have to use auxiliary memory to keep track of what you've already checked. Also, can you prove how many calls to the rand() function you need to get 40 distinct values?

Anyway, thank you for giving me a new problem solving method ! :)


Rezwan.Arefin01

7 years ago, # ^ |

You don't need to keep track of which numbers you have visited. The point is, if you keep taking a random number and checking it, you'll find one in around 40 checks. So, just keep trying until you find one. If you are going to use C/C++, rand() % n should be enough. You may safely assume that it doesn't generate the same number over and over again in close calls.
But if the data is generated based on the numbers generated by rand(), then you can just reseed it with some random prime. Or use rand() * 1ll * rand() % n or something like that.


prac64

7 years ago, # ^ |

Makes much more sense now, thank you !


fmqjpt

7 years ago, # ^ |

You can also have an array A, initialized to contain [1..N], random_shuffle() it, and then choose the first however many elements you need as indices.
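In Python the same shuffle-then-take-a-prefix idea could look like the sketch below. The helper name `distinct_probe_indices` is hypothetical, and indices are 0-based here rather than 1-based.

```python
import random

def distinct_probe_indices(n, x, rng=None):
    """Return x distinct indices from range(n), chosen via a full shuffle."""
    rng = rng or random.Random()
    idx = list(range(n))
    rng.shuffle(idx)       # O(n) time and O(n) extra memory
    return idx[:x]         # a prefix of a shuffled list has no repeats
```

The standard library's `random.sample(range(n), x)` also returns distinct indices without building the whole shuffled list.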


prac64

7 years ago, # ^ |

Forgive me if I am wrong, but is it not more complex? I mean, that will definitely take O(n) operations, but Rezwan's method is expected to take much fewer.

Anyway, I was primarily looking for a proof of correctness of the said algorithm, since I like understanding why algorithms are correct, even though I am not very good at it.

Solving problems is a different game. For example, in this problem one could probably use quickselect on multiples of k and still get the correct answer in expected O(kn) time, but that is not the point.


farmersrice

7 years ago, # ^ |

It's the same complexity


prac64

7 years ago, # ^ |

oh yeah.. my bad !


BlackTools

7 years ago, # |


Problem: Find all elements of A, |A| = n, that appear more than $n/k$ times.

Let's split A in two smaller subproblems A₁, |A₁| = n₁, and A₂, |A₂| = n₂, such that n = n₁ + n₂. We can define K_i = (a₁, a₂, ..., a_k) as the k greatest elements by frequency in subproblem A_i and F(x, A_i) as the frequency of element x in A_i.

Then we argue that the elements of K₁ and K₂ are the only candidates to appear more than $n/k$ times in the original problem (K).

Let's do a proof by contradiction and assume that there is an element x such that $F(x, A) > n/k$, $x \notin K_1$ and $x \notin K_2$. We know that $F(x, A_i) > n_i/k \implies x \in K_i$, since fewer than k distinct elements can each appear more than $n_i/k$ times in $A_i$. Using the contrapositive we know that $F(x, A_1) \le n_1/k$ and $F(x, A_2) \le n_2/k$, then:

F(x, A) = F(x, A₁) + F(x, A₂)

$F(x, A) \le n_1/k + n_2/k$

$F(x, A) \le n/k$. Contradiction!!

So, the elements of K₁ and K₂ are the only candidates to appear more than $n/k$ times in the original problem (K).

For a more formal proof, use strong induction on |A₁| + |A₂| within the induction on |A|. Regards.
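The candidate argument above suggests a divide-and-conquer procedure. The Python sketch below is one possible reading of it, not code from the comment: the name `heavy_hitters_dc` is hypothetical, and as a simplification each subproblem keeps only the elements that exceed its own threshold instead of the k most frequent ones. Counting every candidate at every level makes this run in roughly O(nk log n) time.

```python
def heavy_hitters_dc(nums, k):
    """Return all values occurring more than len(nums)/k times (k >= 1)."""
    def solve(lo, hi):
        n = hi - lo
        if n == 0:
            return []
        if n == 1:
            cands = {nums[lo]}
        else:
            mid = (lo + hi) // 2
            # Only values that are heavy in one half can be heavy in the whole.
            cands = set(solve(lo, mid)) | set(solve(mid, hi))
        # Keep a candidate only if it really exceeds n/k within nums[lo:hi].
        return [c for c in cands
                if sum(1 for i in range(lo, hi) if nums[i] == c) * k > n]
    return solve(0, len(nums))
```

The order of the returned values is not specified, so sort the result before comparing it with an expected list.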


prac64

7 years ago, # |

Never mind, I thought out a rough proof:

Consider k-1 stacks (for now; we will reduce them to counters later) and the remaining part of the array that is yet to be seen. Instead of incrementing a counter, imagine you push the element onto its respective stack. Similarly, instead of decrementing, imagine you pop one element from every stack.

Now define the solution set as the elements in the stacks plus the remaining array. We remove elements from the solution set only when it contains k distinct elements, that is, k-1 distinct elements in the stacks and a new element from the remaining array. In that case we discard k distinct elements from the solution set at once.

After we are finished with the entire array, we have discarded only tuples of k distinct elements. There can be at most n/k such tuples, and each one removes at most one copy of any given value, so an element with frequency strictly greater than n/k must still have a copy left in the stacks. If the array did not contain any such element, the stacks will hold garbage values. Hence we need to check once more, in (k-1)n time.

Instead of keeping stacks of identical elements, the algorithm compresses each one into a (value, num_instances) pair.
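The compressed version of the stack argument can be sketched in Python as follows. A dict stands in for the k-1 (value, num_instances) pairs; the function name `heavy_hitters` is my own and the code is an illustration of the proof, not the article's implementation.

```python
def heavy_hitters(nums, k):
    """Return all values occurring more than len(nums)/k times (k >= 1)."""
    counters = {}                          # value -> height of its "stack"
    for x in nums:
        if x in counters:
            counters[x] += 1               # push onto x's own stack
        elif len(counters) < k - 1:
            counters[x] = 1                # start a new stack for x
        else:
            # k-1 other values are stacked: discard x together with one
            # copy from every stack, i.e. a tuple of k distinct elements.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    # Survivors may be garbage, so verify each in a second pass: (k-1)n work.
    return [c for c in counters if nums.count(c) * k > len(nums)]
```

With k = 2 this reduces to the usual single-counter majority vote.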
