nandapavan26's blog

By nandapavan26, history, 15 months ago, In English

Question: In some string problems l have used hashing for checking a substring is present in a given string in o(1) [i.e for matching a substring] but i had a doubt that how can we say that hashing doesn't undergo collision while checking for matching substrings?

Note: the hashing logic is the same logic that used in rabin karp algorithm for patter matching. Please correct me if am wrong in any case.

Everyone are invited to solve my doubt

Thanks in Advance !!!

  • Vote: I like it
  • +5
  • Vote: I do not like it

| Write comment?
»
15 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

but i had a doubt that how can we say that hashing doesn't undergo collision while checking for matching substrings

Let's estimate your chances to get hash collision. Assume that the hash is a random number in range [0, m) with only one property: if two strings are equal, then hashes are equal.

Then let's check what number of distinct strings we can have without actually having equal hashes. According to the birthday paradox, we can roughly estimate this number as $$$c \cdot \sqrt{m}$$$, where $$$c = const$$$. So, if you have $$$c \cdot \sqrt{m}$$$ strings, you are likely to have collisions.

It depends on your problem whether you will be affected by hash collision, but the most simple way to get rid of it is to calculate two different hashes with different modulos and say that strings are equal if both hashes are equal. If this modulos are, for example, $$$10^9 + 7$$$ and $$$10^9 + 9$$$, you will have collisions only with $$$c \cdot 10^9$$$ strings, which is practically impossible in any given CP problem.

TL;DR hashes are based on probability, and they work almost certainly (have enough chance to pass in CP problem).

  • »
    »
    15 months ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    @Wind_Eagle is it c*10^9 or c* sqrt(10^9)

    • »
      »
      »
      15 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      If you are using two modulos $$$m_1$$$ and $$$m_2$$$, you can say that you are using one modulo $$$m = m_1 \cdot m_2$$$, so $$$c \cdot \sqrt{m} = c \cdot \sqrt{(10^9 + 7) \cdot (10^9 + 9)} \approx c \cdot 10^9$$$.