Question: In some string problems l have used hashing for checking a substring is present in a given string in o(1) [i.e for matching a substring] but i had a doubt that how can we say that hashing doesn't undergo collision while checking for matching substrings?
Note: the hashing logic is the same logic that used in rabin karp algorithm for patter matching. Please correct me if am wrong in any case.
Everyone are invited to solve my doubt
Thanks in Advance !!!
but i had a doubt that how can we say that hashing doesn't undergo collision while checking for matching substrings
Let's estimate your chances to get hash collision. Assume that the hash is a random number in range [0, m) with only one property: if two strings are equal, then hashes are equal.
Then let's check what number of distinct strings we can have without actually having equal hashes. According to the birthday paradox, we can roughly estimate this number as $$$c \cdot \sqrt{m}$$$, where $$$c = const$$$. So, if you have $$$c \cdot \sqrt{m}$$$ strings, you are likely to have collisions.
It depends on your problem whether you will be affected by hash collision, but the most simple way to get rid of it is to calculate two different hashes with different modulos and say that strings are equal if both hashes are equal. If this modulos are, for example, $$$10^9 + 7$$$ and $$$10^9 + 9$$$, you will have collisions only with $$$c \cdot 10^9$$$ strings, which is practically impossible in any given CP problem.
TL;DR hashes are based on probability, and they work almost certainly (have enough chance to pass in CP problem).
@Wind_Eagle is it c*10^9 or c* sqrt(10^9)
If you are using two modulos $$$m_1$$$ and $$$m_2$$$, you can say that you are using one modulo $$$m = m_1 \cdot m_2$$$, so $$$c \cdot \sqrt{m} = c \cdot \sqrt{(10^9 + 7) \cdot (10^9 + 9)} \approx c \cdot 10^9$$$.