Given a string S of length n characters, is it possible to calculate the Hash of its substring [i, j] (From index i to index j. Inclusive) in O(1) using some form of precomputation ? Maybe a modification of the Rolling Hash ?
Similar Problem
Problem Statement
I have seen it being used in a similar problem where in a string was given in a compressed form. Meaning, e.g. if the string is "aaabccdeeee"
then the compressed form is:
3 a 1 b 2 c 1 d 4 e
How data was stored
They are stored in an str[]
array as :
str[] = [{'a','3'}, {'b','1'}, {'c','2'}....]
HASHING Concept that was used in the solutions
And programmers had used the following hash concept to find if the given substring is a Palindrome or not. Given a substring of string S as (i,j), they computed the hash of substring [i , (i+j)/2] and the reverse hash of substring [(i+j+2)/2, j] and checked if they were equal or not. So if they wanted to check if in string S = "daabac"
whether substring [1, 5] is a a palindrome or not, they computed the following :
h1 = forward_hash("aa") h2 = reverse_hash("ba") h1 == h2
Code for the Hashing Concept
The hash precomputation was done as follows :
/* Computing the Prefix Hash Table */ pre_hash[0] = 0; for(int i=1;i<=len(str);i++) { pre_hash[i] = pre_hash[i-1]*very_large_prime + str[i].first; pre_hash[i] = pre_hash[i]*very_large_prime + str[i].second; } /* Computing the Suffix Hash Table */ suff_hash[0] = 0; for(int i=1;i<=len(str);i++) { suff_hash[i] = suff_hash[i-1]*very_large_prime + str[len(str)-i+1].first; suff_hash[i] = suff_hash[i]*very_large_prime + str[len(str)-i+1].second; }
And then the hash was computed using the following functions :
/* Calculates the Forward hash of substring [i,j] */ unsigned long long CalculateHash(int i, int j) { if(i>j) return -1; unsigned long long ret = pre_hash[j] - POW(very_large_prime, [2*(j-i+1)])*pre_hash[i-1]; return ret; } /* Calculates the reverse hash of substring [i,j] */ unsigned long long CalculateHash_Reverse(int i, int j) { unsigned long long ret = suff_hash[j] - POW(very_large_prime,[2*(j-i+1)])*suff_hash[i-1]; return ret; }
What I am trying to do
I am looking for a general approach to the above concept. Given a Pattern P, I want to check if the pattern P is present in a string S. I know the index (i)
to check where it may be present. And I also know the length of pattern P represented as |P|
. In short I want to check if hash of S[i, i+|P|]
and hash of P
match or not in O(1)
using some form of pre computation on S
. Is it possible ?
Ignoring the time taken to compute hash of P else it would be O(1+|P|)
If s[i]...s[j]=s[p]...s[q] then (h[j]-h[i-1])*(prime^p)==(h[q]-h[p-1])*(prime^i) where h is array of prefix hashes. BTW its better to calc h array using this formula h[i]=(h[i-1]*small_prime+s[i])%very_big_prime
The pattern is not part of the string. It is a separate string. Also the format of the string is not in compressed form as is given in the similar question. The string I am using is a general uncompressed string.
This reminds me, the hash being computed in the above approach (my post) is computed on the compressed string. So how is it checking for a substring e.g.
How does it check for the substring palindrome "aba" ? Doesnt it have to uncompress the string first , since the hash that has been precomputed is that of the compressed string ?
Using
h[i]=(h[i-1]*small_prime+s[i])%very_big_prime
reduces collisions?1)Yes you need to decompress.it first and calc hash for decompressed string. 2)Yes, it reduces collisions and you will pass many anti-hash tests
Got it!
So if my
So I compute Hash_of_P using
small_prime
andvery_big_prime
Then precompute prefix hashes for S in h[] using the same
small_prime
andvery_big_prime
And then check
(h[i+|P|] - h[i-1])*(prime^|P|) == Hash_of_P
?I think it wont work. It's better to create a string Q=P+'#'+S and find with first formula
What would be the reverse of this formula->
h[i]=(h[i-1]*small_prime+s[i])%very_big_prime
?P_hash: P[0] + P[1] * prime + P[2] * prime2 + P[3] * prime3 + ...
pre_hash[i+|P|-1]-pre_hash[i-1]: S[i] * primei + S[i + 1] * primei + 1 + ...
If we want to compare two strings we must have the same multipliers in its hashes. So we can simply multiply Hp by primei. Another way — we can multiply HS[i, i + |P|] by prime - i using multiplicative inverse of the prime. You should read more about Rabin-Karp algorithm for better understanding.