Kyuubi's blog

By Kyuubi, 10 years ago, In English

Given a string S of length n characters, is it possible to calculate the Hash of its substring [i, j] (From index i to index j. Inclusive) in O(1) using some form of precomputation ? Maybe a modification of the Rolling Hash ?

Similar Problem

Problem Statement

I have seen it being used in a similar problem where in a string was given in a compressed form. Meaning, e.g. if the string is "aaabccdeeee" then the compressed form is:

3 a
1 b
2 c
1 d
4 e

How data was stored

They are stored in an str[] array as :

str[] = [{'a','3'}, {'b','1'}, {'c','2'}....]

HASHING Concept that was used in the solutions

And programmers had used the following hash concept to find if the given substring is a Palindrome or not. Given a substring of string S as (i,j), they computed the hash of substring [i , (i+j)/2] and the reverse hash of substring [(i+j+2)/2, j] and checked if they were equal or not. So if they wanted to check if in string S = "daabac" whether substring [1, 5] is a a palindrome or not, they computed the following :

h1 = forward_hash("aa") 
h2 = reverse_hash("ba")
h1 == h2

Code for the Hashing Concept

The hash precomputation was done as follows :

/* Computing the Prefix Hash Table */
pre_hash[0] = 0;
for(int i=1;i<=len(str);i++)
{
    pre_hash[i] = pre_hash[i-1]*very_large_prime + str[i].first;
    pre_hash[i] = pre_hash[i]*very_large_prime + str[i].second;
}

/* Computing the Suffix Hash Table */
suff_hash[0] = 0;
for(int i=1;i<=len(str);i++)
{
    suff_hash[i] = suff_hash[i-1]*very_large_prime + str[len(str)-i+1].first;
    suff_hash[i] = suff_hash[i]*very_large_prime + str[len(str)-i+1].second;
}

And then the hash was computed using the following functions :

/* Calculates the Forward hash of substring [i,j] */
unsigned long long CalculateHash(int i, int j)
{
    if(i>j)
        return -1;
    unsigned long long ret = pre_hash[j] - POW(very_large_prime, [2*(j-i+1)])*pre_hash[i-1];
    return ret;
}
/* Calculates the reverse hash of substring [i,j] */
unsigned long long CalculateHash_Reverse(int i, int j)
{
    unsigned long long ret = suff_hash[j] - POW(very_large_prime,[2*(j-i+1)])*suff_hash[i-1];
    return ret;
}

What I am trying to do

I am looking for a general approach to the above concept. Given a Pattern P, I want to check if the pattern P is present in a string S. I know the index (i) to check where it may be present. And I also know the length of pattern P represented as |P|. In short I want to check if hash of S[i, i+|P|] and hash of P match or not in O(1) using some form of pre computation on S. Is it possible ?

Ignoring the time taken to compute hash of P else it would be O(1+|P|)

  • Vote: I like it
  • +3
  • Vote: I do not like it

»
10 years ago, # |
  Vote: I like it +3 Vote: I do not like it

If s[i]...s[j]=s[p]...s[q] then (h[j]-h[i-1])*(prime^p)==(h[q]-h[p-1])*(prime^i) where h is array of prefix hashes. BTW its better to calc h array using this formula h[i]=(h[i-1]*small_prime+s[i])%very_big_prime

  • »
    »
    10 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    The pattern is not part of the string. It is a separate string. Also the format of the string is not in compressed form as is given in the similar question. The string I am using is a general uncompressed string.

    This reminds me, the hash being computed in the above approach (my post) is computed on the compressed string. So how is it checking for a substring e.g.

    2 a
    1 b
    1 a
    

    How does it check for the substring palindrome "aba" ? Doesnt it have to uncompress the string first , since the hash that has been precomputed is that of the compressed string ?

    Using h[i]=(h[i-1]*small_prime+s[i])%very_big_prime reduces collisions?

    • »
      »
      »
      10 years ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      1)Yes you need to decompress.it first and calc hash for decompressed string. 2)Yes, it reduces collisions and you will pass many anti-hash tests

      • »
        »
        »
        »
        10 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Got it!

        So if my

        P : Pattern String 
        S : String to Search in.
        

        So I compute Hash_of_P using small_prime and very_big_prime

        Then precompute prefix hashes for S in h[] using the same small_prime and very_big_prime

        And then check (h[i+|P|] - h[i-1])*(prime^|P|) == Hash_of_P ?

        • »
          »
          »
          »
          »
          10 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          I think it wont work. It's better to create a string Q=P+'#'+S and find with first formula

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    What would be the reverse of this formula-> h[i]=(h[i-1]*small_prime+s[i])%very_big_prime ?

»
10 years ago, # |
  Vote: I like it 0 Vote: I do not like it

P_hash: P[0] + P[1] * prime + P[2] * prime2 + P[3] * prime3 + ...

pre_hash[i+|P|-1]-pre_hash[i-1]: S[i] * primei + S[i + 1] * primei + 1 + ...

If we want to compare two strings we must have the same multipliers in its hashes. So we can simply multiply Hp by primei. Another way — we can multiply HS[i, i + |P|] by prime - i using multiplicative inverse of the prime. You should read more about Rabin-Karp algorithm for better understanding.