How to hash a substring efficiently? - Codeforces

→ Pay attention

Before contest
CodeTON Round 9 (Div. 1 + Div. 2, Rated, Prizes!)
4 days
Register now »

*has extra registration

→ Streams

DP Bitmask - Topic Stream

By Shayan

Stream is running

View all →

→ Top rated

#	User	Rating
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

Countries | Cities | Organizations

→ Top contributors

#	User	Contrib.
1	cry	166
2	maomao90	163
2	Um_nik	163
4	atcoder_official	161
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	nor	153
9	Dominater069	153

View all →

→ Find user

→ Recent actions

Detailed →

gautam94's blog

How to hash a substring efficiently?

By gautam94, 11 years ago, In English

In English

I was reading about hashing from here and I am unable to understand the part about calculation of hash of a substring. I am calculating the hash of the entire input string in this way : h (S) = S [0] + S [1] * P + S [2] * P ^ 2 + S [3] * P ^ 3 + ... + S [N] * P ^ N

Suppose P = 31 and a = 1, b = 2, c = 3 and so on. Then for the input string abcdab, h[0] = 1, h[1] = 32, h[2] = 2915, h[3] = 122079, h[4] = 1045600, h[5] = 58303902. Now from these values, how can I calculate h[0..1] or h[3..5]?

Tags

hashing, rabin- karp algo, roll hash

+6

gautam94
11 years ago
25

Comments

Comments (22)

Show archived | Write comment?

»

11 years ago, # |

← Rev. 3 →

Vote: I like it

+6

Vote: I do not like it

Let's store suffix hashes instead.

We can now calculate hashes like that: h[i] = h[i + 1] * p + s[i]

So, we want to find h[l..r]. Let len = r - l + 1

Let's write h[l] and h[r + 1]:

h[l] = s[l] * p^0 + s[l + 1] * p^1 + s[l + 2] * p^2 + ... + s[r] * p^(len - 1) + s[r + 1] * p^len + ...
h[r + 1] = s[r + 1] * p^0 + s[r + 2] * p^1 + ...
h[r + 1] * p^len = s[r + l] * p^len + s[r + 2] * p^(len+1) + ...
h[l] - h[r + 1] * p^len = s[l] * p^0 + s[l + 1] * p^1 + s[l + 2] * p^2 + ... + s[r] * p^(len - 1)

That's exactly what we need.

→ Reply

»

»

11 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

Thanks for replying. Can you also tell me how can I use this for longer strings, say len = 10000. P^len will not fit in 64-bit int and I read this which suggests that we shouldn't do calculations mod 2^64. So, what mod should I use and how can I calculate hash of a substring when doing calculations mod some number?

→ Reply

»

»

»

11 years ago, # ^ |

Vote: I like it

+6

Vote: I do not like it

You should always use a prime modulo, for example, 10^9 + 7 or 10^9 + 9. You can do all the operations by that modulo. Just calculate powers of p by that modulo, all calculations will be fine.

BTW, when you substract 2 numbers by modulo, you can get a negative number in result. If you use just a % b, in many programming languages it would be negative. For example you can use cout << -3 % 2;, it'll output -1, but you would expect 1. So, the best way to prevent negative numbers is using (a % b + b) % b. For example a = -3, b = 2: (-3 % 2 + 2) % 2 = (-1 + 2) % 2 = 1 % 2 = 1

→ Reply

»

»

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

can you explain where i have to use (a % b + b) % b in this code http://mirror.codeforces.com/contest/182/submission/6734689 i think my error is because of this.

→ Reply

»

»

»

»

»

10 years ago, # ^ |

← Rev. 2 →

Vote: I like it

0

Vote: I do not like it

So, we want to know, what is remainder if we divide a by b. As you know, mathematical remainder always belongs to the range of [0, b)

So, if we write a % b, we expect to get the remainder. If a >= 0 we'll indeed get the remainder. But if a < 0 we'll not get the remainder (in most languages: C++, Java, but not Python). But actually, if we add b to this, we'll get the remainder (it will be in range (-b, 0]).

For example:

a % b = [what we get in C++] (what the remainder actually is)
-4 % 5 = -4 (1)
-153 % 42 = -27 (15)
-89 % 2 = -1 (1)
-88 % 2 = 0 (0)

So, if we write a % b + b, we'll always get a positive integer. In case of negative a, it will be the remainder. In case of a non-negative a, it will be r + b. To get a true remainer, we'll write (a % b + b) % b. This will work for all positive values of a.

→ Reply

»

»

»

»

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

hey thanks a lot for replying. i understood your comment but can you please tell the reason the code posted in this comment http://mirror.codeforces.com/blog/entry/12145#comment-171463 is giving negative hashes for some substrings. thanks.

→ Reply

»

»

»

»

»

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

someone please answer this. i have been working on this problem since a long time..is the hashing technique correct?

→ Reply

»

»

»

»

»

»

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

You can use unsigned type , and forget about negative

here my example

http://community.topcoder.com/stat?c=problem_solution&rm=320487&rd=15840&pm=12967&cr=22853961

→ Reply

»

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

You can also use a pair of hashes, both of which are computed modulo a different prime. That way you get much better collision resistance and you don't need to multiply 64-bit numbers.

→ Reply

»

»

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

can you please tell wy i get negative value for some hashes? http://ideone.com/OgspSn dfferent base and MODs give different results.some give negative values some dont. why is it so and how can i get rid of this? this one's same as previous code but with different MOD http://ideone.com/HWtRxk but even this gives WA 3.

→ Reply

»

»

»

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

What? No, I'm not going to debug your code. But chances are good that you have an overflow in there.

→ Reply

»

»

»

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

#comment-171465

Try writing separate function for calculating x % MOD

→ Reply

»

»

11 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

suppose we have this:

h[k] means hash of substring [k, N]

can we get hash(l, r) in O(1)?

→ Reply

»

»

»

11 years ago, # ^ |

Vote: I like it

+5

Vote: I do not like it

Of course! o.0

Just store powers of p! Now the result will be h[l] - h[r + 1] * ppow[r - l + 1]

→ Reply

»

»

»

»

11 years ago, # ^ |

Vote: I like it

+3

Vote: I do not like it

thanks.

→ Reply

»

»

»

11 years ago, # ^ |

Vote: I like it

+3

Vote: I do not like it

Here's the implementation in CPP.

→ Reply

»

»

»

»

11 years ago, # ^ |

Vote: I like it

+3

Vote: I do not like it

Anti-hash test is coming for you, so don't forget to make all the calculations by some modulo!

Here's my implementation:

struct SingleHash {
    vector<int> suf, b;
    int mod;

    SingleHash(string s, int base = 153, int _mod = 1000000009) {
        int n = s.length();
        suf.assign(n + 1, 0); // suf[n] = 0
        b.assign(n + 1, 0);
        b[0] = 1;
        b[1] = base;
        mod = _mod;
        for (int i = n - 1; i >= 0; --i) {
            suf[i] = (s[i] + (ll)suf[i + 1] * b[1]) % mod;
        }
        for (int i = 2; i <= n; ++i) {
            b[i] = (ll)b[i - 1] * b[1] % mod;
        }
    }

    int substr(int l, int r) const { // [l, r]
        ll v = suf[l] - (ll)suf[r + 1] * b[r - l + 1];
        v %= mod;
        v += mod;
        v %= mod;
        return v;
    }
};

→ Reply

»

»

»

»

»

11 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

I think this one's correct too. Anyways, thanks a lot for our help. Can you also suggest a few problems that can be solved by hashing?

→ Reply

»

»

»

»

»

»

11 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

the second one (with #define BASE 163) is the correct way to do it.
having said that, ur first code (despite not having MOD) was "right" because $\text{[math]}$ . ur second code wouldn't work without MOD as $\text{[math]}$ .

→ Reply

»

»

10 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

what if h[l] lesser than h[r+1] * p^len?? i will obviously get negative value.

→ Reply

»

»

»

10 years ago, # ^ |

← Rev. 2 →

Vote: I like it

0

Vote: I do not like it

I've already wrote about it, do you support reading the text?

→ Reply

»

»

6 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

Great method !

→ Reply