Any help on this problem? (String Hashing)

By PedroCastillo, history, 6 years ago, In English

Hi, I'm attempting this problem with string hashing.

It works well for small inputs but gives wrong answer on very large inputs.

Any help on what could be wrong?

Problem : https://mirror.codeforces.com/contest/271/problem/D

Submission: https://mirror.codeforces.com/contest/271/submission/46239564

Comments (8)

Volpe

6 years ago, # |

I think you just need to use double hashing to avoid collisions.

PedroCastillo

6 years ago, # ^ |

How exactly?

Also, how can I tell I need to use double hashing? I mean, when is it necessary?

Furthermore, I've seen solutions to this problem with just one normal hashing :(

Volpe

6 years ago, # ^ |

By double hashing I mean computing two hash values for the string, using two different base and MOD values.

In general you can't tell in advance whether a single-hash solution will pass the test cases for a problem. Collisions happen with some probability, so you can't know whether your solution will collide, but you can reduce that probability as much as possible.

You can estimate this probability by assuming that the hash values are uniformly distributed over the different strings. The larger the MOD, the lower the probability of collision and the better your chance of getting Accepted. For solutions based on rolling hash, as in your case, you can get the same effect with double hashing.
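As a rough illustration of the double hashing described above, here is a minimal C++ sketch. The names `DoubleHash` and `countDistinctSubstrings` and the constants `BASE1`, `BASE2`, `MOD1`, `MOD2` are assumptions chosen for this example; they are not taken from the linked submission.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Illustrative constants (assumptions, not from the thread).
constexpr std::uint64_t MOD1 = 1000000007ULL, MOD2 = 998244353ULL;
constexpr std::uint64_t BASE1 = 131ULL, BASE2 = 137ULL;

// Prefix hashes of a string under two independent (base, modulus) pairs.
struct DoubleHash {
    std::vector<std::uint64_t> h1, h2, p1, p2;

    explicit DoubleHash(const std::string& s)
        : h1(s.size() + 1, 0), h2(s.size() + 1, 0),
          p1(s.size() + 1, 1), p2(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            std::uint64_t c = static_cast<unsigned char>(s[i]);
            h1[i + 1] = (h1[i] * BASE1 + c) % MOD1;
            h2[i + 1] = (h2[i] * BASE2 + c) % MOD2;
            p1[i + 1] = p1[i] * BASE1 % MOD1;
            p2[i + 1] = p2[i] * BASE2 % MOD2;
        }
    }

    // Combined 64-bit key of the half-open substring s[l, r).
    // Both moduli are below 2^32, so the two hashes fit side by side.
    std::uint64_t get(std::size_t l, std::size_t r) const {
        assert(l <= r && r < h1.size());
        std::uint64_t a = (h1[r] + MOD1 - h1[l] * p1[r - l] % MOD1) % MOD1;
        std::uint64_t b = (h2[r] + MOD2 - h2[l] * p2[r - l] % MOD2) % MOD2;
        return (a << 32) | b;
    }
};

// Counts distinct non-empty substrings by storing (length, key) pairs.
// Two substrings are treated as equal only if BOTH hashes agree.
std::size_t countDistinctSubstrings(const std::string& s) {
    DoubleHash h(s);
    std::set<std::pair<std::size_t, std::uint64_t>> seen;
    for (std::size_t l = 0; l < s.size(); ++l)
        for (std::size_t r = l + 1; r <= s.size(); ++r)
            seen.insert({r - l, h.get(l, r)});
    return seen.size();
}
```

For example, `countDistinctSubstrings("abab")` counts the seven substrings a, b, ab, ba, aba, bab and abab. For the actual problem you would additionally skip substrings with too many bad letters before inserting.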

PedroCastillo

6 years ago, # ^ |

Thanks, it worked. However, how could I tell I needed the double hashing before submitting?

Noam527

6 years ago, # ^ |

You don't need to detect when you should use 2 or more hashes. One could say you should go by your intuition, but I suggest always using multiple hashes, as many as the memory and time cost of building them allows. I usually use 2 or 3.

CodingKnight

6 years ago, # |

The following is an accepted solution based on collision-free substring hashing. The main idea is to map the lowercase letters a to z to integers between 0 and M - 1, where M = 26. Then up to P consecutive symbols of the string are packed into a single integer as the digits of a base-M number, using iterated multiplication and addition without overflow; P = 13 for a 64-bit signed integer. The sequence of integers obtained by packing a substring is a collision-free hash key among all substrings of the same length. A two-dimensional array of hash-key sets stores the distinct keys generated from all substrings of the input string, where the first index is the number of bad letters in the substring and the second index is the length of the substring. Two substrings are guaranteed to be different if they contain different numbers of bad letters or have different lengths. In other words, all substrings stored in one cell of the two-dimensional array have the same number of bad letters and the same length.

46247924

UPDATE:

The following is an update of the previous solution that uses a one-dimensional array to store the collision-free hash keys, indexed only by the substring length (the second index of the previous solution). This update improved both the execution time and the memory usage.

46257737
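The packing step described above might look roughly like the following C++ sketch. The function name `packKey` is hypothetical, and the code is an illustration of the idea rather than a copy of the linked submissions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr int M = 26;  // alphabet size: 'a'..'z' map to 0..25
constexpr int P = 13;  // 26^13 still fits in a signed 64-bit integer

// Packs the substring s[pos, pos + len) into a sequence of 64-bit words,
// P letters per word, each word read as a base-M number.
// Keys are collision-free only among substrings of the SAME length.
std::vector<std::int64_t> packKey(const std::string& s,
                                  std::size_t pos, std::size_t len) {
    assert(pos + len <= s.size());
    std::vector<std::int64_t> key;
    for (std::size_t i = 0; i < len; i += P) {
        std::int64_t word = 0;
        for (std::size_t j = i; j < len && j < i + P; ++j) {
            assert('a' <= s[pos + j] && s[pos + j] <= 'z');
            word = word * M + (s[pos + j] - 'a');
        }
        key.push_back(word);
    }
    return key;
}
```

Keys of equal-length substrings can then be stored in a `std::set<std::vector<std::int64_t>>` kept per length. Substrings of different lengths must go into different sets, because for instance "a" and "aa" both pack to the single word 0.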

ILoveBitches

6 years ago, # |

Always use double hashing if possible. With single hashing the probability of a collision is about N/MOD. With double hashing it becomes N*N/(MOD*MOD1). In the worst case N/MOD can be as large as 10^-4, which will get you into trouble. With double hashing, the worst-case probability of collision stays around 10^-8 or lower.

BledDest

6 years ago, # |

I think that the birthday paradox is a convenient way to measure this: if we generate something like $\sqrt{\text{MOD}}$ random integers from 0 to MOD - 1, the probability of collision will be somewhere near 0.5. So if you want to make a lot of string comparisons using 32-bit hashing, the probability of collision is high (and it becomes even higher assuming there are multiple tests, and you should pass all of them).

Taking two (or three) 32-bit hashes or one (or two) 64-bit hashes should be enough in almost every problem.
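To put numbers on the birthday-paradox estimate, here is a small C++ sketch using the standard approximation 1 - exp(-k(k-1)/(2m)) for k uniformly random values in a range of size m. The function name `birthdayCollisionProbability` is made up for this example, and the uniformity of hash values is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that k values drawn uniformly at random from
// [0, m) contain at least one equal pair: 1 - exp(-k * (k - 1) / (2 * m)).
// Assumes hash values behave like independent uniform random numbers.
double birthdayCollisionProbability(double k, double m) {
    assert(m > 0 && k >= 1);
    // expm1(x) = exp(x) - 1, which stays accurate when x is close to 0.
    return -std::expm1(-k * (k - 1.0) / (2.0 * m));
}
```

With m = 1e9 + 7 and k = 31623 (about the square root of m) the estimate is roughly 0.39, and with k = 1e6 distinct substrings it is practically 1. With a range of size 1e18, as for a pair of 32-bit-sized moduli or one 64-bit hash, the same k = 1e6 gives about 5e-7.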