Блог пользователя vinayaka

Автор vinayaka, история, 13 месяцев назад, По-английски

Hi,

My knowledge on String algorithms is poor so I am studying it now.

I solved 271D - Good Substrings using a Trie, but I am trying to solve it using hashing.

I am able to calculate a polynomial rolling hash and compare substrings based on the hash, but my solution is getting WA on test 8 (230797737). I am thinking this is because of collisions, since the expected answer is also pretty high.

I read a topic on double polynomial rolling hash to reduce probability of collisions. I understand that if we use two hashes of order 10^9 then probability will be less because it is equivalent to using a 10^18 modulo.

I am unable to understand how to implement it.

Can anyone help?

Please point me towards an article/blog/submission that has code on implementing it.

Thank you.

  • Проголосовать: нравится
  • +7
  • Проголосовать: не нравится

»
13 месяцев назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

Just hash it with two different primes. You can check my submission: https://mirror.codeforces.com/contest/271/submission/230803346

  • »
    »
    13 месяцев назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    I was confused on how to use the two hashes, now I understand we need to store them in a set, and they both combined point to a substring, this was the knowledge gap! Thank you!

    And..what does this code mean?

    struct hash_pair {
        template <class T1, class T2>
        size_t operator()(const pair<T1, T2>& p) const
        {
            auto hash1 = hash<T1>{}(p.first);
            auto hash2 = hash<T2>{}(p.second);
     
            if (hash1 != hash2) {
                return hash1 ^ hash2;              
            }
             
            // If hash1 == hash2, their XOR is zero.
              return hash1;
        }
    };
    
    unordered_set<pair<int,int>, hash_pair> countSet;
    
    • »
      »
      »
      13 месяцев назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Earlier you were hashing with one prime and storing in hash set. Now you are hashing with two different primes so you will have to store it in hash set of pair. C++ does not have built in hash for pair, this part is just hashing the pair

»
13 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Using two different bases works as well, which has the added benefit that you can randomize them at runtime to avoid getting hacked.

»
13 месяцев назад, # |
Rev. 2   Проголосовать: нравится +1 Проголосовать: не нравится

You can just avoid modulo operator entirely. Instead, allow them to overflow, which is essentially behaves like modulo $$$2^{64}$$$ (if you use 64-bit integers).