AnotherRound's blog

By AnotherRound, history, 9 years ago, In English

After getting TLE on test 132 for problem C on CF #350, I tried to optimise. I was surprised when I got AC, because the only thing I needed to change was to replace std::unordered_map with std::map. Does anyone have ideas what causes the different performance? I expected that unordered_map should be faster that map, but it is not the case. And also, is there a way to speed up unordered_map? Here are both submissions:

AC — map: http://mirror.codeforces.com/contest/670/submission/17757996

TLE — unordered_map: http://mirror.codeforces.com/contest/670/submission/17730894

  • Vote: I like it
  • +32
  • Vote: I do not like it

| Write comment?
»
9 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by AnotherRound (previous revision, new revision, compare).

»
9 years ago, # |
  Vote: I like it +10 Vote: I do not like it

It's anti-hash-map test. Every hash-map implementation fails on certain anti-hash-map tests. The only way to prevent it is to use your own randomized implementation.

»
9 years ago, # |
Rev. 3   Vote: I like it +33 Vote: I do not like it

Apparently simple reserve() helps to avoid TLE (I also got TLE and felt very sad):

http://mirror.codeforces.com/contest/670/submission/17728217 — TLE 132 (unordered_map)

http://mirror.codeforces.com/contest/670/submission/17758293 — AC (still unordered_map)

According to reserve() documentation, it rehashes the hash table for better bucket distribution.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Another way I found is to xor values with some (random) constant when you access the hashmap.

»
9 years ago, # |
Rev. 2   Vote: I like it +26 Vote: I do not like it

During one educational round, I was pretty sure my solution for D will pass the system tests since it was not that hard — it was a combination of coordinate compression and fenwick tree. I wanted to submit it fast so I just copied my previous codes for the compression and the tree and it passed the pretests. Then, during the hacking phase, I saw halyavin had hacked my solution with TLE. After looking at my code for a while I saw that my coordinate_compression struct uses unordered_map and I asked him if he somehow managed to make it slow and here is what he said:

Yes, you are right. unordered_set and unordered_map are O(n^2) is the worst case. I just needed to put a lot of eggs elements in the single bucket. It is not the first contest I (and others) use an anti-hash test. There were quite a lot of discussion on codeforces about them too. Well, now you know about them the hard way.

So maybe you don't want to use unordered_map/unordered_set anymore :D

Here are my code and his test generator — http://mirror.codeforces.com/contest/652/submission/16923953, http://mirror.codeforces.com/contest/652/hacks/222128/test :)

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I tried dreamzor's suggestion and it works great. But at least in Bulgarian competitions, I'll make sure to upload versions with both map and unordered_map :D

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Hi, Sorry for bringing this up a bit delayed, but would you mind explaining how we know the buckets of the unordered_map already and in conjunction how the test then puts all the elements into that bucket? Thanks!

    • »
      »
      »
      8 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      I don't really know, maybe someone who knows can explain, I am also interested in this :)

      • »
        »
        »
        »
        8 years ago, # ^ |
          Vote: I like it +5 Vote: I do not like it

        Haha okay, maybe someone will see it now and help us, otherwise I might post a blog asking for help

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
            Vote: I like it +18 Vote: I do not like it

          There was an IPSC problem (H) in 2014 that asked to create a test case that breaks both GCC and OpenJDK implementations of hashset. You may read the solution here

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            This is really interesting reading, thank you!

»
9 years ago, # |
  Vote: I like it +15 Vote: I do not like it

I also failed on the test but figured out a nice way to circumvent the anti-unordered_map test. You can use this technique if you really need the unordered_map, because it seems to be a tad faster than map in some cases. The trick is to "encrypt" the unordered_map input with xor, you choose some constant e, and every time you need to use the unordered_map you do unordered_map[input^e]. If you choose the constant randomly it will pass systests, but will be hackable if the hacker is not lazy. code here: http://mirror.codeforces.com/contest/670/submission/17757763

  • »
    »
    9 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Thanks for the advice! It is the shortest and most elegant solution I've seen for that issue ever.

    If the constant is chosen in runtime (for example, with rand() with previous seed modification) is it possible to design a killer test for any constant value?

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it +5 Vote: I do not like it

      It is possible to design a killer test for any constant value, but you can't design a killer test for all constant values :). So choosing the constant at runtime with rand() and proper seeding should be unhackable and pass the systests.