Блог пользователя saar2119

Автор saar2119, история, 8 лет назад, По-английски

Today one of my myths was broken.
I used to believe that unordered_map is better in time-complexity than map in C++. But today while I was doing a problem(Molly's Chemicals), I got time-limit exceeded. After a lot of guess-work(because I thought my solution was correct), I tried using a map instead of an unordered_map and as a surprise I got it accepted. Then I realised that though the amortised time-complexity of using an unordered_map is O(1) while that of a map is O(log n), there are cases in which due to a lot of collisions unordered_map can have a big constant multiplier which will increase its actual complexity to greater than that of a map.
(Any corrections are welcomed...)

Can someone enlighten me more by clearing the doubt that where should unordered_map be used and where not?
What are the cases in which a map would perform better than an unordered_map?

Here are the tle and aced solution links if someone wants to verify:
TLE Solution(TLE after 2500 ms)
Accepted Solution(Accepted in 499 ms)

  • Проголосовать: нравится
  • +9
  • Проголосовать: не нравится

»
8 лет назад, # |
  Проголосовать: нравится +23 Проголосовать: не нравится

Firstly, unordered_map with default settings has a pretty large constant. It can be decreased a lot by calling reserve and max_load_factor methods as explained at the end of this blog. I think without this unordered_map is slightly faster than map, with this it should be much faster — assuming random input.

Secondly, in this problem test were used that make the c++ unordered_map implementation really slow. This is possible because the hash function for int/long long in C++ is identity (and then taking this modulo the number of buckets), so one can force every key in the hashmap to have the same hash, and then a single lookup becomes O(n).

To fix the second you can paste your own hash implementation like this: 24950229 (unfortunately didn't do this during the contest, got TLE just like you). This submission also xor's the hashmap key with a randomly drawn number so someone couldn't reverse-engineer your hash somehow and hack it.

  • »
    »
    8 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Thanks, the link to the blog was useful :D

  • »
    »
    8 лет назад, # ^ |
    Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

    Actually identity only in GCC. I don't know how std::hash works in clang, but MSVC hashes byte by byte and safe from here.

  • »
    »
    6 лет назад, # ^ |
      Проголосовать: нравится +10 Проголосовать: не нравится

    Should we use these two lines always? Or need to change depending on any constraint or input size?

    reserve(4096); // always 4096 ?? 
    max_load_factor(0.25); always 0.25 ??
    
    • »
      »
      »
      6 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      The first line means that you're reserving space upfront for 4096 elements — this would be a waste of time if your hashmap ends up being smaller than that. When my hashmap was expected to be much larger, I was still using 4096, although I never really thought about it deeply.

      The max_load_factor of f roughly says that the maps could take times more memory, but should be faster. Hence, this should be only used if memory is not a problem, and 0.25 is a sensible default value. I sometimes use lower values like 0.1 when I'm really optimizing for time, but it seems decreasing f further gives diminishing returns.

»
8 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

VS 2010 unordered_map very fast 24960324

»
8 лет назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

unordered_map's amortized time complexity bound is not specified. Only average time complexity is said to be constant for search, insertion and removal. Source.

Note: if amortized bound would also be constant, the solution utilizing unordered_map would have passed.

  • »
    »
    8 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится -8 Проголосовать: не нравится

    Asymptotic complexity can still hide huge constants, so your note is wrong.

    • »
      »
      »
      8 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Based on practical evidence, the constant factor built into unordered_map is not that big. The problem is not in the constant factor, but in the fact that worst-case time complexity for a simple implementation of hashtable is O(N) for basic operations. There were times when programmers knew how hashtables are implemented, because they were implementing them on their own. At that times, it was well-known that hashtables may struggle from clustering causing them to perform as poor as linked lists.

    • »
      »
      »
      8 лет назад, # ^ |
        Проголосовать: нравится +4 Проголосовать: не нравится

      That is frequently implicit when discussing asymptotic complexity. Succinct, not wrong :)

      The point about amortized complexity is much more interesting, in my opinion: If you were to do n operations on a splay tree, for example, a single operation taking O(n) time wouldn't be worrisome because the full sequence of operations would still take O(n log n) time; you already "paid" for the costly operation before it happened.

      Since I am already writing this post, I guess I should say that the expression "big constant multiplier" in the original post is slightly wrong: There could be a large asymptotic slowdown (from O(1) to O(n)) in the case of collisions, it's not just a constant multiplier.

»
8 лет назад, # |
Rev. 17   Проголосовать: нравится 0 Проголосовать: не нравится

the same thing has happened with my friend whose handle is hmrockstar

»
10 месяцев назад, # |
Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

One more problem from where beginners can observe this 1915E - Romantic Glasses

Unordered_map : TLE Code

Map : Accepted Code

»
8 месяцев назад, # |
Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

Just Learned it in hard way in my already going bad contest (-80 rating downgraded from pupil to newbie) on a very easy 800-900 rating question

SO THE EASY PROBLEM WITH UNORDERED_MAP TLE IS -> 1955B - Progressive Square

using map is better than using stock unordered_map in above problem

255950000

»
7 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

One more problem where i observe this issue 1899D - Yarik and Musical Notes

unordered_map : TLE CODE 262237860

map : CODE 262238306

»
7 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Interesting video about hash tables: https://www.youtube.com/watch?v=DMQ_HcNSOAI