Today one of my myths was broken.
I used to believe that unordered_map is better in time-complexity than map in C++. But today while I was doing a problem(Molly's Chemicals), I got time-limit exceeded. After a lot of guess-work(because I thought my solution was correct), I tried using a map instead of an unordered_map and as a surprise I got it accepted. Then I realised that though the amortised time-complexity of using an unordered_map is O(1) while that of a map is O(log n), there are cases in which due to a lot of collisions unordered_map can have a big constant multiplier which will increase its actual complexity to greater than that of a map.
(Any corrections are welcomed...)
Can someone enlighten me more by clearing the doubt that where should unordered_map be used and where not?
What are the cases in which a map would perform better than an unordered_map?
Here are the tle and aced solution links if someone wants to verify:
TLE Solution(TLE after 2500 ms)
Accepted Solution(Accepted in 499 ms)
Firstly, unordered_map with default settings has a pretty large constant. It can be decreased a lot by calling reserve and max_load_factor methods as explained at the end of this blog. I think without this unordered_map is slightly faster than map, with this it should be much faster — assuming random input.
Secondly, in this problem test were used that make the c++ unordered_map implementation really slow. This is possible because the hash function for int/long long in C++ is identity (and then taking this modulo the number of buckets), so one can force every key in the hashmap to have the same hash, and then a single lookup becomes O(n).
To fix the second you can paste your own hash implementation like this: 24950229 (unfortunately didn't do this during the contest, got TLE just like you). This submission also xor's the hashmap key with a randomly drawn number so someone couldn't reverse-engineer your hash somehow and hack it.
Thanks, the link to the blog was useful :D
Actually identity only in GCC. I don't know how std::hash works in clang, but MSVC hashes byte by byte and safe from here.
Should we use these two lines always? Or need to change depending on any constraint or input size?
The first line means that you're reserving space upfront for 4096 elements — this would be a waste of time if your hashmap ends up being smaller than that. When my hashmap was expected to be much larger, I was still using 4096, although I never really thought about it deeply.
The
max_load_factor
of f roughly says that the maps could take times more memory, but should be faster. Hence, this should be only used if memory is not a problem, and 0.25 is a sensible default value. I sometimes use lower values like 0.1 when I'm really optimizing for time, but it seems decreasing f further gives diminishing returns.VS 2010 unordered_map very fast 24960324
unordered_map's amortized time complexity bound is not specified. Only average time complexity is said to be constant for search, insertion and removal. Source.
Note: if amortized bound would also be constant, the solution utilizing unordered_map would have passed.
Asymptotic complexity can still hide huge constants, so your note is wrong.
Based on practical evidence, the constant factor built into unordered_map is not that big. The problem is not in the constant factor, but in the fact that worst-case time complexity for a simple implementation of hashtable is O(N) for basic operations. There were times when programmers knew how hashtables are implemented, because they were implementing them on their own. At that times, it was well-known that hashtables may struggle from clustering causing them to perform as poor as linked lists.
That is frequently implicit when discussing asymptotic complexity. Succinct, not wrong :)
The point about amortized complexity is much more interesting, in my opinion: If you were to do n operations on a splay tree, for example, a single operation taking O(n) time wouldn't be worrisome because the full sequence of operations would still take O(n log n) time; you already "paid" for the costly operation before it happened.
Since I am already writing this post, I guess I should say that the expression "big constant multiplier" in the original post is slightly wrong: There could be a large asymptotic slowdown (from O(1) to O(n)) in the case of collisions, it's not just a constant multiplier.
the same thing has happened with my friend whose handle is hmrockstar
LMAIO, I learnt it today, in the same way as you did!
One more problem from where beginners can observe this 1915E - Romantic Glasses
Unordered_map : TLE Code
Map : Accepted Code
Just Learned it in hard way in my already going bad contest (-80 rating downgraded from pupil to newbie) on a very easy 800-900 rating question
SO THE EASY PROBLEM WITH UNORDERED_MAP TLE IS -> 1955B - Progressive Square
using map is better than using stock unordered_map in above problem
255950000
One more problem where i observe this issue 1899D - Yarik and Musical Notes
unordered_map : TLE CODE 262237860
map : CODE 262238306
Interesting video about hash tables: https://www.youtube.com/watch?v=DMQ_HcNSOAI
I am also face same problem in https://mirror.codeforces.com/problemset/problem/1980/C
TLE solution: https://mirror.codeforces.com/contest/1980/submission/270365556 Accepted Solution: https://mirror.codeforces.com/contest/1980/submission/270365621