Hi!
I thought that unordered_set is faster than set, because set complexity is logarithmic in the size and unordered_set complexity is constant on average (source: cplusplus.com).
But today I saw this:
TLE (>2000 ms) with unordered_set: 15494816
Accepted (155 ms) with set: 15494846
Also my other solution (which I submitted during the contest) with unordered_map got hacked :'(
Now my question is: is unordered_set faster than set?
UPD: I have an accepted solution (15497282). The unordered_set time was better than set (15494846): 93 < 155. All it took was adding this line: s.max_load_factor(0.25); s.reserve(500);
UPD2: It seems that it is better to use a power of 2 in reserve (i.e. s.reserve(512)). (Thanks to keyvankhademi)
The default hash function for integers does not perform very well. I suggest trying to look up other hash function alternatives.
In my experience, both unordered set and map have worked well for strings.
No single fixed hash function performs well on all inputs; it has to be chosen randomly from a large family of functions.
There's a difference between functions that work badly when someone wants them to and functions that just work badly.
unordered_set is not always faster than set; it's like comparing the complexity of quicksort and merge sort.
To be fair, quicksort is O(n lg n) on average and O(n^2) in the worst case, but merge sort is O(n lg n) in both the average and the worst case. Likewise, unordered_set is O(1) on average and O(n) in the worst case, but set is O(lg n) in both the average and the worst case. Quicksort and merge sort have equal average complexity, so by that logic it would be better to use merge sort. But because unordered_set's average complexity is better than set's, it seems that unordered_set must be better.
Adding pos.reserve(500); reduces the running time below the tree set's on the current tests: http://mirror.codeforces.com/contest/620/submission/15496964
I have no idea, though, why setting a much higher value in place of 500 increases the time again.
Yeah, sorry for the bad interpretation. But my point still stands: unordered_set's complexity is unstable, so whether unordered_set is faster than set depends on the test case.
Umm, no, it's better to use quicksort. Proof: this text from Skiena's Algorithm Design Manual:
Basically, quicksort has better cache locality. Most C++ standard library implementations use a combination of quicksort and insertion sort in their implementation of std::sort.
Well, that depends on the library implementation (and specially-crafted data will trigger the worst case; check the UPD below). Theoretically, and most of the time, the hash-based containers are faster than the tree-based ones. (That's why hashing is quite popular in real-world work.)
And as it turns out, for the Microsoft .NET Framework implementation the answer is yes.
15474905: the HashSet one, 218 ms.
15496423: the SortedSet one, 311 ms.
As for their implementation in the Microsoft .NET Framework, it's already open-sourced: corefx/src/System.Collections/src/System/Collections/Generic/HashSet.cs corefx/src/System.Collections/src/System/Collections/Generic/SortedSet.cs
As for the GNU C++ implementation (the hash table lives in <bits/hashtable.h>, and the default hash function for int is just static_cast<size_t>(__val)), I'm surprised to find a try-catch block in it (and it's not so easy to understand how it works).
And thank you for the heads-up. I'll think twice before using unordered_set/unordered_map on GNU C++.
UPD:
The problem just reminds me of a famous security problem: Hash Collision DoS (a hot topic during 2011–2012), which, according to some reports, affected (and maybe still affects) a lot of popular languages, including PHP <= 5.3.8, Perl, Java, ASP.NET, the V8 JavaScript engine, Ruby <= 1.8.7-p356, etc. A specially-crafted set of keys can trigger hash function collisions, which degrade the performance of a HashMap or Hashtable by changing hash table operations' complexity from the expected/average O(1) to the worst case O(n).
Some information here: https://bugzilla.redhat.com/show_bug.cgi?id=750533
(Some discussion shows that some languages just added limits on the number of parameters to "hide" the problem inside the hash table implementation (source: http://stackoverflow.com/questions/8672985/how-was-the-hash-collision-issue-in-asp-net-fixed-ms11-100). You may want to try to hack my C# solution.)
Sometimes it is.
I once got a problem accepted by changing an ordered set to unordered_set in C++.
However, it depends on the hashing. I always go with the ordered set, and if it TLEs and needs optimization, I try the unordered one.
Looks like someone went through the trouble to generate an extremely bad test case input for hacking 620C - Pearls in a Row. Now the hack is included in the system tests as Test 42. It causes a huge number of collisions in that specific G++ implementation of unordered map.
unordered_map fails with TL: 15496737
map gets AC (only one line of code changed!): 15496747
unordered_map with a custom hash function: AC again! 15497036. A bit slower than map, anyway.
To the OP: good idea to change the max load factor! It looks like a simple solution compared to writing a custom hash function.
Tweaking the load factor and pre-creating many buckets (with reserve) solves the problem because you only get bad performance when there are many items per bucket. The worst case is when all items are in a single bucket: then lookup in the hash map is reduced to list lookup.
Yes, I'm afraid that was me.
To sum it up, unordered_set is generally much faster than set. set guarantees O(log n), but unordered_set is O(1) on average. However, while set ensures O(log n) even in the worst case, unordered_set can take O(n) time per insertion. This is rare in practice unless the dataset is deliberately chosen so that unordered_set suffers a large number of collisions, degrading to O(n).