Help with Python - Codeforces

My issue concerns 1840D - Wooden Toy Festival. During the contest I used an approach similar to the one in the authors' tutorial, and the first step was to sort the input array and discard duplicates. As usual, I used the combination sorted(set(a)), which had always worked well and seemed quite reliable to me. The submission 208933459 passed all pretests. However, it was then hacked with Time Limit Exceeded. After the contest I submitted an almost identical submission, 208932147, with a single change: I replaced the combination above with the following block:

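a.sort()                # sort in place: O(n log n)
b = [a[0]]              # assumes a is non-empty
for i in a:
    if i != b[-1]:      # after sorting, equal values are adjacent
        b.append(i)     # b ends up holding the distinct values in ascending order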

As far as I understand, this performs no better asymptotically: sorting the array takes $$$O(n \log n)$$$ time, and removing duplicates takes $$$O(n)$$$. Nevertheless, with this replacement my solution passed all the tests. Could you explain why? And does this mean it is better from now on to use something like the block above instead of sorted(set(a))? I would really appreciate it!
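For reference, here is the same sort-then-skip-duplicates idea as a self-contained function (the name unique_sorted is hypothetical). Unlike the fragment above, it also handles an empty list and leaves the caller's list untouched; for lists of ints it should return the same result as sorted(set(a)).

```python
def unique_sorted(a):
    """Return the distinct values of a in ascending order without using a set.

    Sorting costs O(n log n); the single pass that skips equal neighbours is O(n).
    """
    out = []
    for x in sorted(a):                  # sorted() copies, so a is not modified
        if not out or x != out[-1]:      # duplicates are adjacent after sorting
            out.append(x)
    return out

print(unique_sorted([3, 1, 3, 2, 1]))    # [1, 2, 3]
print(unique_sorted([]))                 # []
```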

Comments (8)

denilb

3 years ago

As a Python user, my advice is to avoid set() in competitive programming. It doesn't perform well on large inputs.

drugkeeper

3 years ago

Then what do you suggest? Sometimes we have no choice but to use a hash set, don't we?

When I need O(1) lookups, I use a dict and have never had an issue with it. defaultdict performs badly as well, so I use dict.get(x, y) instead.
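A minimal sketch of the dict.get pattern described above: counting occurrences with a plain dict instead of collections.defaultdict (the helper name count_values is hypothetical, and no timings are claimed here). Keep in mind that a dict is a hash table too, so its keys are still placed by their hash.

```python
def count_values(a):
    """Count occurrences of each value using a plain dict and dict.get."""
    counts = {}
    for x in a:
        counts[x] = counts.get(x, 0) + 1   # default of 0 for unseen keys
    return counts

print(count_values([5, 2, 5, 5, 2]))       # {5: 3, 2: 2}
```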

kraut

Your code works fine in Python 3 (287 ms); it only gets TLE on PyPy 3. I'm eager to see if someone can explain why :)

PyPy 3 and CPython use different hashing algorithms for set(), which is why a test case engineered to kill PyPy 3 set() solutions only works against PyPy 3.

The input is crafted so that the values inserted into set() collide, which makes building the set run in $$$O(n^2)$$$ time.
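To see why collisions turn building a set into $$$O(n^2)$$$ work, here is a toy hash table with linear probing. This is a hypothetical model: the slot is hash(k) % capacity and the next slot is tried on a collision, whereas the real CPython and PyPy probe sequences are different and more elaborate. When every key lands in the same slot, the k-th insertion inspects k slots, so n insertions cost n(n+1)/2 probes instead of n.

```python
def count_probes(keys, capacity):
    """Insert keys into a toy open-addressing table and return the total
    number of slots inspected. capacity must exceed the number of keys."""
    table = [None] * capacity
    probes = 0
    for k in keys:
        i = hash(k) % capacity          # toy slot choice, not CPython's or PyPy's
        while True:
            probes += 1
            if table[i] is None:        # free slot: store the key
                table[i] = k
                break
            if table[i] == k:           # key already present: nothing to do
                break
            i = (i + 1) % capacity      # collision: try the next slot (linear probing)
    return probes

n, cap = 1000, 1 << 12
spread = list(range(n))                  # small non-negative ints hash to themselves
colliding = [j * cap for j in range(n)]  # every key maps to slot 0
print(count_probes(spread, cap))         # 1000
print(count_probes(colliding, cap))      # 500500, i.e. n * (n + 1) / 2
```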

I fixed it by shuffling the array before calling sorted(set(a)): https://mirror.codeforces.com/contest/1840/submission/208991870
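A minimal sketch of the shuffle-then-deduplicate idea, assuming the input is a list of ints. The linked submission's exact code is not reproduced here, and the helper name unique_sorted_shuffled is hypothetical. Shuffling only changes the order in which values are inserted into the set; the returned list is the same either way.

```python
import random

def unique_sorted_shuffled(a):
    """Shuffle a copy of a, then deduplicate with a set and sort."""
    b = list(a)            # copy so the caller's order is preserved
    random.shuffle(b)      # randomizes the order of insertion into the set
    return sorted(set(b))

print(unique_sorted_shuffled([4, 4, 1, 3, 1]))  # [1, 3, 4]
```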

This would likely not happen in Div. 1 / Div. 2 rounds; people try to craft hash collisions only in rounds with plenty of time to hack (a 12-hour open hacking phase). I still feel this is quite dumb, given that the hack will be added to the system tests.

To everyone else reading this: what are your opinions?

Nice!

rmr

That was very informative indeed, thanks a lot! But I didn't quite grasp how shuffling actually helps here. Since Python's set() is implemented as a hash table, lookups take O(1) on average, but collisions, according to Fluent Python, are resolved with something like open addressing, aren't they? And does the order in which the hashes arrive make any difference if they collide anyway?

rmr's blog