So.... System Testing????

How are system tests decided? I know that the pretests are very limited because of heavy traffic during a contest, but do system tests really cover all the plausible tricky cases? I know hacks exist for exactly such cases, as in Educational rounds or even normal Div. 2 rounds where in-contest hacking is meant solely to capitalise on them. But in the last Div. 2 round (1087), one of my friend's submissions was accepted in system testing even though its logic was missing what is arguably the central idea of the entire problem.

Namely, in problem 2209C, the logic was to check n-2 pairs with different elements and, in the worst case, make a triangle between 3 of the 4 unchecked elements. A few people submitted solutions that checked n pairs and incorrectly checked (1,3) or some other pair, and they failed on test 9. The user in question, however, checked (n,n+1) or something like that (outside of the normal (2k,2k-1) pairs), and it passed system testing.

It personally took me ~15 minutes to think of this correction and implement it, as I too initially checked just (1,3) and got WA on test 9. If they covered this flaw for selected pairs in pretests, shouldn't all cases with the same flaw be checked before releasing the final standings and ratings?

Comments (4)

tourist_gpt_175

2 months ago

Shantanu bacchu

oversolver

pretests are very limited

Are you talking about CF (Codeforces)?

Hallowno4

2 months ago

Yeah, I get why it passed the pretests, but it survived system testing as well...

reirugan

Tests are written by the person who prepares the problem, usually the author. They are always intended to be strong, but sometimes it's hard to think of all of the things that participants might do, which is why they are weak sometimes. The tests are always deterministic, so two submissions with exactly the same output should get the same verdict.

As for 2209C - Find the Zero, I also saw from various comments that this very natural fakesolve can pass tests. I think this is pretty bad, but I don't blame the authors since tests are really hard to write. Personally, I'm not very fond of the fact that most recent rounds seem to disable hacks, since it leads to issues like this.

Interestingly, I noticed that the interactor is adaptive. It reminds me of a similar case in 2155D - Batteries where an adaptive interactor allowed a pretty common bug to pass. I guess adaptive interactors make tests much harder to write, since you have to account for all of the different branches the interaction can take.
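To make the point about branches concrete, here is a minimal toy sketch in Python. It has nothing to do with the real interactors of 2209C or 2155D; the "guess a number in 1..n" game and every name in it (`FixedJudge`, `AdaptiveJudge`, `correct_solver`, `buggy_solver`, `run`) are assumptions made up for illustration. The adaptive judge never fixes a hidden number and always keeps the larger group of candidates alive, which happens to steer a binary search toward 1, so a solver that can never answer n still passes it.

```python
class FixedJudge:
    """Non-adaptive judge: the hidden number is chosen before any query."""

    def __init__(self, hidden):
        self.hidden = hidden

    def ask(self, q):
        """Answer the query 'is hidden <= q?'."""
        return self.hidden <= q

    def check(self, guess):
        return guess == self.hidden


class AdaptiveJudge:
    """Toy adaptive judge: keeps every value consistent with past answers."""

    def __init__(self, n):
        self.lo, self.hi = 1, n  # remaining candidates are lo..hi inclusive

    def ask(self, q):
        """Answer 'is hidden <= q?' so that the larger candidate group survives."""
        yes_count = max(0, min(q, self.hi) - self.lo + 1)  # candidates <= q
        no_count = (self.hi - self.lo + 1) - yes_count     # candidates > q
        if yes_count >= no_count:  # ties are answered with "yes"
            self.hi = min(self.hi, q)
            return True
        self.lo = max(self.lo, q + 1)
        return False

    def check(self, guess):
        """Accept only if the guess is the single remaining candidate."""
        return self.lo == self.hi == guess


def correct_solver(judge, n):
    """Plain binary search over 1..n."""
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        if judge.ask(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def buggy_solver(judge, n):
    """Same search, but it wrongly starts from hi = n - 1 and can never answer n."""
    lo, hi = 1, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if judge.ask(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def run(judge, solver, n):
    """Play one interaction and return whether the judge accepts the guess."""
    return judge.check(solver(judge, n))


if __name__ == "__main__":
    n = 8
    print(run(AdaptiveJudge(n), correct_solver, n))  # True
    print(run(AdaptiveJudge(n), buggy_solver, n))    # True: the bug slips through
    print(run(FixedJudge(n), correct_solver, n))     # True
    print(run(FixedJudge(n), buggy_solver, n))       # False: hidden = n exposes it
```

In this toy, a single adaptive strategy only ever walks one branch per submission, so a test set would presumably need several judge strategies (or some fixed hidden values) to reach the branch where the bug shows up.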

Hallowno4's blog