Hallowno4's blog

By Hallowno4, history, 2 months ago, In English

How are the system tests decided? I know that pretests are kept limited because of heavy traffic during a contest, but do system tests really cover all the plausible tricky cases? I know hacks exist to catch such cases, both in educational rounds and in normal Div. 2 rounds where in-contest hacking is meant to capitalise on them. Still, in the last Div. 2 round (1087), a submission by one of my friends was accepted in system testing even though its logic was missing what is arguably the central idea of the entire problem.

Namely, in problem 2209C, the intended logic was to check n-2 pairs with different elements and, in the worst case, make a triangle between 3 of the 4 unchecked elements. A few people submitted solutions that checked n pairs and then incorrectly checked (1,3) or some other pair, and those failed on test 9. The user in question instead checked (n,n+1) or something like that (outside of the normal (2k,2k-1) pairs), and it passed system testing. It took me about 15 minutes to think of this correction and implement it, as I too initially checked just (1,3) and got WA on test 9.

If the pretests covered this flaw for certain pairs, shouldn't all cases with the same flaw be checked before the final standings and ratings are released?


»
2 months ago

Shantanu bacchu

»
2 months ago

pretests are very limited

Are you talking about CF (Codeforces)?

»
2 months ago, Rev. 2

Tests are written by the person who prepares the problem, usually the author. They are always intended to be strong, but sometimes it's hard to think of all of the things that participants might do, which is why they are weak sometimes. The tests are always deterministic, so two submissions with exactly the same output should get the same verdict.

As for 2209C - Find the Zero, I also saw from various comments that this very natural fakesolve can pass tests. I think this is pretty bad, but I don't blame the authors since tests are really hard to write. Personally, I'm not very fond of the fact that most recent rounds seem to disable hacks, since it leads to issues like this.

Interestingly, I noticed that the interactor is adaptive. It reminds me of a similar case in 2155D - Batteries where an adaptive interactor allowed a pretty common bug to pass. I guess adaptive interactors make tests much harder to write, since you have to account for all of the different branches the interaction can take.
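
For readers unfamiliar with the term, here is a toy sketch of what "adaptive" means. It is not the interactor of 2209C or 2155D. It assumes a made-up game (guess a hidden number in [1, n] using "is it <= q?" queries), and the names `AdaptiveInteractor` and `binary_search` are hypothetical. The judge never fixes a hidden value up front. It only tracks which values are still consistent with its earlier answers and always answers so that as many candidates as possible survive.

```python
class AdaptiveInteractor:
    """Toy adaptive judge (hypothetical): the hidden number is not chosen in advance."""

    def __init__(self, n):
        # Every value in [lo, hi] is still consistent with all answers given so far.
        self.lo, self.hi = 1, n

    def ask(self, q):
        """Answer 'is hidden <= q?' while keeping the larger group of candidates alive."""
        if q < self.lo:
            return False
        if q >= self.hi:
            return True
        left = q - self.lo + 1   # candidates consistent with answering "yes"
        right = self.hi - q      # candidates consistent with answering "no"
        if left >= right:
            self.hi = q
            return True
        self.lo = q + 1
        return False

    def is_determined(self):
        """True only when exactly one candidate is left."""
        return self.lo == self.hi


def binary_search(judge, n):
    """A correct participant strategy: halve the range until one value remains."""
    lo, hi, queries = 1, n, 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if judge.ask(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, queries
```

Because the answers depend on the queries, a test here is a judging strategy rather than a fixed input. A strategy that defeats one wrong approach may never steer a slightly different wrong approach into its failing branch, which is one plausible reason such tests are harder to make strong.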