- When someone successfully hacks another participant, their test joins the set of final tests.
This can be useful and fair, because the set of final tests sometimes does not cover cases where at least one participant made a mistake.
For example: in #260 Div. 2 A there was an incorrect submission, 7385198, which passed all system tests but could not pass this elementary test:
3
1 2
2 3
3 1
So if someone successfully hacks using this test, it should go into the set of final tests.
Of course, the number of final tests could then grow up to the number of participants, and this would slow down testing.
I think this is a good idea. TopCoder does it already, and while it might slow down testing, there are already ~50 test cases per problem. Last contest there were 79 successful hacks (http://mirror.codeforces.com/blog/entry/13340) so my (very rough) estimate is a 20-40% increase in system test time. I think that's worth it -- if I hack someone, I don't want to feel like I'm actually hurting their performance. Likewise, if I get hacked, it should be a feeling of "I'm glad I can fix that before systests", not "I wish that guy were in a different room".
this test looks like my test :))
I agree that it's a good idea. The only problem is that it slows down the systest (large hacks slow it down more significantly) and it could get out of hand if there are too many. It'd be good to limit the number of hacktests based on what verdict they gave (TLE and WA with a huge running time should have lower priority) and how large they are, and also to put an upper limit on them in case something like MemSQL Start[c]UP 2.0 - Round 1 happens again: it had 400+ hacks, which was more than the total number of tests in all problems.
I think limiting them is a wonderful idea, but I'm not sure whether giving TLE tests lower priority is a great idea. Sometimes solutions with complexities like O(nm), n, m <= 10^5 pass simply because of some pruning or optimizations (I'm pretty sure I read such a comment in #260), so TLE tests may be important too.
I'm not saying they aren't important, but they're not really worth making the systest last 3 hours instead of one (or 15 minutes, as it sometimes does). I don't think hacks tend to be on maxtests that aren't obvious enough for the authors to make, anyway, simply due to the time it should take to make them. From that point of view, using 50 more smaller hacktests is a smarter idea than using 50 more large ones.
This is all hard to balance, anyway... I'm just throwing out some rough ideas.
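A rough sketch of the prioritisation and upper limit discussed above, in Python. Everything here is hypothetical: the `Hack` record, its fields, and `select_hacktests` are made up for illustration and are not how Codeforces actually picks tests. It just ranks non-TLE hacks first, then faster and smaller ones, and cuts the list off at a cap.

```python
from dataclasses import dataclass


@dataclass
class Hack:
    # Hypothetical record of one successful hack.
    hack_id: int
    verdict: str      # verdict of the hacked solution on this test: "WA", "RE", "TLE", ...
    run_time_ms: int  # how long the hacked solution ran on this test
    input_size: int   # size of the hack input in bytes


def select_hacktests(hacks, max_tests):
    """Pick at most max_tests hacks, preferring cheap ones."""
    def priority(h):
        # Tuples sort lexicographically: non-TLE before TLE,
        # then shorter running time, then smaller input.
        return (h.verdict == "TLE", h.run_time_ms, h.input_size)

    return sorted(hacks, key=priority)[:max_tests]


hacks = [
    Hack(1, "TLE", 2000, 10**6),
    Hack(2, "WA", 15, 20),
    Hack(3, "WA", 1900, 10**6),
    Hack(4, "RE", 30, 100),
]
chosen = select_hacktests(hacks, max_tests=2)
print([h.hack_id for h in chosen])
```

With a cap of 2, the two small, fast hacks are kept and the maxtest-sized ones are dropped. Whether TLE hacks really deserve the lowest priority is exactly the point being argued here, so the sort key is the part to tweak.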
This is a really good idea. Another way to limit hacktests would be to check, before adding one, that the hacked solution would not have been caught by the normal systests anyway.
I think that'd happen automatically if the hacktests are run after the normal ones.
Not really -- in that case all AC solutions would be tested on the hacktests no matter what.
Oh okay, I misunderstood your comment.
I believe Gerald or Mike said that each time they add the hacks of solutions which would otherwise have got AC.
Sure, we also add many successful hacks (not all of them, because often there are too many).
Is this process automated? Or is it the jury's decision whether to add a test?
It is the jury's decision.
Thank you, I didn't know that.
Hey :)
Every contest I go through all the hacks to find some creative ones. If there are any, I add them to the final testset. Additionally, we have a feature that gives us advice like: "This hack needs to be in the final testset, because the hacked solution failed it but passed the current final testset". Of course, we follow the advice.
Unfortunately, there exist solutions that don't fail even on hacks. I think such solutions exist on every contest platform. It is very hard to predict all possible wrong solutions in every problem.
Thanks for your message! We are trying to make our tests better :)
P.S. I will add your test to the testset in the upsolving problemset.
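For anyone curious, the advice feature described above could work roughly like the following Python sketch. This is only a guess at the logic, not the actual Codeforces implementation: `passes`, `advise_hacks`, and the representation of solutions as callables and tests as `(input, expected)` pairs are all assumptions made for the example. A hack is advised only if the hacked solution passes every test already in the set but fails the hack test, and each advised hack is added to the set before the next one is checked.

```python
def passes(solution, test):
    """Return True if the solution gives the expected answer on the test."""
    inp, expected = test
    try:
        return solution(inp) == expected
    except Exception:
        # Treat a crash as a failed test.
        return False


def advise_hacks(final_tests, hacks):
    """hacks is a list of (hacked_solution, hack_test) pairs.

    Returns the hack tests that catch a solution the current testset misses.
    """
    tests = list(final_tests)
    advised = []
    for solution, hack_test in hacks:
        missed_by_current_tests = all(passes(solution, t) for t in tests)
        if missed_by_current_tests and not passes(solution, hack_test):
            tests.append(hack_test)  # later hacks are checked against the grown set
            advised.append(hack_test)
    return advised


# Toy problem (made up): print |x|.
final_tests = [(3, 3), (0, 0)]
hacks = [
    (lambda x: x, (-2, 2)),                   # wrong for negatives, passes final tests
    (lambda x: x if x >= 0 else 0, (-5, 5)),  # already caught once (-2, 2) is added
    (lambda x: 0, (7, 7)),                    # already fails final test (3, 3)
]
print(advise_hacks(final_tests, hacks))
```

Only the first hack is advised here: the second solution is already caught by the newly added test, and the third would fail the normal systests anyway. This also keeps the number of extra tests down, since redundant hacks are skipped.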
What do you think about marking added tests? And maybe adding the author's name. It's fun to see who came up with good tests. (This idea came to me after I got WA on such a short test, with a number >90, and I suspect that it may be an added test (or maybe not)).
And messages about an incomplete "test space" appear on the forums rarely, but from time to time (475C, 483B).