After competing today in Codeforces Round 620 (Div. 2), I received the following message from System:
Attention!
Your solution 71152328 for the problem 1304E significantly coincides with solutions ksun48/71134293, hoke_t/71152328, Venia/71156816, hanga97/71159069. Such a coincidence is a clear rules violation. Note that unintentional leakage is also a violation. For example, do not use ideone.com with the default settings (public access to your code). If you have conclusive evidence that a coincidence has occurred due to the use of a common source published before the competition, write a comment to post about the round with all the details. More information can be found at http://mirror.codeforces.com/blog/entry/8790. Such violation of the rules may be the reason for blocking your account or other penalties. In case of repeated violations, your account may be blocked.
Clearly, these all result from using the LCA data structure from the well-known KACTL library, which is maintained largely by simonlindholm, Chilli, and many others, and which was published online well before the beginning of the round (check the Git commit history). You can see that aryanc403, a tester for Codeforces Round 620 (Div. 2), used the exact same LCA code as well (71170167). MikeMirzayanov, can you please look at this incident and correct it by putting all of us back on the leaderboard and giving us our rating changes? I had over +100 delta in the contest, and it's frankly quite disappointing that the system has no way to recognize that using such common templates is not cheating (especially when a round tester uses one too). I hope my monetary contribution to Codeforces this year can help prevent such incidents in the future.
Justice for hoke_t
rip looks like there's no justice in this world
Although it "sounds" simple, it isn't possible to find the code online with some search engines: Google, DuckDuckGo, searchcode, symbolhound, and even GitHub.
That's a good idea: just "subtract" the testers' codes from the contestants' codes before checking for plagiarism. It should not be very resource-consuming.
Even better would be to compare with all existing codes on Codeforces. For example, it's possible to split the existing codes into 20-character chunks, then build a hash table that maps each chunk to the submission ID. That would be only slightly larger than the codes themselves.
The remaining cases, where code is posted publicly but was never submitted on Codeforces, would still need to be handled manually. That's only once per library.
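To make the chunk idea concrete, here is a rough sketch of how such a lookup might work. This is not how Codeforces actually checks for plagiarism; the function names, the whitespace stripping, and the choice to slide a window over the new code are all my own assumptions for illustration.

```python
from collections import defaultdict

CHUNK = 20  # chunk length suggested above


def normalize(code):
    """Drop all whitespace (an assumption, so reformatting does not hide a match)."""
    return "".join(code.split())


def build_index(submissions, size=CHUNK):
    """Map each non-overlapping chunk of existing code to the IDs that contain it.

    `submissions` is a hypothetical dict of {submission_id: source_code}.
    Non-overlapping chunks keep the table close to the size of the code itself.
    """
    index = defaultdict(set)
    for sub_id, code in submissions.items():
        text = normalize(code)
        for i in range(0, len(text) - size + 1, size):
            index[text[i:i + size]].add(sub_id)
    return index


def matching_submissions(code, index, size=CHUNK):
    """Count, per existing submission, how many of its indexed chunks occur in `code`.

    The new code is scanned with a sliding window, so a match does not
    depend on where the copied fragment starts.
    """
    text = normalize(code)
    hits = defaultdict(int)
    seen = set()
    for i in range(len(text) - size + 1):
        window = text[i:i + size]
        if window in seen:
            continue  # count each distinct chunk once
        seen.add(window)
        for sub_id in index.get(window, ()):
            hits[sub_id] += 1
    return dict(hits)
```

A submission whose matches all point to old submissions (or to a tester's code) could then be treated as "explained" by prewritten code before any pairwise comparison inside the contest.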
Fair enough, although the line you selected just happens to be one I changed. If you search for another line, KACTL LCA will be the first result on Google.
I think both of these ideas are quite good, and I'm surprised they aren't done already (or something similar). And then the case of manually verifying new libraries will not happen too often, whereas now you can see dozens of people in the comments on Codeforces Round 620 (Div. 2) with the same issue for well-known LCA implementations (KACTL and otherwise).
What a waste of time. The chances that tester(s) will use a particular prewritten code in such a way that this "subtract" approach would produce useful info is minimal.
The idea of comparing all codes is much better, but only among codes submitted within one contest, since it wouldn't scale well to all submissions ever, especially once their number goes far above the current 70 million.
MikeMirzayanov, nice job catching cheaters. I think you should ban all of them, including ksun48. Cheating is very immoral and should not be tolerated. I think you should treat LGMs the same way you treat these greens.
Thanks, I'll review all such precedents today.
Thanks!
Stupid Swedes