Cheating has been discussed a lot lately. I'm writing a blog post about it with actual data (mainly, I want to know if there is an actual increase). However, in the many discussions that have ensued both here and on Discord servers, I noticed several recurring misconceptions. I wrote a section to debunk them and it grew kind of long, long enough to be a post in itself. So while my scripts are running, we can treat this as a kind of teaser.
Things you should know before you write a comment about cheating
Vocabulary:
- To plagiarize means to use someone else's work while passing it off as yours. It refers to the act itself, not to its detection or punishment. "jrandomcoder plagiarized all of his solutions in round 987" is a meaningful statement. "jrandomcoder cheated in round 987 but he still hasn't gotten plagiarized, I request MikeMirzayanov to look into this" is not.
- Plague is a disease. In a metaphorical sense it can mean any disease and in a doubly-metaphorical sense it could refer to cheating. However, it is not the same word as "plagiarism" and usually can't be used in its place.
Procedural:
- Codeforces has a plagiarism detector.
- The plagiarism detector is run after every round. However, this does not happen live or immediately. Presumably it takes a while and requires considerable human input (for example, if there are close calls).
- The plagiarism detector will eventually be run, likely within a few days after the contest.
- Ratings are updated soon after the round and likely before plagiarism detection happens. This is because users are impatient. Usually, if ratings are not updated 6 hours after the contest, there is already a sea of "is it rated" and requests for MikeMirzayanov to look into it.
- Ratings will eventually be updated to take disqualified people into account. This does not necessarily happen right after the plagiarism check. When it happens, you see a notification of the form "Ratings are temporarily rolled back. They will be returned soon."
- People have been banned for cheating.
Behavioral:
- When you report cheaters, don't tag random testers you found in the contest announcements. Most of them do a virtual contest, leave some feedback and that's it. They are not the organizers or administrators of the contest. They do not have any powers to punish cheaters. (I also doubt that tagging authors or maybe even coordinators is useful, but since I have been neither, I can't really comment on this.)
- Definitely don't tag random "famous" or active members of the community who had nothing to do with organizing the contest. We can't do anything.
- Absolutely definitely don't write your cheater reports as replies to unrelated people's unrelated comments. What the hell?
If you are collecting data:
- If a user is caught cheating, they will usually be marked as "out of competition" and the verdict of all their submissions is set to Skipped.
- However, it appears that they are not always marked "out of competition". I don't really know why, or what the difference is.
- Having some skipped submissions (and being out of competition) does not necessarily mean someone is a cheater. This also happens if you are e.g. Div 1 in a Div. 2 contest and you make a submission, pass the pretests and then make another submission to the same problem (for example, if you were only a few milliseconds below the time limit). The earlier submission will also be Skipped after system tests.
- Even having all skipped submissions (and being out of competition) does not necessarily mean someone is a cheater. See discussion here. We can ignore this here when gathering statistics because such cases are rare. But this has to be taken into account when discussing individuals.
- If you call https://mirror.codeforces.com/api/contest.status?contestId=2000&handle=jrandomcoder to check if all of someone's submissions in the contest are Skipped, you need to filter out all submissions made in practice mode and similar. They will not be skipped even if the user got caught cheating.
- When someone is caught cheating but before the ratings have been updated, the contest will appear on their profile as if they solved 0 problems, but with a rating change, possibly even a positive one.
- When someone is caught cheating and after the ratings have been updated, the contest appears on their profile as if they participated in it but it wasn't rated for them. You will see it if you choose "Only unrated" or "All" in the drop-down menu in the top-right corner in the Contests tab.
- If you want to compare the frequency of cheating (or many things, in fact), it is not enough to compare one recent contest to one older contest (and similar). Because everything varies unpredictably all the time, you will be comparing random noise to random noise. Some problems may be more suitable for detecting cheaters. The leakers may have had a good or a bad run. For all we know, even the time of day and the day of the week might affect cheating.
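As a rough sketch of the filtering described in the points above, the following Python snippet decides whether all of a user's in-contest submissions were Skipped. It relies on the `verdict` and `author.participantType` fields of the Submission object returned by the Codeforces API. The helper names, and the choice to count only CONTESTANT and OUT_OF_COMPETITION as "in-contest", are illustrative assumptions rather than an official rule.

```python
import json
import urllib.parse
import urllib.request

# Assumption: only these participant types count as taking part in the round.
# PRACTICE, VIRTUAL and MANAGER submissions are ignored.
IN_CONTEST_TYPES = {"CONTESTANT", "OUT_OF_COMPETITION"}


def fetch_submissions(contest_id, handle):
    """Fetch one user's submissions in one contest via contest.status."""
    query = urllib.parse.urlencode({"contestId": contest_id, "handle": handle})
    url = "https://mirror.codeforces.com/api/contest.status?" + query
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    if payload.get("status") != "OK":
        raise RuntimeError(payload.get("comment", "API request failed"))
    return payload["result"]


def in_contest_submissions(submissions):
    """Drop practice, virtual and similar submissions."""
    return [
        s for s in submissions
        if s.get("author", {}).get("participantType") in IN_CONTEST_TYPES
    ]


def all_in_contest_skipped(submissions):
    """True if the user has in-contest submissions and all of them are Skipped."""
    official = in_contest_submissions(submissions)
    return bool(official) and all(s.get("verdict") == "SKIPPED" for s in official)
```

For example, `all_in_contest_skipped(fetch_submissions(2000, "jrandomcoder"))` would check the placeholder user from the URL above. As the earlier points note, a True result is only a signal and not proof that the user cheated.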
Keep in mind though, as I've said before, on a platform with a history as long as Codeforces has, you can't really assume "condition X happens (if and) only if a user got caught cheating". You also can't assume that various unusual situations have always been handled the same way, or even that they have been handled the same way in recent cases, just because you checked a number of recent ones and noticed a pattern. So everything
Wait, so Mike finally listened and started to ban cheaters?
When you go visit some profiles, you might very well be greeted with this message.
This message is a very recent addition. But I'm not sure about "finally". I don't think Mike "finally" started banning cheaters this week or something, because I don't think the bans are that recent. I can't prove it but I have some evidence.
Workflow:
- Pick a contest that is a few months to a year or so old.
- Go to the Status tab and filter for Skipped verdict.
- Click on the usernames randomly until you find someone who has been disabled.
- Go to their Submissions tab. Note the time of the last submission and the last Skipped submission.
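The last step of this workflow can also be sketched in Python. The snippet below takes a list of submissions in the shape returned by the Codeforces API method user.status (fields `creationTimeSeconds` and `verdict`) and returns the dates of the last submission and the last Skipped submission. The function name is hypothetical, and it is an unverified assumption that the API still returns submissions for a disabled account.

```python
from datetime import datetime, timezone


def last_submission_dates(submissions):
    """Return (last submission date, last Skipped submission date) in UTC.

    Either value is None if there is no matching submission.
    """
    def latest(subs):
        times = [s["creationTimeSeconds"] for s in subs]
        if not times:
            return None
        return datetime.fromtimestamp(max(times), tz=timezone.utc).date()

    skipped = [s for s in submissions if s.get("verdict") == "SKIPPED"]
    return latest(submissions), latest(skipped)
```

Dates are taken in UTC here, so they may differ by a day from what the website shows in your local time zone.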
I think we can assume that for many people, the time of the last submission is close to the time of their banning. With the delays between contests, plagiarism checks and revised rating updates, I think we can also assume that if someone is banned for cheating, it doesn't happen immediately after the contest. I've repeated this procedure for the following users:
- abhishek_1624 (last submission: 16.03.2024, last Skipped submission: 01.03.2024)
- bansala271 (last submission: 22.04.2024, last Skipped submission: 06.04.2024)
- Var57 (last submission = last Skipped submission: 29.04.2024)
- BhuvaneshwariM (last submission: 03.06.2024, last Skipped submission: 25.05.2024)
- binary_search75 (last submission = last Skipped submission: 30.06.2024)
- AlicenBob (last submission: 18.01.2024, last Skipped submission: 30.11.2023)
- manishreddy03 (last submission = last Skipped submission: 03.12.2023)
Not cherry-picking data here. In most of these cases, the user stopped submitting either on the day of their last Skipped submission or within about 2 weeks of it. It doesn't prove anything but in my opinion it is strong evidence that these users did not get banned yesterday; they got banned months ago after getting caught cheating multiple times. (To be sure, we need to look at more data and compare to people who didn't get banned and so on. But it's a good starting point for a hypothesis.)
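The gaps between the last Skipped submission and the last submission can be computed directly from the dates listed above. This is a minimal sketch: the data is copied from the list, while the variable and function names are made up for illustration.

```python
from datetime import datetime

# (handle, last submission, last Skipped submission), copied from the list above.
OBSERVATIONS = [
    ("abhishek_1624", "16.03.2024", "01.03.2024"),
    ("bansala271", "22.04.2024", "06.04.2024"),
    ("Var57", "29.04.2024", "29.04.2024"),
    ("BhuvaneshwariM", "03.06.2024", "25.05.2024"),
    ("binary_search75", "30.06.2024", "30.06.2024"),
    ("AlicenBob", "18.01.2024", "30.11.2023"),
    ("manishreddy03", "03.12.2023", "03.12.2023"),
]


def parse_date(text):
    """Parse a DD.MM.YYYY date as used in the list."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def gap_days(last, last_skipped):
    """Days between the last Skipped submission and the last submission."""
    return (parse_date(last) - parse_date(last_skipped)).days


GAPS = {handle: gap_days(last, skipped) for handle, last, skipped in OBSERVATIONS}

for handle, gap in GAPS.items():
    print(f"{handle}: {gap} days")
```

With seven users this is only a sanity check on the numbers, not a statistic.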
I think the reason why some people still believe that cheaters are never banned is that Codeforces, for the longest time, did not really indicate that a user is banned. Let us take rotavirus, perhaps the most famously banned user on Codeforces. Until recently, you could tell he was banned only by the fact that the "Send message" link was missing (the links in red).
Obviously, not many people are aware of this small detail, so perhaps they didn't realize how many users are banned (for cheating or trolling). Now it's quite easy to see if you follow my procedure above.
One final note...
As I'm writing my actual blog, it has become clear to me just how much effort does in fact go towards combating cheaters. Things are not perfect and you can always do better, but people are working on this and people do care. And then I see comments and messages where people casually write "bro the admins don't care they just want to inflate participant counts" [the word 'bro' is not my addition]. I've tried pushing back against this once, ages ago, and I got some "bro you think this proves anything" type message.
Maybe Mike doesn't want to say this but I do: this is so entitled, inconsiderate and rude. I've been in a position where I put in a lot of work into something. And then people who don't even look into how much work this is come and tell me that this is trivial or I'm stupid or I don't even care. Or maybe they don't say it about me, but do say it about people who are working on things similar to what I'm working on. Mike doesn't deserve this kind of feedback.