Enough Is Enough: A Concrete Plan to Tackle Cheating on Codeforces

Hello, Codeforces.

I've participated in a few rounds and noticed that there are too many cheaters. Right now cheater detection is community-driven, and only a few cheaters are detected.

Idea

I’m proposing Codeforces Anti‑Cheat (CFAC): an automated flagging system that runs after each contest and flags suspected cheaters using:

— NLP-model based submission (and maybe replacement) checking

— Timing-based detection: if a gray user solves a Div. 2 E in 3 minutes, it's suspicious

All of these metrics are combined into a suspicion score matrix, where score[u][p] is a value normalized to [-1, 1]:

— -1 if participant $$$u$$$ is certainly not cheating on problem $$$p$$$;

— 1 if participant $$$u$$$ is certainly cheating on problem $$$p$$$.
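
A minimal sketch of how two such signals could be folded into score[u][p]. The function names `timing_signal` and `combine`, the weights, and the scaling constants are assumptions for illustration only, not the actual CFAC code.

```python
def timing_signal(rating: int, problem_rating: int, minutes: float) -> float:
    """Hypothetical timing signal in [0, 1]: large when a low-rated user
    solves a much harder problem very quickly."""
    gap = max(0, problem_rating - rating)   # how far the problem is above the user
    speed = 1.0 / (1.0 + minutes / 5.0)     # 1.0 at 0 minutes, 0.5 at 5 minutes
    return min(1.0, gap / 1000.0) * speed


def combine(nlp_prob: float, timing: float,
            w_nlp: float = 0.6, w_timing: float = 0.4) -> float:
    """Map two signals in [0, 1] to a suspicion score in [-1, 1]."""
    mixed = w_nlp * nlp_prob + w_timing * timing  # weighted mean, stays in [0, 1]
    return 2.0 * mixed - 1.0                      # rescale to [-1, 1]


# score[u][p] stored as a nested dict keyed by handle and problem index
score = {"gray_user": {"E": combine(0.9, timing_signal(1100, 2100, 3.0))}}
```

Here `nlp_prob` stands for whatever probability the NLP model outputs for a submission; a linear mix is the simplest choice and could be replaced by a trained combiner once labelled data exists.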

Need help

I need help with

— collecting labelled data on cheaters' code

— final testing of the anti-cheat system

My review of my NLP-based model

It works pretty well, but it can only detect heavily LLM-styled submissions like these:

Submission 1

import sys

def solve() -> None:
    it = iter(sys.stdin.read().strip().split())
    t = int(next(it))
    out_lines = []
    for _ in range(t):
        n = int(next(it))
        q = int(next(it))
        a = [int(next(it)) for _ in range(n)]
        b = [int(next(it)) for _ in range(n)]
        # c[i] = max(a[i], b[i])
        c = [max(ai, bi) for ai, bi in zip(a, b)]
        # suffix maxima M[i] = max_{j>=i} c[j]
        M = [0] * n
        M[-1] = c[-1]
        for i in range(n-2, -1, -1):
            M[i] = max(c[i], M[i+1])
        # prefix sums of M
        pref = [0] * (n + 1)
        for i in range(n):
            pref[i+1] = pref[i] + M[i]
        # answer queries
        ans = []
        for __ in range(q):
            l = int(next(it))
            r = int(next(it))
            ans.append(str(pref[r] - pref[l-1]))
        out_lines.append(" ".join(ans))
    sys.stdout.write("\n".join(out_lines))

if __name__ == "__main__":
    solve()

Submission 2

import sys

# Function to calculate the sum of digits of a number
def get_digit_sum(n):
    s = 0
    while n > 0:
        s += n % 10
        n //= 10
    return s

def solve():
    # Read all input from standard input
    input_data = sys.stdin.read().split()
    
    if not input_data:
        return

    iterator = iter(input_data)
    try:
        # First token is the number of test cases
        t = int(next(iterator))
    except StopIteration:
        return

    results = []
    
    for _ in range(t):
        try:
            x = int(next(iterator))
        except StopIteration:
            break
            
        count = 0
        # We are looking for y such that y - d(y) = x.
        # This can be rewritten as y = x + d(y).
        # Let s = d(y). Then y = x + s.
        # We need to check if d(x + s) == s.
        # Since x <= 10^9, y is roughly 10^9.
        # The maximum sum of digits for a number <= 10^9 + 100 is 81 (for 999,999,999).
        # Thus, s will not exceed 90. We iterate s from 1 to 100 to be safe.
        
        for s in range(1, 100):
            y = x + s
            if get_digit_sum(y) == s:
                count += 1
        
        results.append(str(count))
    
    # Print all results separated by newlines
    print('\n'.join(results))

if __name__ == '__main__':
    solve()

Why isn't it working well?

Because my AI-generated samples were very simple to detect
Because some LLM-ish traits can be too difficult to detect using only CodeBERT-generated embeddings

As a solution, I will start everything from scratch so that my model detects more AI landmarks that are hard to see through embeddings.
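
One way to complement embeddings is a handful of explicit, hand-written features. The sketch below is only an assumption about what such landmarks could look like for Python submissions: the share of comment lines, the `__main__` guard, and return-type hints, all of which appear in the two submissions above. The function name `style_features` and the feature names are hypothetical, and none of these traits proves anything alone, since formatters and tidy humans produce them too.

```python
import re


def style_features(source: str) -> dict:
    """Hypothetical hand-written 'LLM landmark' features for Python code."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comment_lines = [ln for ln in lines if ln.lstrip().startswith("#")]
    return {
        # share of non-empty lines that are full-line comments
        "comment_ratio": len(comment_lines) / max(1, len(lines)),
        # `if __name__ == "__main__":` guard (the dots match either quote style)
        "has_main_guard": bool(re.search(r"if __name__ == .__main__.", source)),
        # return-type annotations such as `def solve() -> None:`
        "has_type_hints": bool(re.search(r"def \w+\(.*\)\s*->", source)),
    }


sample = '''import sys

# Read all input
def solve() -> None:
    data = sys.stdin.read().split()
    # Print the token count
    print(len(data))

if __name__ == "__main__":
    solve()
'''
features = style_features(sample)
```

Features like these could be fed to a classifier next to the embeddings instead of being used as hard rules.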

Updates

Created the cfac repo on GitHub
Rewrote the post text without AI, in response to the hate comments about AI slop and pilliamw's blog post
Major update: (finally) trained a model for classifying cheaters vs. non-cheaters (changes not pushed to the repo yet)

Comments (73)

vn4k

Auto comment: topic has been translated by vn4k (original revision, translated revision, compare)

MIRZAPURI

I also support this idea, I really don't understand what would these cheaters even get after getting such ratings but not actually honing their problem solving skills.

Well, I can definitely say that these guys don't know that feeling of the dopamine hit of someone who has their solution get accepted :)

Wbxsi5

Submission time is probably the biggest indicator, but shouldn't spacing and comments raise concerns too? (Like way too overly explanatory comments or spacing that doesn't make sense)

I don’t know much about spacing, but to detect redundant comments, we need an NLP model.

Well, most LLMs incorporate weird spacing (something like [this](https://mirror.codeforces.com/contest/2209/submission/367664751))

HelloFromMars

How is this "weird spacing"? That's exactly how I write! Have written code like this since my very first problem... According to you, I am AI :D https://mirror.codeforces.com/contest/4/submission/128290246

SuperLallu

Hey, VS Code has a "Format Document" feature (one of the reasons for weird spacing... not only AI)

lewandowsk111111

its so funny seeing accounts with little to no activity solve like 3 questions with 0 errors and LLM-esque comments in the first five minutes

accord

The irony is strong with this one.

I'm using AI to translate it from my own language, because I'm very bad at English.

MyBrainGotTLE

ye i can see that

Your updated version of the blog is much better and makes your endeavor much more appreciable, even with "bad" English. Best of luck, hopefully you can come up with a good system!

AksLolCoding

We also need an anti-AI blog system

We have to somehow make them realize that they won't get more intelligent just because they use AI. Otherwise, at some point they will realize it in the hard way.

I'm using AI to translate it from my own language, because I'm very bad at English.

SHANBO

I think that if Codeforces switched to a desktop application where you cannot copy the problem statement during contests, and participants were required to write code within a custom test or link this app with CLion or VS Code, it could help reduce cheating. Additionally, for people who try to take screenshots of the problem and send them to AI tools, we could include hidden or specially crafted words in the statement. These words might not be noticeable to humans but could be detected by AI systems, making it easier to catch cheaters.

Nyxs

genius fr

RoadrunnerBrownRational

I want to clarify something about the AI rules — as I understand them, using AI to generate algorithmic logic is prohibited, but I’m unsure how that applies to standard algorithms like Dijkstra: would it still be considered a violation to ask an AI to generate Dijkstra’s algorithm during a contest, even if no part of the problem statement or its details are provided, or is the restriction mainly about using AI in connection with the specific problem?

so you can use prewritten template for the dijkstra algo

kaislash

if it's well known just copy it from something like kactl or your own private code repository.

Well, the problem is I already used GPT to generate Dijkstra because I thought it was not considered illegal.

Now I’m confused how this fits with the third-party code rule. In the third-party code rule it says code is allowed if it was “generated using tools that were written and published before the start of the round”, which GPT seems to satisfy.

Also, Dijkstra is a standard algorithm that existed long before the contest, so it’s not like I used AI to come up with a problem-specific solution. But then there’s also the rule about not using AI to generate algorithmic logic or the key solution, and I’m not sure whether Dijkstra in this context is considered algorithmic logic or the key solution.

That rule has been changed. Whenever you sign up for a contest you see a link to this blog and you must confirm that you read and agree to said rules. Also you've only solved 1 problem and it's not dijkstra.

I’ve read the blog and I understand the intention behind restricting AI use, but I’m still confused about how it applies here.

From the “Prohibited AI Use” section:

I did not input the problem statement, any summary, or any subproblem into AI.
I did not ask AI to explain or derive a solution to the problem.
I did not use AI to debug or fix errors based on verdicts.
I did not use AI for problem understanding or decision-making related to the task.

The only thing I used AI for was generating a standard implementation of Dijkstra’s algorithm, which is a well-known algorithm that existed long before the contest and is widely available online.

Given that using prewritten templates (e.g., personal libraries or public resources) or generating templates using other tools is allowed, I don’t understand why generating the same standard code via AI would be considered a violation, especially since it wasn’t tied to the problem itself.

So is the rule strictly about how the code is obtained (AI vs. prewritten), even if the content is identical and not problem-specific?

I’d appreciate clarification. MikeMirzayanov Vladosiya Um_nik

You may not input the problem statement, its summary, any excerpt, or a sub-problem into an AI-based system to receive ready-made code or natural language descriptions of the solution.

Using dijkstra to find the shortest path is telling the AI to give you the solution to a subproblem, that subproblem being the classic shortest path algorithm

The subproblem here is finding the shortest paths, not “using Dijkstra” itself. I didn’t ask AI how to solve the problem or how to derive the approach. I worked that out on my own — including recognizing that it reduces to a shortest path problem and deciding to use Dijkstra’s algorithm.

The only thing I asked AI for was a generic implementation, with a prompt along the lines of “Dijkstra implementation in C++.”

bruh just don't use ai for anything but translation it's that simple

Then the rules should just clearly say you're not allowed to use AI for anything other than translation—like a lot of other websites already do.

arvindf232

I agree, this has been my understanding of the rules all along (especially when comparing them with the rules of other platforms). I think it is fair to say the rules were written very early, when the models weren’t that strong, and have not been properly clarified since.

There is always some honor-based system rather than defining every single detail, but I do agree that in this case it should be clarified.

Um_nik

No, seriously, how did anyone get the idea that they should tag me in their AI/cheaters/whatever comments? Or are other random users tagged as much as I am?

Of the three options

copy dijkstra online
ask AI to generate a dijkstra template before start of contest
ask AI to generate a dijkstra template during contest

Yes for sensible people all three are identical (2,3 are just more convenient). Under current rules, 1,2 are allowed and 3 is banned. The reason isn’t about 3 being intrinsically worse than 2, the reason is AIs are too strong and we (collectively (?)) decide that an indiscriminate ban is needed.

It is hard to prove that you would do option 3 exactly identical like option 2, after you have seen the problem. I frequently do option 3 when practicing because I know not to lie to myself but you cannot prove this for a rated contest.

There is also some subtle difference between 2 and 3, if you do 2, I think everyone can be convinced you are aware of this method and implementation ahead of time.

The genuinely bad version is when you ask for a subvariant of Dijkstra that is only applicable in a few situations, including this particular problem. This would be very bad, and I definitely consider that cheating, but I would assume you are not talking about this type. But in a contest, you also cannot prove this is the case.

To provide some context on why I’m raising this question and why I chose to use a throwaway account:

This situation dates back roughly 1.5 years, sometime after September 2024. I’m a fairly high-rated user (2600+), and my rating was actually even higher at that time. My interpretation of the rules back then was that this kind of behavior was allowed. The guidelines explicitly listed certain actions as permitted and others as prohibited, but they didn’t clearly state that anything outside those categories is automatically forbidden, so I ended up interpreting them in that light.

When the idea occurred to me to generate Dijkstra instead of copying it or implementing it from scratch, I briefly checked the third-party code blog. There was a line stating that code generated using tools that were written and published/distributed before the start of the round is allowed. GPT clearly satisfies that condition, so I didn’t dwell on it much further. Additionally, this was during the era of o1-level models, which were perhaps around 1600–1900 strength, and I was using the free model (4o), which was closer to ~500 level, so it didn’t feel like I was gaining any meaningful advantage or doing anything improper.

I ended up doing this in two instances: I generated a simplest-form Dijkstra (essentially a “Dijkstra C++ implementation”-type prompt) and also Hopcroft–Karp (again using a generic “C++ implementation”-type prompt). These were simply components missing from my template at the time, so I generated them and incorporated them — but to be clear, this did happen during contests.

At the time, I didn’t think much of it and later completely forgot about it. After more advanced models like o3 became available, I more or less instinctively stopped using LLMs in any capacity during contests, since at that point it clearly began to feel questionable, especially given that the models were approaching my own level.

Recently, I revisited some of my older code and noticed that those implementations are quite evidently AI-generated. That’s what is making me somewhat uneasy now. If someone were to go through them and raise concerns, I don’t want to lie about the situation by claiming they were prepared before the contest, but at the same time I don’t really feel that I was cheating in spirit, since I would have simply copied the same code from another source if I had realized this might be against the rules (and in hindsight, it was probably just silly of me not to do so).

And yes, I’m still not entirely certain what the correct interpretation of the rules should be (or perhaps I’m just coping about it).

Do_ur_homwork

Making AI write anti-ai cf blogs...

Edit: see below for vn4k's explanation

according to your username: i havent done my homework yet.

IAmTiredAndSleepy

I believe the fundamental problem is the indistinguishability of code produced by AI and by humans. With very little effort, AI code can be made to look like normal code. We have seen this: variable renaming, comment removal, if (!cin >> t) deletion, etc.

Behaviour patterns are more reliable, but for this CF does not have enough reliable metrics. The only reliable metric is submission time, which is definitely not enough.

One obvious candidate for behaviour patterns is how a person types the code in. The problem is how to implement it; maybe it's possible to create an official CF editor-client with copy-paste disabled, which serves as the only way to submit code. On the positive side, we get many more metrics to analyze, derived from the typing patterns.

I expect people to argue that this way they can't use their custom-built libraries or a custom local judge system, but it might be a good tradeoff, especially for the lower ranks.

Because of this, I suggest making it flagging, not autobanning: if the NLP model detects something AI-like, it reports it to privileged users and they review it.

DuyMinh3005

Fun fact:

I still appreciate your blog tho; however, I would rather read a blog that was written by a human.

egor4kus

Most likely, he was just using a translator

LOL, as I wrote before, I used AI to put my idea into a more correct form, and as you can see my own English is very, very bad.

leftover_19

Your own account doesn't look very promising for this type of post.

If you need help with data collection, model development, system integration, and testing and feedback, then what do you plan to do yourself? You threw out a few vague points that almost anyone can think of, and already has. It sounds like you want to take all the credit for doing none of the work.

If you want to contribute it would be so much better to actually write any of the components here and show your results.

The blog isn’t bad because it is AI written, the blog is bad because you haven’t said or done anything that required experience or practical experimentation. With AI, words have negative value

My opinion: in fact, none of the components you said are hard to implement, anyone slightly smart (we have plenty on codeforces) and slightly bothered could probably guide LLM to do something useful. The whole issue is at step 1:

collect data:

if all code submissions are made easily downloadable by everyone we run into a totally different copyright/ model data training issue

if it is not, we cannot hope to let everyone help with the cause due to the difficulty of the setup.

Even if you leave the access to privileged people (like >=LGM or >= red) you still have this issue, and also some potential unfairness in competitions of using exclusive access of data.

Transparency with codeforces:

We have no idea how sophisticated Codeforces' anti-cheat system is. If someone clever worked on it, it will contain everything you said and be a lot better. But Codeforces won’t reveal their system, with good(?) reasons?

None of the critical issues behind catching cheaters is what you said. The reason this post is AI is that it focuses on sounding good and well thought out where in reality it had negative practical significance.

123gjweq2

how? I don't think anyone actually cares if their code is downloaded.

yangmuguang

If code is more freely downloadable, then AI trainers could download submissions en masse. That in itself isn't a bad thing, but it would put even more stress on CF servers.

34z12000

It is a bad thing for everyone, except ai companies

0) About “take all credits for doing none of work” — I do not need credits, I need help. If I wanted credits, I would not ask for help.

1) I need help with data collection (with 0/1 labeling) at least. All other parts I can do by myself. If someone helps with this, I will implement everything else.

2) About copyright: all users can already view others' submissions for Div. rounds, so how does this break copyright? There are also Codeforces problem‑solution datasets on Hugging Face, and still no one cares about copyright. If the admins have concerns, I am ready to discuss.

3) About anti‑cheat: as far as I know, Codeforces has no automated anti‑cheat system at the moment. Cheaters are banned only when they take top‑30 places, and the current reporting system relies heavily on manual effort from the community. If I am wrong, please provide information; I will be grateful.

4) If you think my suggestion is practically useless, then maybe you can suggest something of your own that would be “good” in your opinion. I am open to hearing concrete ideas.

P.S.: if you really want to help, your PR is appreciated!

Auto comment: topic has been updated by vn4k (previous revision, new revision, compare).

not_not_miky

It's nice to see people like you trying to do something about it, however without the involvement of MikeMirzayanov or an admin, this is just wishful thinking unfortunately... I would also like to see a bit more moderation against cheating, we don't even have a "report" button :/

MEGATRON_HACKER

Yea yea but here's the problem the system test will take 5 damn days to select and see all submissions+ppl can alt submit to get away with there main account so u need to give em a goofy ahh 2tb ram to handle this but this era is lack of ram thanks to AIs' so your outcome is gud but even u give the ram there no resource to handle the bruteforce of 1e9 submisson ;-;

Please don't worry, my pipeline will run alongside the contest; at maximum load it takes at most 2 seconds per submission on a single-core CPU.

you have skipped submissions (care to explain)?

Hello. I have already appealed that. It happened because they found my code too similar to the code of a Chinese account. But I used an offline code editor and I don't know any Chinese people. Ideas in Educational rounds are very common, so collisions of ideas happen often. Thank you for your care <3

You got an entire contest skipped

i have received notification about only two problems btw.

using cp editor and then swapping to python?

Because sometimes a problem is hard to solve and I need to try many ideas. For faster coding I prefer Python.

rustem.memmedli.

While we can build systems to catch those who simply "copy-paste", it is nearly impossible to detect cheaters who take an AI algorithm and rewrite it in their own style or add some "human" mistakes. However, I believe this system will reduce the overall number of cheaters and push them toward actual learning instead of easy and cheap shortcuts. Fully support this project!

Yes, I want to hardcode the detection of "silly" submissions, and the harder-to-find LLM landmarks will be left to my model

jadoocoder

why your id is showing that you are a cheater too

wtf?

i think its based on skipped submissions that I already appealed, cf moderation is inactive. You also have skipped submissions and marked as "swear_word swear_word cheater" LULZ

MIRAJ12

There is a pretty good way to stop almost most cheating IG, but I doubt CF would ever follow that cz it would create a lot of load on the system but though it is possible. For example, a structure like LeetCode or CodeChef, with a built in code editor, along with some extra features and a few rule changes.

Rules:

Googling is allowed, but copying and pasting code from online sources is not.

All code must be written directly on the website.

Users can store their own code snippets on CF.

Every keystroke will be recorded in the code editor. If any part of the submitted code doesn’t match the typing history (means pasted), the system will check it against previously stored snippets.

The system should tolerate some mismatches ( x to 0)%. Users can still test their code locally by manually copying it from the browser.

Typing speed could also be tracked if needed, but I don’t think it’s necessary.

So there would be one extra file upload, and after the system checks the file, it can be discarded also for the load issue, the system can perform these checks later, during less busy times.
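
A rough sketch of the check described in the rules above, assuming the editor has already reconstructed the typed text from the keystroke log. The function name `unexplained_fraction`, the use of Python's `difflib`, the minimum block size of 8 characters, and any tolerance applied to the result are assumptions for illustration.

```python
from difflib import SequenceMatcher


def unexplained_fraction(submitted: str, typed: str, snippets: list[str]) -> float:
    """Fraction of the submitted code that matches neither the typed text
    nor any stored snippet (character level, via matching blocks)."""
    if not submitted:
        return 0.0
    covered = [False] * len(submitted)
    for source in [typed, *snippets]:
        matcher = SequenceMatcher(None, submitted, source, autojunk=False)
        for block in matcher.get_matching_blocks():
            if block.size >= 8:  # ignore tiny accidental matches
                for i in range(block.a, block.a + block.size):
                    covered[i] = True
    return covered.count(False) / len(submitted)


typed = "int main() { solve(); }\n"
snippet = "void dijkstra(int src) { /* ... */ }\n"

ok = unexplained_fraction(snippet + typed, typed, [snippet])
bad = unexplained_fraction("PASTED_FROM_ELSEWHERE_XYZ_12345\n" + typed, typed, [snippet])
```

A submission would be sent for review only when the unexplained fraction exceeds the tolerated mismatch, and the check can run offline after the contest since it needs nothing but the three texts.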

Now we are going through extreme circumstances. Without extreme measures, I don’t think it’s possible to fully stop cheating, even with your trained AI. Big companies spend billions of dollars every day, yet AI still hallucinates frequently. This means any AI-based system will likely produce false flags.

nice idea!

Xellos

The real question is how much it'd strain the network infrastructure and storage. A single file upload vs real time diff list updating... meanwhile if you offload most of the work to client and only send updates to server from time to time, you're vulnerable to client-side hacking.

DarkDevilVaqif

I am a fan of this algorithm, but wouldn't this work on the very obvious ones? Along with obvious cheaters, there are subtle ones that can solve until some point, but always cheat for 1-2 more

Suspicion is computed per task, so a high suspicion_score on one task will mark that submission (only submissions on this task, not the user) to be reported to mods
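
A small sketch of that per-task flagging, assuming the scores are kept in a nested dict as score[u][p]. The function name `flag_for_review` and the 0.8 threshold are hypothetical.

```python
def flag_for_review(score: dict[str, dict[str, float]],
                    threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return (user, problem) pairs whose suspicion exceeds the threshold.
    Only the individual submissions are flagged, never the whole account."""
    return sorted(
        (user, problem)
        for user, per_problem in score.items()
        for problem, s in per_problem.items()
        if s > threshold
    )


score = {
    "alice": {"A": -0.9, "B": -0.7, "E": 0.95},  # honest on A and B, suspicious on E
    "bob": {"A": -0.8, "B": 0.1},
}
flagged = flag_for_review(score)
```

The output is a review queue for moderators rather than a ban list, so a false positive costs a human a look instead of costing a user their rating.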

While this is a great way to work on it, I don't think you can ever fix cheating. People grow steadily but still cheat just for that extra rating, so it seems legit. I also believe a smart cheater will go a long way before getting truly caught, which would only happen if they mess up, and that always happens, the way lies always catch up to us

Do you know how Elo rating works? It's similar to Codeforces rating, and when there are too many people with extra rating it becomes harder to gain rating. We can't fully remove cheaters, but we can reduce their number so the rating system won't break.
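
For reference, a sketch of the standard Elo expected-score formula. Codeforces uses an Elo-like system whose details differ, so treat the numbers as an illustration only.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


even = expected_score(1500, 1500)          # what the system expects on paper
vs_stronger = expected_score(1500, 1900)   # what an honest 1500 can really expect
```

If an account rated 1500 actually performs at a 1900 level because it cheats, the system expects an honest 1500 user to score about 0.5 against it, while the realistic expectation is about 0.09. The honest user loses rating for results that are not their fault.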

I am not interested in chess, but I think I get what you mean. But, isn't that how Codeforces works, too? For example, the more Specialists there are, the more competition there is, and it would be harder for you to get to Expert because you have to beat far more Specialists(or maybe I just misunderstood, sorry in that case)
