Um_nik's blog

By Um_nik, history, 7 months ago, In English

I guess you were right in the part "a year from now LLMs will solve problems on your level". You were also saying something along the lines of "you'll change your mind when it starts affecting you". In this part, you were wrong.

Cheaters always existed, and they always will. LLMs make it easier to cheat, true. There is no way of completely preventing LLM cheating (or any other kind of cheating) in online competitions.

I just coordinated the round everyone is buzzing about, so I can with confidence say that in the CF rules for coordinators and authors, there isn't anything about making problems LLM-proof. Not for easy problems, not for hard problems, nothing. And I doubt any such rules will appear. LLMs are forbidden by the rules, that's that.

So we, as honest participants (yes, I will assume that you are an honest participant, otherwise this blog is not applicable to you and I don't care about you), will have to accept that some problems in the future rounds will be LLM-able, and there will be people cheating using an LLM and getting Accepted verdicts on some set of problems that will depend not on their skill level or the difficulty of the problems, but on the problems' LLM-ability (and cheater's ability to LLM, ha).

In my opinion, the worst effect of cheaters on Global Round 29 was that problem G, with a 4500 score, got more solves than problem F, with a 3000 score, and that made a lot of participants skip F (and sometimes even E) in favor of "free points" in G. I have always said that you shouldn't open the standings during the contest when you can get a solve count for every problem anyway. Heck, in a CF round you don't really need even the solve counts: the problems are sorted, and there is a known scoring distribution. Well, maybe the times are changing. Maybe you need to be aware that some participants are not playing fair. And maybe a high solve count doesn't say much about the difficulty of a problem for a human, but rather about its difficulty for an LLM.

Of course, authors and coordinators can mess up the difficulty order. It happened before, it will happen again. Plus, difficulty is subjective and multifaceted. So keeping an eye on the solve counts and looking for discrepancies is still a good tactic. But maybe, just maybe, when everyone seems to solve G in 5 minutes, yet you open it and it is a scary number theory problem in which you need 10 minutes just to understand the statement and then for the next 10 minutes you can't make any headway, take a peek at the standings and check who exactly those "everyone" solving G in 5 minutes are. You know, hypothetically. Any similarity to any recent contest is a coincidence.

Trust yourself more. If the supposedly "easy 5-minute adventure that grays are solving" seems difficult, well, maybe it is difficult. Or maybe it is difficult for you. It doesn't even have to be cheating or LLMs. Maybe you don't know some technique that makes the problem simple. Or maybe those who got AC have simply seen this problem before and copied their old solution.

All of this was happening 10 years ago (and, I'm assuming, 20 years ago, but I'm not that old), before LLMs. You are not the first generation to encounter cheaters; you are not the first generation to encounter the "standings effect". I'm not saying LLMs are not affecting your contests; they certainly do. But they are not going anywhere, and you will have to adapt.

Vote: +680

» 7 months ago | -291

Wow, I am first. Btw, most of the cheaters who solved $$$G$$$ got removed so idk what to tell you.

  »» 7 months ago | +258

    Great job on completely missing the point of the blog. Must be something about your IQ.

» 7 months ago | +56

Yes, you are right, but when can we have our non-AI-proof but interesting problems back? As you can see in the discussions, some coordinators still reject non-AI-proof problems.

  »» 7 months ago | +10

The issue isn't as simple as AI being able to solve problems; it's that AI allows quicker searching. When I read the hardest problem of the latest Latin American ICPC subregional, I thought "this setting is so classic that surely there has been research about it, and LLMs should be able to recall that". When the setters published the editorial, they mentioned papers about the problem, so my intuition seems to have been right (I didn't try to solve it with LLMs).

Now, that isn't an issue for ICPC contests, but for Codeforces rounds I've seen quite a few problems denied because of a similar/almost-the-same/same problem from the past, and that's much closer to what AI is good at finding. Would you be willing to let those be used as well? (I would.)

    »»» 7 months ago | 0

      Why not? I don't see any issue.

Though, there is a difference between "somebody with bad sportsmanship would search and find it" and "the problem appeared recently/is too classic and everyone instantly recognizes it". The latter would ruin the fun.

      »»»» 7 months ago | 0

        A recent problem was denied because it was an exact match with IOI 2002 or something like that. I don't remember if it was the author or the coordinator that removed the problem.

        »»»»» 7 months ago | +21

          Well, googling is still allowed by the rules. And there is consensus that knowingly reusing old problems is not allowed. I was trying to raise that question with KAN, but he kinda refused to even discuss it.

          »»»»»» 7 months ago | +5

Googling is still allowed, but I don't see much difference between googling stuff and using AI to find out about stuff that's well known (you can just describe the problem to it, and it'll most likely give you the actual name of the problem). Can I use AI during contests to find out about, for example, vertex geography (without knowing the name of the problem) if I don't remember its details? I had the impression that I read something against that in the rules, but imo it's really not much different from Google, other than what kind of search string you can use.

About reusing problems: what counts as "knowingly reusing a problem"? Using the case I mentioned as an example, it seems to be "using a problem that has been found to have appeared before". I'd prefer a definition where knowingly copying a problem is forbidden but independently recreating one is fine, even if it is later found to have appeared more than a decade ago (or some other fitting timeframe), which would be the case here.

            »»»»»»» 7 months ago | 0

              If you are using a problem that was found to already exist during testing, I'd say you are knowingly reusing an old problem.

        »»»»» 7 months ago | 0

          I did it.

  »» 7 months ago | +15

    This kinda implies that you don't find the problems in recent contests interesting? I disagree.

    But feel free to show us how it should be done!

    »»» 7 months ago | +10

The thing is, I have seen some really interesting Div. 2 C ideas rejected for not being AI-proof, even though the problems that are presented are also interesting to me.

By the way, I'm proposing a round (my first one) right now, and I hope everyone will enjoy it in the future :D

      »»»» 7 months ago | +23

Your information sources are lying to you. That was maybe 1–2 years ago.

      »»»» 7 months ago | +31

        I can't answer for other coordinators, but I can say for certain that I have never rejected a problem for being non-AI-proof. I wouldn't know which problems are AI-proof, to be honest.

» 7 months ago | +54

On the bright side, we almost certainly won’t have to care about any of this once AI gets much better than every human at CP, as we would all be dead soon afterwards.

» 7 months ago | +15

That's it. I'm one of those who were misled by the standings and skipped E and F in Global Round 29, which turned out to be a very stupid decision.

» 7 months ago | +35

How did you guys remove all those "unfair" solutions from the standings? It looks like you just banned all the accepted submissions from low-rated participants, which seems kind of unfair.

  »» 7 months ago | +153

    Seems fair to me

  »» 7 months ago | +6

Genuinely, I don't think any gray participant could've solved such a difficult problem anyway. It's just facts.

  »» 7 months ago | +64

    I wasn't participating in that process or even closely monitoring it, which was a mistake on my part. I think there were some very questionable methods used, and there might have been several false positives. Please do not consider this to be an official comment from CF.

» 7 months ago | +13

I agree with you that contest problems don't need to be LLM-proof; that probably won't even be practical for much longer, given how fast AI is evolving.

However, I do think that Codeforces as a platform should implement more mechanisms against cheaters. Mobile verification would be very effective. Even simple measures like adding invisible text to problem statements would help. As in other sports, we definitely can't fully eliminate cheaters, but that doesn't mean we should let them cheat with almost no consequences. We need to fight them to maintain a healthy competitive environment.

» 7 months ago | +161

True gigachad mentality would be to participate in contests and not look at the scoreboard even after the contest is over

» 7 months ago | +3

Thanks a lot for this positive blog, much needed for me. It was really hard losing so much rating.

» 7 months ago | +2

Imo we should have a contest (maybe unrated) where the use of AI is fully allowed, just to see the extent to which AI can "ruin" contests.

» 7 months ago | +8

Fair enough. Can't do much to control it. It's better to just focus on improving yourself.

» 7 months ago | -14

Why won't Codeforces ever get a good anticheat system like chess.com or something like that?

  »» 7 months ago | +5

What would be a good anticheat system? I don't need words; I am interested in metrics: what would you want to minimize/maximize? And how exactly would you distinguish between a cheater and an honest person?

It is not as easy as "just get better anticheat"; someone actually has to do a lot of work to make it. And it will undoubtedly still have problems: for example, you need to distinguish between people getting lucky and solving a hard problem, and an LLM telling them the solution.

    »»» 7 months ago | -24

      »»»» 7 months ago | +6

        The source you provided answers none of my questions. The thing is, catching cheaters is hard to automate. Though I would be more than happy if you can prove me wrong.

      »»»» 7 months ago | +22

        What would you do, specifically? You can't really adapt anything from the chess world because our solutions are single text files, not long sequences of moves.

        Let's take a typical scenario that can't be detected with current methods. You're solving a problem with difficulty rating similar to your own, which is likely the hardest problem in a contest that you could reasonably solve. You ask ChatGPT for some help (let's say you do it on your phone to sidestep any suggestions about CF detecting tabbing out). You get the right idea, but implement it yourself. As a result you get a considerable advantage.

        How would you detect this?

        • As you wrote the code yourself, it is consistent with your other submissions.
        • You had a 50% chance to solve the problem in contest, so solving it is consistent with your skill level.

        Even if you do this in every contest, there is not a lot of statistical evidence to accuse you of anything.

        Chess is a bit different because every game consists of a lot of moves, so statistical evidence of cheating accumulates much quicker.
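To put a rough number on "not a lot of statistical evidence", here is a toy calculation in Python. The 50% per-contest solve probability is the figure from this comment; the count of 100,000 active users is a made-up illustrative assumption.

```python
# Toy model of the scenario above: an honest participant solves the
# borderline problem with probability 0.5 per contest, while a cheater
# "solves" it every single time.
honest_p = 0.5
active_users = 100_000  # hypothetical number of rated participants

for n_contests in (5, 10, 20):
    p_perfect_streak = honest_p ** n_contests
    # Expected number of honest users showing the same perfect streak
    # purely by luck:
    lucky_honest = active_users * p_perfect_streak
    print(n_contests, p_perfect_streak, lucky_honest)
```

Under these assumptions, even a 10-contest perfect streak would be expected from roughly a hundred honest participants purely by chance, so each contest contributes very little evidence; a single chess game, by contrast, yields dozens of moves' worth of signal.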

        »»»»» 7 months ago | 0

> You can't really adapt anything from the chess world because our solutions are single text files, not long sequences of moves.
> Chess is a bit different because every game consists of a lot of moves, so statistical evidence of cheating accumulates much quicker.

I'll be a bit nitpicky: each chess move can be encoded using at most 12 = 6 + 6 bits = 1.5 bytes (64 = 2**6 possible squares the piece moves from, 64 = 2**6 squares it moves to; in practice even fewer bits are needed).

          Chess games are rarely longer than 100 moves, so rarely more than 150 bytes of user input per game (though AFAIK, chess anticheat systems also use other "side-channel" info such as time-per-move variability depending on position difficulty and other user behavioral patterns).
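As a sanity check of that arithmetic, a minimal Python sketch (it ignores promotions and smarter variable-length encodings):

```python
import math

# 64 = 2**6 squares, so naming a square takes 6 bits.
bits_per_square = math.ceil(math.log2(64))   # 6
# A move names a source square and a destination square.
bits_per_move = 2 * bits_per_square          # 12
# A rarely-exceeded 100-move game:
bytes_per_game = 100 * bits_per_move / 8     # 150.0

print(bits_per_move, bytes_per_game)  # 12 150.0
```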

Nonetheless, too bad that competitive programming isn't cheater-safe anymore =( (chess, Counter-Strike, and now you too, CP)

          »»»»»» 4 months ago | -10

I guess the only way to actually get rid of cheating in CP is to hold offline events, similar to how the problem is addressed in other esports and in chess. It is still possible to cheat at an offline event, but it's much harder. You'd have to be really creative for that.

            »»»»»»» 4 months ago | +10

              Don't necropost.

              We already have offline events, although more are always welcome.

              The simple solution for Codeforces specifically is one of attitude — don't treat rating as the end of the world.

» 7 months ago | -10

I think many cheaters in higher ranks who probably made their LLM code look somewhat human were caught this way. In the end, most of them were removed from the leaderboard, so the only remaining negative is people being misled into thinking G was easy.

» 7 months ago | 0

finally some water

» 7 months ago | +8

On the bright side, you would be shocked at the number of people in my region who stopped CP because AI is "taking their jobs". It makes me a lot less worried about regional ICPCs and potentially winning one. Also, surprisingly, with how easy it is to get access to "GPT-5 Thinking", there aren't that many cheaters.

edit: despite the first point killing the region, which is kinda sad, there is a bright side to it

» 7 months ago | -53

What a great post about nothing... Can anyone tell me who the fridge this guy is and why is CF full of word vomits from irrelevant people these days?

» 7 months ago | +67

The problems aren't always sorted by difficulty perfectly: the problemsetters might be wrong, or you might specialize in some type of problem and get an advantage on a later problem that is easier for you specifically.

I used to look at the contest's main page to see solve counts and make some decisions about what to solve. I guess now I'll have to refer to the friends standings instead, which is a little less convenient, but not the end of the world.

» 7 months ago | +53

Btw, I would've solved problem G quickly if I weren't so out of shape. You know what they say: 90% of participants give up on a free-points problem 5 minutes before they would have solved it.

» 7 months ago | -8

The proposition of a Spartan-like Codeforces solving method, relying purely on your own intuitively formed "grade" of a problem's difficulty, is certainly an interesting point Um_nik brings up, if I have understood it correctly. And personally, I agree with it. It is an incredible confidence booster to be in the dark. If the problemset merely contained problems with no label of the order they appeared in or their rating, it would force programmers to tango with problems more difficult than they can solve, without the ever-looming thought of "it's just too hard for me".

It even works both ways. If you assume everyone has solved a problem, then you simply deduce that the solution must be something trivial and that you mustn't dwell on unnecessary possibilities. Then, if you have been defeated by the problem, its difficulty will change your assumption into an assurance that nobody else has solved it (or only a small number of people have, all way above your rating).

Therefore, perhaps influenced by the format of Serbian national competitions, where others' results are unavailable, I must second this point and say that it is marvelous. Once you know what you're doing, cast yourself into the Arena. Don't look at editorials. Don't look at how many people have solved D, E, F, or whatever problem you're currently looking at. With perseverance and creativity, thorough knowledge of theory will prove enough for the task, and if it doesn't, it will be a reminder that you haven't learnt it well enough!