A strategy to save CP from the influence of next-gen LLMs

As you may (or may not) know, OpenAI recently claimed that their latest o3 model can reach a staggering Codeforces rating of around 2700, which poses a serious threat to the integrity of online CP contests.

My friend and I even had a debate about whether this would put an end to CP within the next decade, and we concluded that, as advanced LLMs become widely available to the public, traditional CP formats like Codeforces contests won't stand the test of time.

Imho, no cheating detection mechanism is sufficient to catch cheaters who actually know what they're doing. For example, they can ask an LLM to produce the textual solution and step-by-step instructions on how to implement it, thereby avoiding any suspicion. The point is, mass cheating with LLMs is inevitable, so the only real solution is to mitigate the effects of cheaters on people who just want to grind CP and have a good time.

One possible way to tackle cheater-induced rating inflation came to mind:

Suppose that future LLMs consistently perform at GM level. We'll devise a fine-tuned model to take in the problem statement and evaluate the expected difficulty of that problem (from the human perspective) along with some potential solutions.

Codeforces coordinators and problem-setters can later refine the output to reduce overfitting and biases, ensuring that the expected difficulty matches the real distribution.

Since cheaters distort the final standings and make them unreliable, we can switch to a new system where individual contest performance is calculated from the expected difficulty of the problems a participant solved, instead of from their relative standing. In other words, we'll switch from:

$$$P = F(\text{ranking}) \quad \text{to} \quad P = F(d(\text{solved problems}))$$$

where $$$P$$$ is the contest performance and $$$d(...)$$$ denotes the expected difficulty of a problemset.
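
To make the idea a bit more concrete, here is a minimal sketch of one way $$$F$$$ could look. Everything in it is my own assumption, not part of any existing Codeforces system: the function names `solve_probability` and `performance` are hypothetical, the difficulties are assumed to be on the usual rating scale, and the Elo-style solve probability is just one plausible model. The performance is taken to be the rating at which the expected number of solved problems matches the number the participant actually solved.

```python
def solve_probability(rating, difficulty):
    # Assumption: Elo-style logistic curve, so a participant whose rating
    # equals the problem's expected difficulty solves it with probability 0.5.
    return 1.0 / (1.0 + 10 ** ((difficulty - rating) / 400.0))


def performance(difficulties, solved, lo=0.0, hi=5000.0, iters=60):
    # difficulties: expected difficulty of every problem in the contest
    # solved: parallel list of booleans, True if the participant solved it
    # lo/hi: hypothetical clamps; solving nothing gives lo, solving
    # everything gives hi, because the equation has no finite root there.
    target = sum(1 for s in solved if s)
    for _ in range(iters):
        mid = (lo + hi) / 2
        expected = sum(solve_probability(mid, d) for d in difficulties)
        if expected < target:
            lo = mid  # rating too low to explain this many solves
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Two problems rated 1000 and 2000, only the easier one solved.
    print(round(performance([1000, 2000], [True, False])))
```

Note that no other participant's result appears anywhere in this calculation, so cheaters in the standings can't shift it. One limitation of this particular model is that only the number of solved problems and the difficulty list matter, not which problems were solved, so a real $$$F$$$ would probably need something more refined.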

CP_xam_lon's blog