Comments

My intuition behind this approach is as follows:

  • Suppose we take the standing from a sample of trusted participants (say, an old contest from 2018, before LLMs could solve contest problems). We can then calculate the true performance of a new participant by adding them to that sample, similar to how virtual participants are evaluated on Codeforces. We call this the "trusted standing".
  • We'd need hundreds or thousands of verified testers to establish a "trusted standing" for the average contest in 2025, so this is infeasible.
  • How about using AI testers then? They're capable of performing at GM level (OpenAI's o3 is rated around 2700), and from what I can tell, they're only going to get stronger. We can create weaker variants with ratings from 1300 to 2700 and see how they perform in a real contest.
  • How about using a fine-tuned LLM to calculate F(d(solved problems)) then? This seems more cost-effective than AI testers.

F(d(solved problems)) is essentially the same as F(ranking) on the "trusted standing" created by AI testers.
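
To make the "trusted standing" idea concrete, here's a minimal sketch of how a newcomer's performance could be estimated against a fixed pool of trusted ratings. It assumes the standard Elo win probability and a plain binary search; this is not Codeforces's exact formula, and every name in it (`win_probability`, `expected_rank`, `performance_rating`) is made up for illustration.

```python
from typing import Sequence


def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo probability that a player rated rating_a places above one rated rating_b."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def expected_rank(rating: float, trusted_ratings: Sequence[float]) -> float:
    """Expected place of a newcomer with `rating` inserted into the trusted standing."""
    # Place 1 is the best; each trusted participant adds their chance of finishing above.
    return 1.0 + sum(win_probability(r, rating) for r in trusted_ratings)


def performance_rating(
    actual_rank: float,
    trusted_ratings: Sequence[float],
    lo: float = 0.0,    # assumed lower bound of the search range
    hi: float = 4000.0, # assumed upper bound of the search range
) -> float:
    """Binary-search the rating whose expected rank matches the rank actually achieved."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected_rank(mid, trusted_ratings) > actual_rank:
            lo = mid  # expected place is worse than the actual one, so the guess is too low
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `performance_rating(6.0, [1500.0] * 10)` evaluates a newcomer who lands in 6th place after being added to a pool of ten trusted 1500-rated participants. Because the pool is fixed, cheaters in the live contest can't shift the result.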

Dominater069

"There is no use to such a system." Well, I think it solves many problems, lemme give you some:

  • It revives true progress: From the user's perspective, cheating is destroying the sense of progress for CP enthusiasts, who just want to grind and improve. They're ordinary humans trying to compete against GPUs, so the odds are stacked against them. "Shift up ratings of both actual people and cheaters." Yeah, agreed, but for the average user, it means scaling their performance back to its original strength and appreciating the hard work they put into CP

  • It makes cheating less rewarding: At the end of the day, you're only competing against yourself, and a 2700 cheater will eventually succumb to their impostor syndrome

  • A healthy rating distribution should resemble a bell curve (like back in 2018). In my model, cheating skews the distribution toward the higher end. Codeforces admins could analyze the data to find the rating brackets with the most cheating and use that insight to support other cheating detection methods.

My intuition behind this approach is as follows:

  • Suppose we take the standing from a sample of trusted participants (say, an old contest from 2018, before LLMs could solve contest problems). We can then calculate the true performance of a new participant by adding them to that sample, similar to how virtual participants are evaluated on Codeforces. We call this the "trusted standing".
  • We'd need hundreds or thousands of verified testers to establish a "trusted standing" for the average contest in 2025, so this is infeasible.
  • How about using AI testers then? They're capable of performing at GM level (OpenAI's o3 is rated around 2700), and from what I can tell, they're only going to get stronger. We can create weaker variants with ratings from 1300 to 2700 and see how they perform in a real contest.
  • How about using a fine-tuned LLM to calculate F(submissions) then? This seems more cost-effective than AI testers.

F(submissions) is essentially the same as F(ranking) on the "trusted standing" created by AI testers.

Mindeveloped

Hi, hope you find this interesting

I proposed a solution in the past: an alternative rating system for Codeforces. Here's the link to it: My proposal

But I'm barely a 1600 in this community, so it's impossible for people to take me seriously. Given your GM status, can you help me spread this idea to a larger audience? (If you find it genuinely useful, of course). I'm just another person who's trying to stop this madness

On DeadMan69ML Based Rating Predictor, 16 months ago
-18

I don't see the point of predicting someone's expected rating, like

Does it motivate someone to grind harder? No

Does it give them insights on how to improve? No

Can they draw concrete conclusions from your output? No

Go touch some grass man, it's over for me

Auto comment: topic has been updated by CP_xam_lon (previous revision, new revision, compare).

+5

Your analogy doesn't make sense, considering CP requires brains, not brawn. If you want to compare CP to another sport, chess would make a much better counterpart, since it also belongs to the category of intellectual games.

Speaking of chess, there's a well-documented case: the Polgár sisters. Basically, their father (László Polgár) theorized that prodigies can be made through specialized education, so he started the Polgár sisters on chess training at the age of 4. The result speaks for itself: all three went on to become excellent chess players, with Judit Polgár being the first female player to cross 2700 (Super GM status).

Yeah, that's what early training does for athletes at the elite level

On EgorSubmitter, 18 months ago
+36
On EgorSubmitter, 18 months ago
+46

Over-engineering at its finest