Christine-'s blog

By Christine-, 9 months ago, in English

"Lies, damned lies, and statistics."

Big shoutout to macaquedev and all the people working on the cheater database. Their project has already identified 2,100+ verified cheaters. In my experience, they don't assign a cheater mark easily; some of my reports (which, to me, were clear cases of cheating) were rejected.

Here is a small survey of cheating statistics on Codeforces. I computed the rating distribution of caught cheaters and looked at some funny demographics; in particular, I compare how often users from different countries are caught cheating.

Data

The list of cheater handles was taken from macaquedev's GitHub repository, and this small study is based mostly on that list. For each handle in it, I used the Codeforces API to fetch the user's rating and country.
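Below is a minimal sketch of the lookup step. It assumes the public `user.info` method of the Codeforces API (https://codeforces.com/api/user.info), which takes handles separated by semicolons and returns JSON with a `status` field and a `result` list of user objects. The helper names are hypothetical, and this is not necessarily how the data for this blog was collected. The network call is kept separate so that the parsing can be tried offline.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://codeforces.com/api/"


def user_info_url(handles):
    """Build a user.info request; the method takes handles separated by ';'."""
    query = urllib.parse.urlencode({"handles": ";".join(handles)})
    return API_BASE + "user.info?" + query


def parse_user_info(payload):
    """Map handle -> (rating, country) from a decoded user.info response.

    Unrated users have no "rating" field and many users list no "country",
    so missing values become None.
    """
    if payload.get("status") != "OK":
        raise RuntimeError(payload.get("comment", "Codeforces API error"))
    return {
        user["handle"]: (user.get("rating"), user.get("country"))
        for user in payload["result"]
    }


def fetch_user_info(handles):
    """Network call (not used in the offline example below)."""
    with urllib.request.urlopen(user_info_url(handles), timeout=30) as resp:
        return parse_user_info(json.load(resp))


# Offline example with a hand-written response (hypothetical handles).
sample = {"status": "OK", "result": [
    {"handle": "alice", "rating": 1450, "country": "Russia"},
    {"handle": "bob", "rating": 1100},
]}
print(parse_user_info(sample))
```

A list of about two thousand handles will likely need to be split into several requests. Check the API documentation for size and rate limits.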

Next, I used the Codeforces API to get the number of active users per country (users who took part in a rated contest in the last 6 months).
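Per-country counts of active users could be obtained along the lines below. This sketch assumes the `user.ratedList` method with `activeOnly=true` as the data source. That is an assumption: the exact source used here, and how the API defines "active", should be checked against the documentation. Only users with a listed country are counted. The same hypothetical `shares_percent` helper also turns cheater counts into percentages.

```python
from collections import Counter

# Assumed data source (hypothetical choice): user.ratedList with activeOnly=true.
# The API's definition of "active" may differ from a 6-month window.
RATED_LIST_URL = "https://codeforces.com/api/user.ratedList?activeOnly=true"


def users_per_country(users):
    """Count users by country; users with no listed country are skipped."""
    return Counter(u["country"] for u in users if u.get("country"))


def shares_percent(counts):
    """Convert per-country counts into percentages of the counted total."""
    total = sum(counts.values())
    return {country: 100 * n / total for country, n in counts.items()}


users = [{"country": "India"}, {"country": "India"},
         {"country": "Russia"}, {"handle": "no_country"}]
counts = users_per_country(users)
print(counts, shares_percent(counts))
```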

Rating distribution of caught cheaters

The following graph should be read with care. Higher-rated cheaters probably cheat more cleverly, which is likely one of the reasons why there aren't as many cheaters in the blue range as one might expect.

[Figure: rating distribution of cheaters]

Here is a more detailed table with the percentages:

rating range  count   share, %   upper tail, %
≤1199         715     34.52      100.00
1200–1399     414     19.99      65.48
1400–1599     434     20.96      45.49
1600–1899     398     19.22      24.53
1900–2099     61      2.95       5.31
2100–2399     39      1.88       2.37
≥2400         10      0.48       0.48

(upper tail: percentage of cheaters rated at or above the lower bound of the range)
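The table can be reproduced from a list of ratings with a few lines of Python. This is a sketch that assumes one rating per cheater (the text does not say whether the current or the maximum rating was used) and skips unrated users. Feeding the table's own counts back in reproduces its percentage columns.

```python
import bisect

# Lower bounds of the rating ranges in the table (each range includes its lower bound).
BOUNDS = [1200, 1400, 1600, 1900, 2100, 2400]
LABELS = ["<=1199", "1200-1399", "1400-1599", "1600-1899",
          "1900-2099", "2100-2399", ">=2400"]


def bin_ratings(ratings):
    """Count ratings per range; None (unrated) values are skipped."""
    counts = [0] * len(LABELS)
    for r in ratings:
        if r is not None:
            counts[bisect.bisect_right(BOUNDS, r)] += 1
    return counts


def summarize(counts):
    """Return (share, upper_tail) in percent for each range."""
    total = sum(counts)
    share = [100 * c / total for c in counts]
    # Upper tail of range i: cheaters rated at or above the range's lower bound.
    upper_tail = [100 * sum(counts[i:]) / total for i in range(len(counts))]
    return share, upper_tail


share, tail = summarize([715, 414, 434, 398, 61, 39, 10])
for label, s, t in zip(LABELS, share, tail):
    print(f"{label:10s} {s:6.2f} {t:7.2f}")
```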

Grey cheaters are the ones caught most often. We can probably also assume that they are the easiest to catch. My subjective feeling is that the situation is grimmer in blue/purple than the graph suggests.

Also, note the blue peak in the 1600–1700 range. Those are probably people who cheated their way to blue for some sort of placement and then dropped CP (thank God).

Demographics of caught cheaters

Since many people don't list their country on CF, in this section I only take into account users with listed countries. Sadly, we lose more than half of the data here.

I was not satisfied with claims that we see cheaters from region X more often than from other regions simply because there are a lot of participants from region X. To me, this statement is too loose.

How about applying Bayes' formula to compute the conditional probability $$$P[\text{user is caught cheating} \mid \text{user is from country X}]$$$, which we will denote for brevity as $$$P[\text{cheater} \mid \text{country X}]$$$?

Let’s make a simple computation:

$$$ P[\text{cheater} \mid \text{country X}] = \frac{P[\text{country X} \mid \text{cheater}] \, P[\text{cheater}]}{P[\text{country X}]} $$$

Here is the problem: I don't know how to estimate $$$P[\text{cheater}]$$$, since there are certainly many more cheaters than the 2,100 listed in the database. So, instead, for each country X we compute the ratio $$$\frac{P[\text{cheater} \mid \text{country X}]}{P[\text{cheater} \mid \text{reference country}]}$$$ for a fixed reference country.

Then $$$P[\text{cheater}]$$$ cancels out and we have

$$$ \frac{P[\text{cheater} \mid \text{country X}]}{P[\text{cheater} \mid \text{reference country}]} = \frac{P[\text{country X} \mid \text{cheater}]}{P[\text{reference country} \mid \text{cheater}]} \frac{P[\text{reference country}]}{P[\text{country X}]} $$$
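One step can be made explicit here. Write $$$c_X$$$ and $$$u_X$$$ for the numbers of caught cheaters and active users from country X, and $$$C$$$ and $$$U$$$ for the corresponding totals over users with a listed country. This notation is introduced only for illustration. With the count-based estimates $$$P[\text{country X} \mid \text{cheater}] \approx c_X / C$$$ and $$$P[\text{country X}] \approx u_X / U$$$, the totals cancel as well:

$$$ \frac{P[\text{cheater} \mid \text{country X}]}{P[\text{cheater} \mid \text{reference country}]} \approx \frac{c_X / C}{c_{\text{ref}} / C} \cdot \frac{u_{\text{ref}} / U}{u_X / U} = \frac{c_X / u_X}{c_{\text{ref}} / u_{\text{ref}}} $$$

So, under these estimates, the rate is the number of caught cheaters per active user in X, divided by the same quantity for the reference country. Switching to another reference country rescales all rates by the same factor.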

As the author of this blog, I choose Russia as the reference country.

Thus, for each country X we need to estimate the probabilities $$$P[\text{country X} \mid \text{cheater}]$$$ and $$$P[\text{country X}]$$$. The table below uses the following columns:

  • cheaters, %: the percentage of cheaters with a listed country who are from X. It estimates $$$P[\text{country X} \mid \text{cheater}]$$$ and is computed as $$$\frac{\text{number of cheaters from X}}{\text{number of cheaters with a listed country}} \cdot 100$$$.
  • users, %: the percentage of active users with a listed country who are from X. It estimates $$$P[\text{country X}]$$$ and is computed as $$$\frac{\text{number of users from X}}{\text{number of users with a listed country}} \cdot 100$$$.
  • rate: $$$\dfrac{P[\text{cheater} \mid \text{country X}]}{P[\text{cheater} \mid \text{Russia}]}$$$.
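The following sketch shows how the rate column follows from the two percentage columns defined above. For each country, divide cheaters, % by users, %, then divide by the same ratio for the reference country. The factor of 100 and the unknown $$$P[\text{cheater}]$$$ both cancel. The function name is hypothetical. Because it starts from the rounded percentages, its output can differ from the published rates in the last digits.

```python
def relative_rates(cheaters_pct, users_pct, ref="Russia"):
    """Return rate[X] = (cheaters%_X / users%_X) / (cheaters%_ref / users%_ref).

    cheaters_pct[X] estimates 100 * P[country X | cheater] and users_pct[X]
    estimates 100 * P[country X]; the factors of 100 cancel in the ratio.
    """
    ref_ratio = cheaters_pct[ref] / users_pct[ref]
    return {
        country: (cheaters_pct[country] / users_pct[country]) / ref_ratio
        for country in cheaters_pct
    }


# A few rows of the table (rounded percentages, hence slight drift).
cheaters_pct = {"India": 61.92, "Vietnam": 5.37, "Russia": 1.52}
users_pct = {"India": 45.02, "Vietnam": 3.44, "Russia": 4.52}
rates = relative_rates(cheaters_pct, users_pct)
for country, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{country:10s} {rate:.2f}")
```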

Now we can look at the computed values. I dropped countries with fewer than 5 caught cheaters to make the inference more stable.

country                 cheaters  cheaters, %  users, %  rate
India                   530       61.92        45.02     4.0921
Vietnam                 46        5.37         3.44      4.6476
Bangladesh              34        3.97         11.19     1.0557
Egypt                   32        3.74         7.02      1.5837
China                   31        3.62         9.69      1.1115
Pakistan                27        3.15         0.58      16.0755
United States           13        1.52         1.48      3.0494
Russia                  13        1.52         4.52      1.0000
Iran                    10        1.17         1.04      3.3395
Palestinian Territory   8         0.93         0.94      2.9611
Azerbaijan              8         0.93         0.34      8.1122
Japan                   8         0.93         0.75      3.6908
South Korea             6         0.70         0.76      2.7551
Brazil                  5         0.58         1.13      1.5379
Kazakhstan              5         0.58         1.18      1.4683
Romania                 5         0.58         0.56      3.1301
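The two processing steps mentioned here are dropping countries with fewer than 5 caught cheaters and ordering the rest by rate. Together they amount to a filter followed by a sort. This is a minimal sketch with hypothetical names; "Smallland" is a made-up country used only to show the cutoff.

```python
def ranked_countries(cheater_counts, rates, min_cheaters=5):
    """Drop countries with fewer than min_cheaters caught cheaters,
    then sort the rest by rate, highest first."""
    kept = [(country, rates[country])
            for country, n in cheater_counts.items() if n >= min_cheaters]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# A few rows of the table, plus one hypothetical country below the cutoff.
cheater_counts = {"India": 530, "Pakistan": 27, "Russia": 13, "Smallland": 3}
rates = {"India": 4.0921, "Pakistan": 16.0755, "Russia": 1.0, "Smallland": 20.0}
for country, rate in ranked_countries(cheater_counts, rates):
    print(f"{country:10s} {rate:.2f}")
```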

Here is a visualization with the rate sorted in descending order.

[Figure: rate by country, sorted in descending order]

Limitations

I am not biased at all, and I didn't expect such results. However, keep in mind that the samples are still small (except for India's). The reference country, Russia, has only 13 caught cheaters, and several countries have just 5, so the inference can be noisy. Also, note that I don't build confidence intervals, so the differences between some pairs of countries may not be statistically significant.
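One way to add rough confidence intervals is sketched below. It works under stated assumptions and is not part of the original analysis. Since the rate equals $$$(c_X/u_X)/(c_{\text{ref}}/u_{\text{ref}})$$$ for cheater counts $$$c$$$ and user figures $$$u$$$, we can treat the cheater counts as independent Poisson variables and the user figures as exact. The standard error of the log rate ratio is then approximately $$$\sqrt{1/c_X + 1/c_{\text{ref}}}$$$. This is a textbook approximation that ignores how cheaters get reported and whether they list a country. The function name is hypothetical.

```python
import math


def rate_ratio_ci(c_x, u_x, c_ref, u_ref, z=1.96):
    """Point estimate and approximate CI for (c_x / u_x) / (c_ref / u_ref).

    Assumes c_x, c_ref are independent Poisson counts and treats u_x, u_ref
    as exact, so SE(log ratio) ~= sqrt(1/c_x + 1/c_ref); z=1.96 gives ~95%.
    User figures may be counts or percentages: only their ratio matters.
    """
    rate = (c_x / u_x) / (c_ref / u_ref)
    half_width = z * math.sqrt(1 / c_x + 1 / c_ref)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


# Cheater counts and users, % from the table; Russia (13, 4.52) is the reference.
for country, c_x, u_x in [("India", 530, 45.02), ("Pakistan", 27, 0.58),
                          ("Bangladesh", 34, 11.19)]:
    rate, lo, hi = rate_ratio_ci(c_x, u_x, c_ref=13, u_ref=4.52)
    print(f"{country:10s} {rate:6.2f}  [{lo:.2f}, {hi:.2f}]")
```

Under these assumptions, Bangladesh's interval contains 1, while India's and Pakistan's lie above 1. With only 13 cheaters in the reference country, the $$$1/c_{\text{ref}}$$$ term contributes a large share of every interval's width.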

As noted above, the rating distribution of cheaters naturally does not reflect reality precisely, because smart cheaters are harder to catch.

Conclusions

We provided a rough estimate of how much the rate of caught cheating differs between countries, and debunked claims like 'they cheat a lot because there are just a lot of them'. Users from some countries are caught cheating an order of magnitude more often than users from others. To make the estimates, and the statistics in the blue/purple range, more precise, please report more cheaters.

Post score: +290

»
9 months ago, score -156

Dear Mindeveloped, Would you like to take a look at this post and please, for god's sake, stop India hate and start Pakistan hate?

  • »
    »
    9 months ago, score +47

    It's the fact that India has 500+ cheaters, even if u hate Pakistan or ur Indian, u can't change this lol

    • »
      »
      »
      9 months ago, score +9

      no of indians users active on codeforces are way more than any country

    • »
      »
      »
      9 months ago, score +7

      And we also can't change the fact that India has a huge population on this platform. So will you just ignore so high cheating rate from Pakistan just because they are less? In my opinion, We should look at the ratios, not the numbers. That was the point of this blog, we all knew India has a lot of cheaters. Did you even read the blog properly?

      • »
        »
        »
        »
        9 months ago, score 0

        truee

      • »
        »
        »
        »
        8 months ago, score 0

        For matters of improving the experience of codeforces rounds, we definitely should NOT be looking at ratios, but absolute numbers. What is the point of knowing that a country is 90% cheaters if there are only 10 people participating there? I would rather focus on the 10%-cheater country that has thousands of participants.

        For matters of picking which country to hate, do I even need to say that you shouldn't be picking a country to hate?

  • »
    »
    9 months ago, score +23

    India accounts for 61% of the cheaters and therefore has the biggest impact on academic integrity values (and ratings ofc). I don't understand what the value of $$$P[\text{user is caught cheating} \mid \text{user is from country X}]$$$ implies.

    • »
      »
      »
      9 months ago, score +41

      P[ Is a cheater | from X country ] means if you pick someone from X country randomly, what is the chance that the person is a cheater.

      It's more of quality than quantity,
      everyone agree that India has the most impact on cheating,
      but not everyone agree that India has the most cheating rate (per person).
      This blog serves as a measured answer.

    • »
      »
      »
      9 months ago, score 0

      I was talking about rate here. Pakistan literally has 4 times the cheating rate of India. India having a huge population on this platform doesn't imply most Indian people cheat. You should always look at the ratio, not just numbers. I was waiting for this kind of thing for a long time, and thanks a lot to Christine- for that.

    • »
      »
      »
      9 months ago, score -42

      Sir, please don't be racist. I know so many people from IIT Kharagpur who give contests honestly. They always say they never cheat. It’s really unfair to generalize like this, sir. Sir, your math might not be correct.

»
9 months ago, score -150

Yo, mindeveloped, you’re probably jizzin’ your pants over this data, huh? Indians owning the cheater list got you so horny you’re basically fuckin’ the screen. Bet you’re drooling, dick in hand, thinking you’ve won the racist lottery. Calm your tits, bro—numbers ain’t a race, but your bigot vibes are loud as fuck. Get a grip before your hate-boner breaks the internet.

»
9 months ago, score +42

Hi Christine-, thanks so much for the shoutout and for analysing my data. I'll put in my thoughts about some of this.

  1. Sorry that your reports got rejected. If this happens again I'm very happy for you to DM me and I will review once again / give you more information as to why I rejected.

  2. I don't think "smart cheaters" is the reason why users blue and above aren't caught as much. I think the reasons are that they get caught before they get to blue (after all it takes a few contests to get to blue), and that there are generally fewer blues on the platform (just, due to how the rating system is built).

Anyway, thanks a lot to the community for submitting reports. Keep going!!

  • »
    »
    9 months ago, score 0

    For those who seldom sell their honesty already at 1500+, 1800+ . They only need a single contest/couple of problems in a contest to become Blue and above.

    • »
      »
      »
      9 months ago, score 0

      What I'm saying is, new users who come to the platform and start cheating, need about 5 or 6 contests to become blue. By that point, we've most often caught them

»
9 months ago, score +2

Bro is farming up votes lol

»
9 months ago, score 0

Some people directly solve Div2 D after solving A in their 2nd contest

»
9 months ago, score 0

Can you add σ for every country's rate? Like, if you test 100 countries, there's ~1 country with a +3σ rate just because it's unlucky.

»
9 months ago, score 0

))

»
9 months ago, score -25

i hope your data is as correct as energetic you are about this, if this is wrong, you're throwing a ton of people under the bus for no reason/weird coding habits.

in any case, i'd argue that cheaters in codeforces rounds are just sad people. there's no reason to indulge in these cases any further considering that there's no money to be gained from doing these comps. if anyone feels their rating is invalidated by the success of cheaters using chatgpt or whatever, then we have an innately psychological problem at hand. learning is a process of you vs you, not you vs 20k codeforces round x participants.

»
9 months ago, score -8

Bruh, i don't know about the cheaters from my country.

»
9 months ago, score -8

Well well well

»
9 months ago, score -37

Well I may have some problem expressing my thoughts.

  • »
    »
    9 months ago, score +14

    > But, I don't think it is connected to the country!

    It does for sure and the ratios of the posterior probabilities point to exactly this.

    > Their nation didn't ask them to cheat.

    Nobody says that.

    > Besides, not everyone cheats.

    True. Also, nobody says that everyone cheats.

    > How would a honest participant think if he is blamed just because there is a cheater from his country?

    And who do you blame? Do you blame hordes of cheaters from country X? No. You blame someone who lists facts and who has nothing to do with cheating in country X.

    > So in the post, It is definitely not acceptable to mention the nationality,

    Why? It is perfectly fine to me. People have a right to know it, especially when cheating is so disproportional.

    • »
      »
      »
      9 months ago, score -12

      Horrible post. There is no direct connection between nationality and cheating besides the statistics, you could make the same argument for thousands of other random criteria that have nothing to do with one another, but just so happen to align. And the poster isn't "blaming" you for the amount of cheaters from country X, he's simply pointing out that it is improper to incriminate entire countries. Find me one competitor who can control who cheats/doesn't cheat from his country.

      There's a big difference between being passionate about honesty in competitive programming and pointless discrimination tied to honesty in competitive programming.

      • »
        »
        »
        »
        9 months ago, score 0

        > There is no direct connection between nationality and cheating besides the statistics.

        Have you read the blog? About what "direct" connection are you talking about?

        > you could make the same argument for thousands of other random criteria that have nothing to do with one another, but just so happen to align.

        Statistics can't provide causality. What's your point?

        > he's simply pointing out that it is improper to incriminate entire countries.

        Since when is citing statistics the same as incrimination?

        > Find me one competitor who can control who cheats/doesn't cheat from his country.

        At this point, I understand that I wasted my time replying you. But I've already written the reply so be it.

        • »
          »
          »
          »
          »
          9 months ago, score -6

          So at one point, you say "It does for sure" about cheaters being connected to the countries they come from and then you say "What "direct" connection are you talking about?", so which one is it? Also you yourself said that statistics can't provide causality, so how come "ratios of posterior probabilities" do? The facts you are listing hardly remain so after you imply that they serve in this certain manner, and insulting me isn't gonna change that.

    • »
      »
      »
      9 months ago, score -14

      Yes, the data does indicate that some countries (or one in particular) have more number of cheaters or high cheating rate than others. But I still don't see why is that supposed to mean cheating is connected to country. I mean it is a personal choice that one decides to cheat. How does country matter here?

»
9 months ago, score -10

I do think cheating has truly nothing to do with the nationality. However a growing trend in india is going upwards in the last months which is to cheat in codeforces. This could be a upper motivation for me and others to improve more and take with rage. Otherwise I would suggest freezing the rankings and elo of new indian accounts till the trend disappears. Otherwise it will only be more popular that way.

»
9 months ago, score +7

Also, if anyone is curious, here is a graph that shows national $$$IQ$$$ vs rate of cheating for countries with $$$\ge 5$$$ cheaters.

graph

The correlation of $$$|r| \approx .22$$$ is close to the correlation between civility and $$$IQ$$$.

  • »
    »
    9 months ago, score +103

    Through all of these years, the only unshakeable constant in my life has been one man schizoposting about IQ under every other cf blog I lay my eyes upon. It has truly been a pleasure.

  • »
    »
    9 months ago, score +40

    The red line is as pointless as my life

  • »
    »
    9 months ago, score +5

    I love how bro always includes IQ and culture / civility in controversial topics, that is why I follow you man! Also you can see that Pakistan and Azerbaijan are outliers and removing them makes the correlation basically zero (I didn't actually do the math, just eyeball statistics, seems pretty uncorrelated to me)

    • »
      »
      »
      9 months ago, score -7

      I'm not gonna remake the graph without those two, so I will agree that without Pakistan and Azerbaijan, the correlation is basically $$$0$$$. But I don't think that it is fair to exclude these two countries just because they have oddly high rates of cheating. In fact, these might be some of the most important countries to include because of their high rates of cheating. If we exclude them, we might be artificially restricting the range on the rate of cheating instead of just removing outliers.

      Either way, I would not say that it's a coincidence that both of these two countries have $$$ \lt 90$$$ national $$$IQ$$$ s. I could just not imagine a country like China or the US cheating this much. But there definitely is a lack of data here. Maybe the topic should be revisited in a few months when macaquedev's list has grown to like $$$20k$$$ or something.

  • »
    »
    9 months ago, score 0

    India on the bottom for everything again lul

»
9 months ago, score 0

I've been saying this for years. Just ban all Indians. Move them to CodeChef or something

  • »
    »
    9 months ago, score -11

    Move your superiority complex somewhere else where the questions are easier, maybe then you'll be able to push your insecurities under the rug.

»
9 months ago, score +27

It's funny to see that many Indians here are trying to reduce the racism on themselves (which is good because racism is bad) but the way they are doing it is by redirecting the racism towards Pakistanis, GUYS THAT'S NOT HOW IT WORKS, RACISM IS ALWAYS BAD EVEN AGAINST PAKISTANIS!!

  • »
    »
    9 months ago, score +3

    As an Indian, Please don't judge just because one person did that. I'm not trying to redirect it towards Pakistan at all. I just want India hate to end on codeforces. But to be honest, it is a fact that India produces the most cheaters on this platform, so the hate is kind of valid. But I hate to see that I'm not taken seriously anywhere just because I'm from India, even though I'm nothing like other people expect an average Indian to be.

    • »
      »
      »
      9 months ago, score 0

      I said in my comment "many Indians" not "all Indians", the entire purpose of my comment is against generalization so I'm sorry if you got the wrong idea.

»
9 months ago, score 0

My fav cheater is Master :D

»
8 months ago, score +3

Love the action on this blog, keep going

»
8 months ago, score +3

The rate statistics have changed since this blog was first published. Here is an updated table of countries and rates (as of 2025-06-09 17:55 UTC):

country                 cheaters  cheaters, %  users, %  rate
Pakistan                27        2.96         0.53      15.122293
Azerbaijan              12        1.32         0.32      11.049215
Tajikistan              5         0.55         0.28      5.329293
Vietnam                 50        5.48         3.30      4.507090
India                   558       61.18        42.70     3.891411
Japan                   9         0.99         0.72      3.706792
Iran                    11        1.21         0.97      3.393918
United States           14        1.54         1.36      3.058551
Romania                 5         0.55         0.53      2.827438
Palestinian Territory   8         0.88         0.90      2.654591
South Korea             6         0.66         0.73      2.436952
Egypt                   32        3.51         6.34      1.503535
Brazil                  5         0.55         1.08      1.376109
Kazakhstan              5         0.55         1.21      1.235019
China                   36        3.95         9.23      1.161049
Bangladesh              38        4.17         10.49     1.078937
Russia                  15        1.64         4.47      1.000000

Here is the new rating graph:

»
8 months ago, score 0

disheartening :(

»
8 months ago, score 0

Bangladesh's count should be more than 1000

»
8 months ago, score 0

I think this is the biased calculation.

»
8 months ago, score +10

Cheaters own problem G in last contest, which a lot of reputed honest coders didn't solve. Very shameful thing. How can I give contests if there are so many cheaters?