Blog entries - Codeforces

Christine-'s blog

The Truth was before them

By Christine-, history, 3 days ago, In English

binary search, iq, how to be red, solve more problems

Cheating statistics

By Christine-, history, 9 months ago, In English

"Lies, damned lies, and statistics."

Big shoutout to macaquedev and all the people working on the cheater database. Their project has already identified 2,100+ verified cheaters. In my experience, they don't assign a cheater mark easily; some of my reports (which, to me, were clear cases of cheating) were rejected.

Here is a small survey on cheating statistics on Codeforces. I computed the rating distribution of caught cheaters and looked at some funny demographics. Namely, we try to compare the frequency of cheating per country.

Data

The list of handles of cheaters was taken from the macaquedev GitHub. My small research is based mostly on this list. I used the Codeforces API to gather the rating and the corresponding country of each handle from the list.

Next, I used the Codeforces API to gather the number of active users (those who participated in a rated contest in the last 6 months) per country.

Rating distribution of caught cheaters

The following graph is a little tricky. We should take into account that relatively higher-rated cheaters tend to cheat more cleverly. That is probably one of the reasons why there aren’t as many cheaters in the blue range as one might expect.

Rating distribution of cheaters

Here is a more detailed table with the percentages:

range	count	in range, %	upper_tail, %
≤1199	715	34.52	100
1200–1399	414	19.99	65.48
1400–1599	434	20.96	45.49
1600–1899	398	19.22	24.53
1900–2099	61	2.95	5.31
2100–2399	39	1.88	2.37
≥2400	10	0.48	0.48
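
The two percentage columns can be recomputed from the count column alone. Below is a minimal C++ sketch of that arithmetic. The names `BucketShare` and `bucket_shares` are made up for illustration and are not part of the original analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of the summary: the share of this bucket, and the share of
// this bucket together with all higher-rated buckets.
struct BucketShare {
    double in_range_pct;
    double upper_tail_pct;
};

// counts[i] is the number of caught cheaters in the i-th rating bucket,
// lowest bucket first (hypothetical helper, for illustration only).
std::vector<BucketShare> bucket_shares(const std::vector<long long>& counts) {
    long long total = 0;
    for (long long c : counts) total += c;
    assert(total > 0);

    std::vector<BucketShare> shares(counts.size());
    long long tail = 0;  // running sum over this bucket and all higher ones
    for (std::size_t i = counts.size(); i-- > 0;) {
        tail += counts[i];
        shares[i].in_range_pct = 100.0 * counts[i] / total;
        shares[i].upper_tail_pct = 100.0 * tail / total;
    }
    return shares;
}
```

Calling `bucket_shares({715, 414, 434, 398, 61, 39, 10})` gives one entry per table row, in the same order.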

Grey cheaters are the ones caught most often. We can probably also assume that they are the easiest to catch. My subjective feeling is that the situation is grimmer in blue/purple than the graph suggests.

Also, note the blue peak in the 1600–1700 range. Those are probably people who cheated their way to reach blue for some sort of placement and then dropped CP (thank God).

Demographics of caught cheaters

Since many people don't list their country on CF, in this section I only take into account users with listed countries. Sadly, we lose more than half of the data here.

I was not satisfied with claims that we see cheaters from region X more often than from other regions simply because there are a lot of participants from region X. To me, this statement is too loose.

How about applying Bayes’ formula? How about computing the conditional probability $$$P[\text{user is caught cheating} \mid \text{user is from country X}]$$$, which we will denote for brevity as $$$P[\text{cheater} \mid \text{country X}]$$$?

Let’s make a simple computation:

$$$ P[\text{cheater} \mid \text{country X}] = \frac{P[\text{country X} \mid \text{cheater}] \, P[\text{cheater}]}{P[\text{country X}]} $$$

Here is the problem: I don’t know how to estimate $$$P[\text{cheater}]$$$. Of course, there are many more cheaters than the 2100 listed in the database. So, instead, for each country X we compute the ratio $$$\frac{P[\text{cheater} \mid \text{country X}]}{P[\text{cheater} \mid \text{reference country}]}$$$

Then $$$P[\text{cheater}]$$$ cancels out and we have

$$$ \frac{P[\text{cheater} \mid \text{country X}]}{P[\text{cheater} \mid \text{reference country}]} = \frac{P[\text{country X} \mid \text{cheater}]}{P[\text{reference country} \mid \text{cheater}]} \frac{P[\text{reference country}]}{P[\text{country X}]} $$$

As the author of this blog, I choose Russia as the reference country.

Thus, for each country X we need to estimate probabilities $$$P[\text{country X} \mid \text{cheater}]$$$ and $$$P[\text{country X}]$$$.

cheaters — % of all identified cheaters who are from the country. It estimates $$$P[\text{country X} \mid \text{cheater}]$$$ and is computed as $$$\frac{\text{number of cheaters from X}}{\text{number of cheaters with identified country}} \cdot 100 $$$.
users — % of all identified users who are from the country. It estimates $$$P[\text{country X}]$$$ and is computed as $$$\frac{\text{number of users from X}}{\text{number of users with identified country}} \cdot 100$$$
rate — $$$\dfrac{P[\text{cheater} \mid \text{country} X]}{P[\text{cheater} \mid \text{Russia}]}$$$.
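
The rate column follows directly from the two share columns. Here is a minimal C++ sketch of the calculation, assuming the shares are given in the same units for both countries (percentages or raw counts both work, because the totals cancel in the ratio). `CountryShares` and `relative_rate` are illustrative names, not something from the original scripts.

```cpp
#include <cassert>

// Shares for one country, restricted to accounts with a listed country.
struct CountryShares {
    double cheater_share;  // estimates P[country X | cheater]
    double user_share;     // estimates P[country X]
};

// Ratio P[cheater | X] / P[cheater | reference].
// P[cheater] cancels, so only the four shares are needed.
double relative_rate(const CountryShares& x, const CountryShares& reference) {
    assert(x.user_share > 0 && reference.user_share > 0);
    assert(reference.cheater_share > 0);
    return (x.cheater_share / x.user_share) /
           (reference.cheater_share / reference.user_share);
}
```

Plugging in rounded percentages instead of raw counts gives slightly different last digits than a computation from the unrounded data.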

Now we can look at the computed values. I dropped countries with fewer than 5 caught cheaters so that the inference is more stable.

country	cheaters count	cheaters, %	users, %	rate
India	530	61.92	45.02	4.0921
Vietnam	46	5.37	3.44	4.6476
Bangladesh	34	3.97	11.19	1.0557
Egypt	32	3.74	7.02	1.5837
China	31	3.62	9.69	1.1115
Pakistan	27	3.15	0.58	16.0755
United States	13	1.52	1.48	3.0494
Russia	13	1.52	4.52	1
Iran	10	1.17	1.04	3.3395
Palestinian Territory	8	0.93	0.94	2.9611
Azerbaijan	8	0.93	0.34	8.1122
Japan	8	0.93	0.75	3.6908
South Korea	6	0.7	0.76	2.7551
Brazil	5	0.58	1.13	1.5379
Kazakhstan	5	0.58	1.18	1.4683
Romania	5	0.58	0.56	3.1301

Here is a visualization with the rate sorted in descending order.

Rate

Limitations

I am not biased at all, and I didn’t expect such results. However, keep in mind that the sample size is still not very large (except for India), so the inference can be noisy. Also, note that I don’t build confidence intervals, so the differences between some pairs of countries may not be statistically significant.
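
For readers who want to add the missing intervals, one common option is the large-sample interval for a ratio of two proportions, built on the log scale. The C++ sketch below is not part of my analysis. It takes raw counts (cheaters and active users per country), which are not published in this post, and it treats caught cheaters as a subset of the counted users, which is only approximately true here.

```cpp
#include <cassert>
#include <cmath>

struct RatioInterval {
    double ratio;  // point estimate of the rate ratio
    double lower;  // lower end of the approximate 95% interval
    double upper;  // upper end of the approximate 95% interval
};

// Approximate 95% interval for
//   (cheaters_x / users_x) / (cheaters_ref / users_ref),
// using the standard error of the log ratio (hypothetical helper).
RatioInterval rate_ratio_interval(double cheaters_x, double users_x,
                                  double cheaters_ref, double users_ref) {
    assert(cheaters_x > 0 && cheaters_ref > 0);
    assert(users_x >= cheaters_x && users_ref >= cheaters_ref);
    const double ratio = (cheaters_x / users_x) / (cheaters_ref / users_ref);
    const double se_log = std::sqrt(1.0 / cheaters_x - 1.0 / users_x +
                                    1.0 / cheaters_ref - 1.0 / users_ref);
    const double z = 1.96;  // normal quantile for a two-sided 95% interval
    return {ratio, ratio * std::exp(-z * se_log), ratio * std::exp(z * se_log)};
}
```

With only 13 caught cheaters in the reference country, the `1.0 / cheaters_ref` term dominates the standard error, so every interval in the table would be wide.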

As for the rating distribution of cheaters, I've already noted that it doesn't reflect reality precisely, because smart cheaters are harder to catch.

Conclusions

We provided a rough estimate of the factors by which the rate of caught cheating differs by country and debunked claims like 'they cheat a lot because there are just a lot of them'. Users from some countries are caught cheating an order of magnitude more often than users from others. To make the estimates and the statistics in the blue/purple range more precise, please report more cheaters.

bayesian, cheating, statistics


Suspicious problem setter and tester, harsh__h

By Christine-, history, 15 months ago, In English

Introduction

Hello, Codeforces!

I want to share my suspicions about harsh__h.

I wouldn't write this blog if he were not a relatively high-ranked user who has participated in both problem setting and testing. Yesterday, I accidentally noticed that the 6th place (standings) in Codeforces Round 1002 (Div. 2), which turned out to be harsh__h, had very suspicious submissions. Next, I will try to explain in detail what got my attention, in the order in which I uncovered it.

Part 1. Codeforces Round 1002 Div2.

During the contest, in problems A (304070640), B (304090034), and C (304111128), harsh__h doesn't put spaces around brackets and operators. That is, he writes, say, for(ll i=0;i<n;i++){ and not for (ll i = 0; i < n; i++) {. Also, for the newline he uses endl, as in cout << mex << endl;, and these are not interactive problems.

Next, in problems D (304092485) and E (304129652), in which I think he used ChatGPT, harsh__h puts spaces around brackets and keywords, for example, writing for (ll i = 0; i < n; i++) { instead of for(ll i=0;i<n;i++){. Here, for the newline he uses '\n', as in cout << (ans == inf ? -1 : ans) << '\n';

In addition, compare how he read the graph just a week ago in another graph task 303111414. A week ago it was

for(ll i=0;i<n-1;i++){
    ll x,y;cin>>x>>y;
    x--;
    y--;
    adj[x].push_back(y);
    adj[y].push_back(x);
}

and yesterday it was

for (ll i = 0; i < m1; i++) {
     ll a, b;
     cin >> a >> b;
     --a;
     --b;
     g1[a].push_back(b);
     g1[b].push_back(a);
     e1.push_back({a, b});
}

Note also that in 304129652 he doesn’t use spaces even in cin/cout: cout<<ans<<endl;

To sum up, the submissions look like they were written by different people.
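
The markers I compared by eye can also be counted mechanically. Below is a minimal C++ sketch that counts them in a source string. It is a crude illustration with made-up names (`StyleMarkers`, `style_markers`), it does plain substring matching (so it can be fooled by comments or identifiers), and a difference in these counts is a hint to look closer, not proof of anything.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Crude style markers counted in one source file.
struct StyleMarkers {
    int for_tight = 0;     // "for("  (no space before the bracket)
    int for_spaced = 0;    // "for (" (space before the bracket)
    int endl_newline = 0;  // "endl"
    int char_newline = 0;  // the character literal '\n'
};

// Counts non-overlapping occurrences of `needle` in `text`.
int count_occurrences(const std::string& text, const std::string& needle) {
    assert(!needle.empty());
    int count = 0;
    for (std::size_t pos = text.find(needle); pos != std::string::npos;
         pos = text.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}

StyleMarkers style_markers(const std::string& source) {
    StyleMarkers m;
    m.for_tight = count_occurrences(source, "for(");
    m.for_spaced = count_occurrences(source, "for (");
    m.endl_newline = count_occurrences(source, "endl");
    m.char_newline = count_occurrences(source, "'\\n'");
    return m;
}
```

Running `style_markers` on two submissions from the same contest and comparing the four counters reproduces the comparison above in a repeatable way.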

After the contest I noted the unusual codestyle and here was his reaction

So he didn’t comment on the code style. Instead, he argued that he solved E1 and E2, and that he, a Codeforces master, doesn’t know whether ChatGPT can solve the very straightforward 2059D - Graph and Graph.

Then there was this guy MayankBhakat, probably his friend, who tried very hard to defend harsh__h while ignoring the very suspicious code style.

Part 2. Educational Codeforces Round 173.

First, at this point harsh__h is a master, so Educational Codeforces Round 173 (Rated for Div. 2) is unrated for him. In this round harsh__h made submissions in very short intervals.

Let’s break it down.

2043A - Coin Transformation. In this submission 298252659 he doesn’t use his template. Also, it is very unusual for a Codeforces master to write 50 lines of code and still fail to solve a Div. 2 A. Then comes this submission 298254253. Now it is AC, and apparently someone left some comments.

Then, just 1 minute after he is done with A, he sends D 298254978, where he also does not use the template.

Then, 1 minute after the last D submission, he sends E 298257018 with the same code style as in A, D, which differs significantly from his usual style.

Then, 1 minute after the last E submission, he sends F, etc. So the submission history looks like this

I think that I made my point.

Part 3. Codeforces Round 956 (Div. 2) and ByteRace 2024 (UPD.1)

So, I decided to look into his older submissions. For example, Codeforces Round 956 (Div. 2) and ByteRace 2024.

Consider this submission 269285676 of 1983F - array-value. The codestyle

    while (lo < hi) {
        ll mid = (lo+hi)/2;
        if (slv(mid) >= k) hi = mid;
        else lo = mid+1;
    }
    
    cout << hi << '\n';

While another problem 1983E - I Love Balls in the very same contest 269275920

    if(bb%2==0){
        alice+=(((sum2*(bb))%M)*mod_inv(2,M))%M;
    }else{
        alice+=(((sum2*(bb+1))%M)*mod_inv(2,M))%M;
    }
    alice%=M;
    ll bob = sum*aa+sum2*bb-alice;
    bob%=M;
    bob+=M;
    bob%=M;
    cout<<alice<<" "<<bob<<endl;

Next, look at problem 1983D - Swap Dilemma, submission 269243749, where he doesn't use ll in count_inversion

    ll n;cin>>n;
    vector<int> a(n);
    cin>>a;
    vector<int> b(n);
    cin>>b;
    auto count_inversion=[&](vector<int> arr)->long long{
     
            int n = (int)arr.size();
     
            vector<int> buffer(n);
     
            function<long long(int,int,int,int)> combine=[&](int left_l,int left_r,int right_l,int right_r)->long long{
                int r_pointer=left_r;
                long long cnt=0;
            ...

In addition, check the comments for more suspicious cases. So harsh__h has quite a long history of making very suspicious submissions. His results in these contests should be investigated, in my opinion.

Plagiarism detection complaint (UPD.2)

Yesterday harsh__h and MayankBhakat were very vocal in demanding that I provide proof. Sadly, today there are no comments from these guys, now that I have listed a lot of suspicious code.

Also, it is quite interesting that MayankBhakat complains in his blog about being flagged by the plagiarism detection system. So now it is less surprising that he defended harsh__h.

Questions left unanswered (UPD.3)

Unfortunately, two days later, despite huge interest from the community, there is still no reaction from harsh__h; he has chosen to ignore everything and wait for the attention to the case to fade. This only makes the actions of harsh__h even more suspicious. MikeMirzayanov

Conclusion

MikeMirzayanov, Vladosiya, KAN, could you please check the submissions? harsh__h participated in problem setting and in testing in the past. I believe that this case should be investigated to protect the integrity and fairness of future rounds. I really hope that harsh__h just has a very peculiar code style. Unfortunately, I am not sure at all that this is the case.

Also, I tried my best not to make any accusations, but unfortunately the reaction of harsh__h and MayankBhakat made me too emotional in my comments. For that I am deeply sorry.

problem setter, testing, suspicious
