Improving your rating: A statistical perspective

Since December, I've been exploring Codeforces data to answer one question: how best to improve your rating. Today, I'm presenting the final project!

The first hurdle in any data-driven analysis is getting the data. Codeforces does have an API, but pulling data from it at a large scale is quite difficult. So first, I collected and cleaned everything up into two big datasets and published them on Kaggle. If you think my analysis is garbage, you can download the datasets and try it yourself!

Dataset: The submissions and contest results for 60k recently active users

Dataset: The final standings for every Codeforces contest

The process of getting data, if you care.
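
For readers who have not used the API before, here is a minimal sketch of what one request looks like. It is not necessarily how the datasets above were built. It calls the public `user.rating` method for a single handle and turns the result into per-contest rating changes. The helper names (`build_url`, `parse_response`, `rating_deltas`, `fetch_rating_history`) and the `pause_seconds` value are my own assumptions for illustration; the API is rate limited, so the pause is just a conservative guess.

```python
import json
import time
import urllib.parse
import urllib.request

API_ROOT = "https://codeforces.com/api"


def build_url(method, **params):
    """Build a Codeforces API URL such as .../user.rating?handle=tourist."""
    query = urllib.parse.urlencode(params)
    return f"{API_ROOT}/{method}?{query}" if query else f"{API_ROOT}/{method}"


def parse_response(payload):
    """Return the 'result' field, or raise if the API reported a failure."""
    if payload.get("status") != "OK":
        raise RuntimeError(payload.get("comment", "unknown API error"))
    return payload["result"]


def rating_deltas(rating_changes):
    """Rating gained or lost in each rated contest, in the order given."""
    return [c["newRating"] - c["oldRating"] for c in rating_changes]


def fetch_rating_history(handle, pause_seconds=2.0):
    """Fetch one user's rating history, then sleep to stay polite.

    pause_seconds is an assumed, conservative delay between requests.
    """
    with urllib.request.urlopen(build_url("user.rating", handle=handle)) as resp:
        payload = json.load(resp)
    time.sleep(pause_seconds)
    return parse_response(payload)


if __name__ == "__main__":
    history = fetch_rating_history("tourist")
    print(len(history), "rated contests")
    print("last five deltas:", rating_deltas(history)[-5:])
```

One request per user with a pause in between adds up quickly for tens of thousands of users, which is part of why a pre-built dataset is convenient.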

After getting the data, I began analyzing it and creating charts. I looked at a lot of features, such as first solve time, the rate of incorrect submissions, and problem difficulty. For many of these features, I could not find great insights, but there are still quite a few interesting graphs in here! I hope you enjoy this data analysis that I've been working on for the last 3 months!

Click to view the story

After you read the link: my personal experience