TL;DR
I built an enterprise-grade machine learning model to predict a user's rating 6 months in the future. The model has a mean absolute error of 65.15 rating points (that is, $$$ E \left [ \left | \hat{x} - x \right | \right ] = 65.15 $$$, where $$$ \hat{x} $$$ is the predicted rating and $$$ x $$$ is the actual one). You can use it here:
https://fbrunodr.com/predict-codeforces-rating
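For readers who want the error metric spelled out, here is a minimal sketch of how a mean absolute error is computed. The function name and the ratings below are made up for illustration; they are not taken from the actual model or its data.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all users."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical ratings for three users (illustrative values only).
predicted = [1500, 1600, 1900]
actual = [1450, 1700, 1900]

# Absolute errors are 50, 100 and 0, so the mean is 150 / 3.
print(mean_absolute_error(predicted, actual))  # prints 50.0
```

So an error of 65.15 means the prediction is off by about 65 rating points on average, in either direction.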
Motivation:
Check this thread: https://mirror.codeforces.com/blog/entry/143626?#comment-1282206
Before going forward with this post I have to admit a pretty important thing: I did not do what I promised, as I did not build a foundation model on top of Codeforces data. Reasons:
It takes too much time to train on my personal laptop (or money to rent GPUs, which I am not willing to spend on a toy project).
I still almost went down the path of fine-tuning some feature extractor model (such as this one), but then I remembered I had to deploy this somewhere. My website runs on a small dedicated server (I don't do serverless, to avoid unexpected bills), so running a medium-sized language model there was not a viable option. I also did not feel like renting GPUs for that (again because of money).
So I did not strictly build a state-of-the-art machine learning rating predictor... But I did the next best thing, which is feature engineering + a decision tree model. I describe in detail how I did this in the next sections, and at the end I explain how you could train an actual foundation model for this task (if you are actually willing to waste time or money on this).
Data collection
This is actually the most important section, as you need lots of data to train a machine learning model. I heavily used the Codeforces API for that (even getting IP banned a couple of times). Anyway, here is what I did:
- Used https://mirror.codeforces.com/api/user.ratedList?activeOnly=false&includeRetired=false to get non-retired users.
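A minimal sketch of that first call, using only the Python standard library. The helper names (`rated_list_url`, `parse_handles`, `fetch_rated_handles`) and the 60 second timeout are my own choices for illustration, not the code actually used for this project. It assumes the usual Codeforces API envelope: a JSON object with a `status` field and, on success, a `result` list of user objects that each carry a `handle`.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://mirror.codeforces.com/api"


def rated_list_url(active_only=False, include_retired=False):
    """Build the user.ratedList URL with the same flags as in the post."""
    query = urllib.parse.urlencode({
        "activeOnly": str(active_only).lower(),
        "includeRetired": str(include_retired).lower(),
    })
    return f"{API_BASE}/user.ratedList?{query}"


def parse_handles(payload):
    """Extract handles from a decoded API response, failing loudly on errors."""
    if payload.get("status") != "OK":
        raise RuntimeError(payload.get("comment", "Codeforces API request failed"))
    return [user["handle"] for user in payload["result"]]


def fetch_rated_handles(timeout=60):
    """Download the rated list (a large response) and return the handles."""
    with urllib.request.urlopen(rated_list_url(), timeout=timeout) as response:
        return parse_handles(json.load(response))
```

Calling `fetch_rated_handles()` performs a single request. Any follow-up per-user requests should be spaced out, since hitting the API too fast is exactly what gets you IP banned.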



