fbrunodr's blog

By fbrunodr, history, 10 months ago, In English

TLDR;

I built an enterprise-grade machine learning model to predict a user's rating 6 months in the future. The model has a mean absolute error of 65.15 (that is, $$$ E \left [ \left | \hat{x} - x \right | \right ] = 65.15 $$$). You can use it here:

https://fbrunodr.com/predict-codeforces-rating

Motivation:

Check this thread: https://mirror.codeforces.com/blog/entry/143626?#comment-1282206

Before going forward with this post I have to admit a pretty important thing: I did not do what I promised, as I did not build a foundational model on top of Codeforces data. Reasons:

  1. It takes too much time to train on my personal laptop (or money to rent GPUs, which I am not willing to spend on a toy project).

  2. I still almost went down the path of fine-tuning some feature-extractor model (such as this one), but then I remembered I had to deploy this somewhere. My website runs on a small dedicated server (I don't do serverless to avoid unexpected bills), so running a medium-sized language model there was not a viable option. I also did not feel like renting GPUs for that (again, because of money).

So I did not strictly build a state-of-the-art machine learning rating predictor... But I did the next best thing: feature engineering + a decision tree model. I describe in detail how I did this in the next sections, and at the end how you could train an actual foundational model for this task (if you are actually willing to spend time or money on this).

Data collection

This is actually the most important section, as you need lots of data to train a machine learning model. I heavily used the Codeforces API for that (even getting IP banned a couple of times). Anyway, here is what I did:

  1. Used https://mirror.codeforces.com/api/user.ratedList?activeOnly=false&includeRetired=false to get non-retired users (people who have logged in at least once in the past month).

  2. Selected the top 1.5k users plus 50k uniformly random users from the previous step.

  3. Collected submission data, rating data, and blog data for each selected user and saved it all.

To give you an idea of scale, the data I collected from Codeforces adds up to 26.06 GB! (and I didn't even use all the users).
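
The first two collection steps can be sketched like this. This is a minimal sketch: the response shape follows the public `user.ratedList` API, but `fetch_rated_users` and `select_users` are hypothetical helper names, not the code actually used for the blog.

```python
import json
import random
import urllib.request

RATED_LIST_URL = (
    "https://codeforces.com/api/user.ratedList"
    "?activeOnly=false&includeRetired=false"
)

def fetch_rated_users(url=RATED_LIST_URL):
    """Step 1: download the full rated-user list (hits the real API)."""
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    return payload["result"]  # list of user dicts with "handle", "rating", ...

def select_users(users, top_k=1500, rand_k=50000, seed=42):
    """Step 2: keep the top users by rating plus a uniform random
    sample of the remaining ones."""
    ranked = sorted(users, key=lambda u: u["rating"], reverse=True)
    top = ranked[:top_k]
    rest = ranked[top_k:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_k, len(rest)))
    return top + sampled
```

Submissions, rating history, and blog entries for each selected handle can then be pulled with the other API methods (`user.status`, `user.rating`, etc.).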

Label preparation

As I said in the previous section, I used a decision tree model to predict rating. A decision tree model works like this:

$$$ \text{model}: \text{tabular data} \rightarrow \text{prediction} $$$

So we need to get labeled data in tabular format to train the model. For that I did the following:

For each user, on the 1st day of each month from Jan 2021 to Nov 2024, we extract features from that user as well as their rating 6 months in the future ($$$6 \times 30 \times 24 \times 60 \times 60$$$ seconds in the future, to be precise).

The extracted features are the tabular data, and the rating 6 months in the future is the label. By the way, what are those features? Well, you can set them to any data you want (just beware of data leakage: we don't want the model seeing into the future). For this specific model the chosen features can be seen here. Here are some of the most important ones to give you an idea:

  • Number of problems solved during a contest in the last 3 months

  • Time since account creation

  • Number of ACs on problems rated much above the user's current rating in the last 3 months

  • User's region

  • Delta rating in the past 12 months
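
To make the (features, label) extraction concrete, here is a minimal sketch. The field name rating_delta_12m follows the post; `rating_at` and `make_row` are hypothetical helper names, and the real feature set is much larger.

```python
SIX_MONTHS = 6 * 30 * 24 * 60 * 60  # the 6-month label horizon, in seconds
YEAR = 365 * 24 * 60 * 60

def rating_at(history, ts):
    """Rating at unix time ts, given [(update_ts, new_rating), ...]
    sorted by time. Returns None if the account had no rating yet."""
    rating = None
    for t, r in history:
        if t <= ts:
            rating = r
        else:
            break
    return rating

def make_row(history, snapshot_ts, account_created_ts):
    """One (features, label) pair for one user at one snapshot date."""
    current = rating_at(history, snapshot_ts)
    label = rating_at(history, snapshot_ts + SIX_MONTHS)
    if current is None or label is None:
        return None  # unrated at the snapshot, or no rating 6 months later
    year_ago = rating_at(history, snapshot_ts - YEAR)
    features = {
        "current_rating": current,
        "account_age_s": snapshot_ts - account_created_ts,
        # per the post: rating 12 months ago minus current rating
        "rating_delta_12m": (year_ago - current) if year_ago is not None else 0,
    }
    return features, label
```

One such row is produced per user per monthly snapshot, which is what turns raw API data into the tabular format a decision tree model expects.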

Model training

After collecting and preparing all the data, it is time to train the model. First I separated the users into training and validation groups, using an 80:20 split. The idea is that we only train on the training users and only evaluate on the validation users. The patterns the model learns in the training group should apply to the validation group, even though the model has never "seen" those users before (which is exactly what happens in production).

We also separate training and validation chronologically: we only trained on data from [2021-01-01, 2023-11-01] and only validated on data from [2023-01-01, 2024-11-01]. We do this because we also want the model to be robust to time variation: the data used in production is going to be at least 6 months more recent than the data used for training, so if the model overfits to a time window it's not going to be very useful in production. Notice I had to cap validation at 2024-11-01, because the labels are always 6 months in the future, and 6 months after that is 2025-05-01, which is close to the current date (maybe I could have gone one more month, as I collected this data in June anyway, but it doesn't make that big of a difference).
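
The user-wise and chronological split described above can be sketched like this (the row layout and the function name are illustrative assumptions, not the actual code):

```python
import random
from datetime import date

TRAIN_WINDOW = (date(2021, 1, 1), date(2023, 11, 1))
VALID_WINDOW = (date(2023, 1, 1), date(2024, 11, 1))

def split_rows(rows, valid_frac=0.2, seed=0):
    """rows: list of (handle, snapshot_date, features, label).
    Users are split 80:20 first; a row is then kept only if its user's
    group AND its snapshot date both match (so validation users never
    leak into training, and each side stays in its time window)."""
    handles = sorted({r[0] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(handles)
    n_valid = int(len(handles) * valid_frac)
    valid_users = set(handles[:n_valid])
    train, valid = [], []
    for handle, day, feats, label in rows:
        if handle in valid_users:
            if VALID_WINDOW[0] <= day <= VALID_WINDOW[1]:
                valid.append((feats, label))
        elif TRAIN_WINDOW[0] <= day <= TRAIN_WINDOW[1]:
            train.append((feats, label))
    return train, valid
```

Splitting by user (rather than by row) is what guarantees the model is evaluated on people it has never seen, mimicking production.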

Anyway, after doing all this work (yeah, data collection and processing is 90% of the job) we are finally ready to actually train the model: simply feed (feature, label) pairs to a decision tree model, choose some hyperparameters, choose a loss function, and let it do its work. I used CatBoost, by the way, for three reasons:

  • Basically as good as XGBoost and LightGBM, the other state-of-the-art gradient-boosted decision tree libraries.

  • It supports categorical features out of the box (so I don't have to one-hot encode things, which is annoying).

  • 😸s (yes, I mean cats. This weighed heavily when deciding between 😸boost and LightGBM).

I also used mean absolute error as the loss function, as this is the thing I was trying to optimize in the first place (basically, how far the model's prediction is from the correct value). Finally, after 1 min 15 s of training (yes, that is how fast and cheap decision tree models are) we got a model that predicts a user's rating 6 months in the future with a mean absolute error of 65.15! By the way, how good is that?

Caveats on rating prediction

Rating prediction is fundamentally hard. Suppose a model was trying to predict my rating 6 months in the future, and that date happened to be January 3rd, 2025. My rating that day was 2029, so a perfect model would predict exactly that value, right? Now suppose I run another prediction just one day later, after basically nothing has changed. My rating 6 months in the future is now 1894. Had the model not changed its prediction (because basically nothing changed), it would have an error of 135, which is the rating I lost on Hello 2025. And it gets worse (or better, for me): just one week later I recovered most of the lost rating in a single contest... As you can see, rating is super volatile (you have probably noticed that yourself anyway). So a model fundamentally can't get a mean absolute error as close to 0 as one would like (unless it predicts the future perfectly, in which case it would have better uses than predicting Codeforces ratings).

That said, how do we even know if the mean absolute error of our model is good? Is it good because we think an average error of 65 rating points is acceptable? Did the model even learn anything at all? We need a baseline. One obvious baseline is assuming your rating 6 months in the future will be exactly the rating you have now. This baseline gives a mean absolute error of 72.34, which is not that far from our model in the first place. One may ask: is predicting one's rating 6 months in the future only 7 points better than the baseline even good? Well, it's actually hard to tell, for the reasons outlined in the previous paragraph: one can never get even close to 0 because rating is super volatile. Most people have deltas above 50 in absolute value at some point (even ignoring the first 5 contests), which means a model could easily have an error of 50 while being almost optimal. Let's say the optimal rating-predicting model has mean absolute error $$$L$$$. We already know the baseline is $$$72.34$$$, so any model can be at most $$$72.34 - L$$$ better than the baseline. If $$$L = 40$$$ then our model is far from optimal, as $$$72.34 - 65.15 = 7.19$$$ is a far smaller improvement than $$$72.34 - 40 = 32.34$$$. But if $$$L = 60$$$ then we have a pretty good model, about as good as it is possible to have. Sadly it's not easy to estimate $$$L$$$, so we won't do that here; I just wanted to add this important observation.
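
The baseline comparison is easy to reproduce: predict the current rating unchanged and measure the mean absolute error (the numbers below are toy values, not the real dataset):

```python
def mae(preds, labels):
    """Mean absolute error: average of |prediction - actual|."""
    return sum(abs(p - x) for p, x in zip(preds, labels)) / len(labels)

# Toy example: the "no change" baseline predicts the current rating as-is.
current = [1500, 1800, 1200, 2100]  # ratings at snapshot time
future = [1550, 1750, 1210, 2100]   # ratings 6 months later
baseline_error = mae(current, future)  # (50 + 50 + 10 + 0) / 4 = 27.5
```

Any learned model has to beat this number on the same validation rows to be worth deploying.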

How to improve your rating (according to data science)

Predicting one's rating is not the most fun part of using a machine learning model. I would say the best part is using the patterns the model learned to learn something yourself. For example, most Codeforces users are interested in increasing their rating, but what are people who increase their rating actually doing? Analyzing the SHAP plot of the trained model, we see this:

Shap plot of model

Some important conclusions we can draw:

  1. People who have solved lots of problems during rated contests in the last 3 months are the most likely to increase their ratings, while people who are doing almost no contests are likely to decrease. If you want to increase your rating you have to participate in more contests; this is the single most important factor.

  2. People with new accounts likely haven't converged yet, so there is still a lot of room for growth, while older accounts usually don't grow as much (completely expected).

  3. The second most important thing you can do is solve problems rated much above yours (see the third feature in the graph). Here "much above" means $$$ \text{problem rating} - \text{your rating} \gt 400 $$$. Of course, solving 3000-rated problems while you are 1200 is not going to help (as you won't understand a thing), but the data suggests solving 1700–1800 rated problems while rated 1200 is the second most important thing you can do after contests.

  4. I will comment about region separately.

  5. The 5th one is a bit confusing. rating_delta_12m is actually $$$\text{rating 12 months ago} - \text{current rating}$$$. This implies that people who had a higher rating 12 months ago are likely to keep losing rating over the next 6 months, while people who improved their rating a lot in the previous 12 months are likely to keep improving. This means competitive programming is like the gym: you have to keep working out to keep your gains.

  6. This one is similar to number 3, but instead of problems rated more than 400 above your current rating, it covers $$$ 200 \lt \text{problem rating} - \text{your rating} \leq 400 $$$. The overall advice here is to focus on problems rated at least 200 above your current rating to improve.

  7. Similar to number 2. People who have done lots of contests have already converged.

  8. This highlights how volatile rating is. If you had a much higher rating 5 contests ago, you are more likely to have a higher rating in the next 6 months. It means you probably lost rating to volatility and will get it back soon; no need to worry that much.

  9. Not much to comment.

  10. Same as 8: people who had a higher peak are more likely to gain their rating back, as rating is highly volatile.

Point 5 kind of contradicts points 8 and 10, but I think they play distinct roles: if you used to be much better in the past you can improve back relatively easily, but chances are you will just keep slowly losing rating if you don't take action (again like the gym: it's easier to put muscle back than to build it for the first time, but you will keep losing it if you don't do anything).

I believe you are smart and can analyze the remaining features yourself, but be aware they get less and less important down the list. Before moving on: why is region ranked so high? After some data engineering I grouped countries according to their participants' competitive programming performance and cultural similarity (or whatever was going through my mind; this was not exactly scientific, and I asked ChatGPT to do most of the work). The mapping between country and region can be found here. In the end we can see how each region influences your predicted delta rating in 6 months:

Shap values of regions

First of all, sorry if this offends anyone; that was not my intention. Second: people from countries with lots of good competitors ("Super nerds" is China, USA, Japan, South Korea, Russia, Hong Kong, Taiwan, and Singapore) have only a slight advantage in improving their rating compared to others. In the end, the only real difference is whether or not you set your country on Codeforces (I believe this is kind of an engagement thing: if you engage with Codeforces you are more likely to keep improving). Mind you, correlation ≠ causation.

Training a foundational model to get SOTA performance

Finally, the thing I promised and did not deliver. How could you build a foundational model over Codeforces data to train a state-of-the-art rating predictor? First you must collect all the available data from a lot of users (like I did, as described in the data collection section). Then, for each user, order their data by time and train a language model to predict the user's next event. The idea is that the model will predict the next event well only if it has some understanding of what is actually going on (think of it as a world model of what happens on Codeforces). If the model "understands what is going on", the embeddings it generates over a user's data hold some deep information about that user. Those embeddings can then be fed to a decision tree model to predict the user's future rating. The pipeline looks like this:

$$$ \text{user's data} \xrightarrow{\text{foundational model}} \text{embeddings} \xrightarrow{\text{decision tree}} \text{final prediction} $$$

Notice that the embeddings generated by the foundational model can be used in a lot of distinct downstream tasks (a downstream task is basically another task that consumes the output of this one, like the flow of a river). That is why those models are called foundational: they hold a deeper level of understanding of the data that can be used in lots of distinct tasks.

Anyway, I didn't do that for two reasons: time and money 😅. The main disadvantage of this method is that it's hard to do model explainability and draw the kind of conclusions we did in the "How to improve your rating" section. The model could be severely biased and you would never know. The main advantage is state-of-the-art performance plus no feature engineering.

Anyway, I hope you guys liked the blog. Comment what rating the model predicted for you so we can check later whether it is doing well. I hope the new wave of geniuses that came with o3 pro is not going to impact the ratings too much.

Update: I added the data I used to Kaggle (so you don't have to abuse Codeforces' servers like I did): https://www.kaggle.com/datasets/fbrunodr/codeforces-users-data


By fbrunodr, history, 12 months ago, In English

Problem E appeared on the Hong Kong Online Judge

Hong Kong Online Judge

Although the intended solution is $$$O(n^2)$$$, the editorial provides an $$$O(n)$$$ solution, as can be seen here: 2022 Mini Competition 3 Editorial

You can't access this site unless someone from your school paid the Hong Kong judge and registered you.

Problem H appeared verbatim on a Japanese online judge

This one has already been discussed by the contest author: https://mirror.codeforces.com/blog/entry/142069.

Imagine if a Brazilian set the hardest problem as one only available on a Brazilian online judge (which is also Portuguese-only), and problem E appeared on a Latin American training list. This would hugely favor Brazilians/Latin Americans and be unfair to everyone else. Well, if you guys decide to make this contest rated, treat it no differently if the scenario I just described ever happens.


By fbrunodr, history, 12 months ago, In English

It's a bright, cheerful day in the whimsical town of Permexorpia. This peculiar place has a tradition that has never been seen before: friends exchange elaborate mathematical objects as tokens of affection. Today, we witness our dear friend Alice meticulously preparing a special present for her companion, Bob. Knowing Bob’s fondness for rare types of sequences, Alice has crafted an incredibly thoughtful and unique gift—a beautifully wrapped permutation.

In Permexorpia, a permutation of size $$$n$$$ is a list containing each of the numbers $$$0, 1, 2, ..., n-1$$$ exactly once (notice the list contains $$$0$$$ and does not contain $$$n$$$).

Upon receiving this extravagant gift, Bob, who is an avid enthusiast of unusual mathematical operations, decides to compute something truly intriguing: the XOR of the MEX of every single possible continuous subarray† of the permutation given by Alice. More formally, let $$$f(p)$$$ be the value calculated by Bob for a given permutation $$$p$$$. We have:

$$$ f(p) = \bigoplus_{1 \leq i \leq j \leq n} \text{mex}(p[i, j]) $$$

Where $$$p[i,j]$$$ denotes the subarray $$$[p_i, p_{i+1}, p_{i+2}, \dots, p_{j-1}, p_j]$$$, mex denotes the Minimum EXcluded value, and $$$\oplus$$$ denotes the XOR operation.

† A continuous subarray of an array $$$a$$$ is an array obtained by deletion of several (possibly zero) elements from the beginning of $$$a$$$ and several (possibly zero) elements from the end of $$$a$$$, while keeping the remaining elements in the same order.

Input

The first line contains an integer $$$t$$$ – the number of independent test cases ($$$1 \leq t \leq 10^4$$$).

Each test case consists of two lines. The first line of each test case contains an integer $$$ n $$$, the size of the permutation $$$p$$$.

The following line contains $$$n$$$ distinct integers $$$p_1, p_2, \dots, p_n$$$, each ranging from $$$0$$$ to $$$n - 1$$$.

It is guaranteed that the sum of $$$n$$$ does not exceed $$$5 \cdot 10^5$$$ over all test cases.

Output

For each test case, output a single integer — the value of $$$f(p)$$$, the XOR of the MEX values of all possible continuous subarrays of the given permutation.

Example

Input

3
3
1 0 2
4
0 1 3 2
1
0

Output

1
5
1

Note

Recall that the MEX (Minimum EXcluded value) of a sequence is the smallest non-negative integer that does not appear in the sequence. For instance:

  • The MEX of [0, 1, 2] is 3.

  • The MEX of [1, 2] is 0.

  • The MEX of [0, 2, 3] is 1.
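
The definitions above can be checked with a naive implementation of $$$f(p)$$$ taken straight from the formula (fine for the samples, far too slow for the real constraints):

```python
def mex(values):
    """Smallest non-negative integer not present in values."""
    present = set(values)
    m = 0
    while m in present:
        m += 1
    return m

def f(p):
    """XOR of mex over every continuous subarray of p (O(n^2)).
    For a fixed left end i the element set only grows as j advances,
    so the mex is non-decreasing and can be tracked incrementally."""
    result = 0
    for i in range(len(p)):
        seen = set()
        m = 0  # mex of p[i..j], non-decreasing in j
        for j in range(i, len(p)):
            seen.add(p[j])
            while m in seen:
                m += 1
            result ^= m
    return result
```

On the samples this gives f([1, 0, 2]) = 1, f([0, 1, 3, 2]) = 5, and f([0]) = 1, matching the expected output.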


By fbrunodr, history, 12 months ago, In English

I did particularly badly in today's contest (considering my goal of reaching Master). Then I took a look at the scoreboard and noticed a guy a bit above me didn't pass problem B in system testing. I got curious and took a look at his submission for C:

https://mirror.codeforces.com/contest/2084/submission/314079938

Losing a bunch of rating because you did badly is frustrating, but nothing tops being behind the "cheater's wall". What exactly is the "cheater's wall"? Suppose some LLM can solve A–D. Then all the cheaters are going to solve A–D super fast. For example, consider the edu contest just before this one. I solved A–E and got 97th place. Had I not passed E, I would have placed close to 1000th, which is a huge difference in performance (roughly 1850 versus 2350). Also, if you look at the scoreboard, a bunch of guys did A–D super fast (like 20 min) but were still unable to solve E in 1 hr 40 min.

To become Master I must consistently beat this cheater's wall. One bad contest and I get -100 delta for being behind it. I can't even imagine how frustrating it must be for people below CM competing in 2025: you can improve a lot and see little rating improvement because the cheaters consistently beat you. Also, the number of new accounts cheating may cause rating deflation, as cheaters start with 1500 hidden rating and perform like 1900. That said, what is even the point of doing Codeforces if you are below CM nowadays? You can't beat the cheater's wall, so you might as well compete somewhere else, train on other sites, etc., instead of fighting a lost battle.


By fbrunodr, history, 15 months ago, In English
  • Top Codeforces user: jiangly
  • Top all-time rating: jiangly
  • 34 Chinese users in the top 100. If you think those are outliers from a few very good schools: today's contest (the memorable contest #1000) was a Div. 2 contest and had at least 35 Chinese participants in the top 100 of the official standings, strongly suggesting that the Chinese domination on Codeforces is not simply due to a few outliers.
  • DeepSeek R1 showing that China is at the forefront of AI, competing directly with ClosedAI and ahead of big tech companies like Meta and Google.
  • The world's largest semiconductor manufacturer is in Taiwan.

I have a strong feeling that Silicon Valley's technology dominance is coming to an end and moving to China. Notice I am not even counting people of Chinese descent who are American.

What do you guys think about this? Why is China so successful at programming compared to the rest of the world?


By fbrunodr, history, 16 months ago, In English

I got this message today from system:

Attention! Your solution 295598611 for the problem 2040C significantly coincides with solutions fbrunodr/295598611, gunjackk28/295637944. Such a coincidence is a clear rules violation. Note that unintentional leakage is also a violation. For example, do not use ideone.com with the default settings (public access to your code). If you have conclusive evidence that a coincidence has occurred due to the use of a common source published before the competition, write a comment to post about the round with all the details. More information can be found at http://mirror.codeforces.com/blog/entry/8790. Such violation of the rules may be the reason for blocking your account or other penalties. In case of repeated violations, your account may be blocked.

I use my personal laptop to code (so this is not unintentional leakage) and haven't shown my source code to anyone. The only thing I can think of is that someone abused Codeforces' hacking feature to leak my source code to a friend (as leaking their own source code would be too obvious, right?). The user who copied my code is SLgoodman from India. He submitted the problem in the very last minute of the contest (01:59:05). Checking my room in the contest, only 3 other people got AC on the copied problem (problem C); one of them, HardikChoudhary, is also from India and got AC at 01:55:53, just 3 minutes before SLgoodman submitted an exact copy of my source code. I think this is highly suspicious and should be investigated.

I have never violated a single rule on Codeforces, and now I am being threatened with a block/ban because someone else cheated. This is completely unfair. I think the best solution in this case would be to remove the during-contest hacking feature, as has already been suggested in another post.


Before someone suggests it might be a coincidence: that is impossible in this case, as I use a highly specific .cpp file as a template (which can be found in my GitHub). You can check the submissions here and see that there is no explanation other than the guy simply copying my source code:

My submission: https://mirror.codeforces.com/contest/2040/submission/295598611

SLgoodman submission: https://mirror.codeforces.com/contest/2040/submission/295637944

Edit 1: As some people have commented in this thread, it is impossible for HardikChoudhary to have leaked my code, as he did not lock problem C. That said, I have no idea how this could have happened. I have already changed my Codeforces password and will just pray this doesn't happen again. I don't have any extensions on my browser, and all my VSCode extensions are safe (either from Microsoft or with 1M+ downloads).

Edit 2: This is especially disheartening as this was the first contest I did on my new laptop (M3 MacBook) :(


By fbrunodr, history, 16 months ago, In English

Problem to be addressed

On Linux it is pretty straightforward to set up the debugger for C++ on VSCode. The workflow is:

  • Write your code

  • Go to the [Run and Debug] VSCode section

  • Click on run active file or something similar

  • Debug your code

On macOS you will run into a lot of silly problems, such as:

  • Can't find #include<bits/stdc++.h>

  • Installed g++ and got the debugger running, but can't write input to std::cin in the integrated terminal, because it says

Build finished successfully.
 *  Terminal will be reused by tasks, press any key to close it. 

and then closes the terminal, making it impossible to write to std::cin.

After some time browsing the internet I found the solution to all those problems and got my setup working just like on Linux.

Solution

Let's break the solution down step by step, starting from an almost brand-new macOS.

  1. Install Homebrew

  2. Install g++ (brew install gcc)

  3. Download VSCode

  4. Install C++ extension on VSCode

  5. On VSCode type ⌘+shift+p to open the command palette, then run C/C++: Edit Configurations (UI). A window will open; set Compiler path to /opt/homebrew/bin/g++-14 (or whatever version is installed; you can list the files in /opt/homebrew/bin/ by running ls /opt/homebrew/bin/ in your terminal and figure it out from there). Still in this window, set C++ standard to c++23 for the latest features.

  6. Install extension CodeLLDB.

Okay, now let's say you created a codeforces directory in your home folder, where you will run some C++ files. Inside it, create a .vscode directory if one does not already exist. Then create a tasks.json file and put this inside it:

{
    "tasks": [
        {
            "type": "cppbuild",
            "label": "C/C++: g++-14 build active file",
            "command": "/opt/homebrew/bin/g++-14",
            "args": [
                "-fdiagnostics-color=always",
                "-g",
                "${file}",
                "-o",
                "${fileDirname}/${fileBasenameNoExtension}"
            ],
            "options": {
                "cwd": "${fileDirname}"
            },
            "problemMatcher": [
                "$gcc"
            ],
            "group": {
                "kind": "build",
                "isDefault": true
            },
            "detail": "Task generated by Debugger."
        }
    ],
    "version": "2.0.0"
}

The only purpose of this task, "C/C++: g++-14 build active file", is to build the C++ code in the currently active file (as the name suggests).

In the same .vscode directory, create another file called launch.json and put this inside it:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "type": "lldb",
            "request": "launch",
            "name": "Debug C++",
            "program": "${fileDirname}/${fileBasenameNoExtension}",
            "args": [],
            "cwd": "${fileDirname}",
            "preLaunchTask": "C/C++: g++-14 build active file"
        }
    ]
}

The preLaunchTask is necessary so the code is compiled before running; otherwise you would have to compile and then start the debugger manually every single time, which is annoying and error-prone. Also note that the name of the preLaunchTask is exactly the same as the label of the task we created previously, so be careful if you change names.

After all that setup, you should be able to go to the [Run and Debug] VSCode section and choose Debug C++, which will behave exactly like it did on Linux.

If you have any questions feel free to ask!
