Hello everyone.
In the coming days I will finish the implementation of small changes in the rating calculation for new accounts. Here are the main innovations.
- The rating of the new account will be equal to $$$0$$$ for display (but it will be equal to $$$1400$$$ when calculating rating changes).
- Suppose that after the first round the participant gained $$$d_1$$$ rating points (remember that the rating was taken as $$$1400$$$ in this calculation); then $$$500+d_1$$$ will be displayed after this round. Thus, after the first participation, the rating is likely to increase from $$$0$$$ to a value in the region of $$$500$$$ (plus or minus roughly $$$300$$$).
- Thus, before the second participation, the displayed rating is $$$500+d_1$$$, and the rating used for calculating changes is $$$1400+d_1$$$. Suppose the new change is $$$d_2$$$; then the displayed rating becomes $$$500+d_1+350+d_2$$$.
- Thus, before the third participation, the displayed rating is $$$500+d_1+350+d_2$$$, and the rating used for calculating changes is $$$1400+d_1+d_2$$$. Suppose the new change is $$$d_3$$$; then the displayed rating becomes $$$500+d_1+350+d_2+250+d_3$$$.
- And so on for the first $$$6$$$ rounds. The promotions of the displayed rating will be $$$500, 350, 250, 150, 100, 50$$$ (in total, exactly $$$1400$$$). Therefore, after participating in $$$6$$$ rounds, the rating is "stabilized": the displayed rating becomes equal to the one used to recalculate changes. A small sketch of this scheme follows below.
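To make the schedule concrete, here is a minimal sketch in Python (my own illustration, not the actual Codeforces implementation):

```python
# Sketch of the scheme above: the true rating starts at 1400, the displayed
# rating starts at 0, and each of the first six rated rounds adds a fixed
# "promotion" on top of the usual rating change.
PROMOTIONS = [500, 350, 250, 150, 100, 50]  # sums to exactly 1400

def ratings_after(deltas):
    """Return (true_rating, displayed_rating) after the given rating changes."""
    true_rating = 1400 + sum(deltas)
    displayed = sum(deltas) + sum(PROMOTIONS[:min(len(deltas), 6)])
    return true_rating, displayed

# Two rounds with changes d1 = +120 and d2 = -40:
print(ratings_after([120, -40]))  # (1480, 930) = (1400 + 80, 500 + 350 + 80)
```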
Thus, on average, for new accounts at an early stage, the rating will increase, getting closer to the real value. These innovations will help solve several issues:
- We are reducing the starting rating from $$$1500$$$ to $$$1400$$$, so that new accounts make a smaller contribution to the overall rating.
- Currently, especially for inexperienced participants, there is an effect that their rating at the beginning of participation is constantly falling (from $$$1500$$$ to the value that is a true assessment of their skills). This, of course, leads to discomfort: you try, participate, solve some problems, but the rating drops.
- The previous paragraph gives additional motivation to create new accounts. If your current rating is less than $$$1500$$$, then there is a temptation to start from scratch with a fresh account. After the changes, such a restart will require noticeable long-term effort just to climb back to your current rating. It seems the changes should slightly reduce the motivation to create new accounts.
Apparently, a similar idea is used on AtCoder, but I don't know the details.
What do you think about such an improvement? Sounds good?
A nice step, since new participants' ratings constantly drop.
The idea is great; AtCoder has been successful in reducing multiple accounts. Great initiative.
Seems like a great update! Will the rating field in the CF API display the visible rating or the backend one used to calculate changes? I’m curious about whether this will break rating change prediction tools until their creators update them to account for this change.
Additionally, will the visible rating or the true rating be used to evaluate eligibility for Div 2/3/4 rounds? For example, suppose a new user jumps to 1700 true rating (thus, 800 display rating) after one round: can they participate officially in a Div. 3 round afterwards?
I don't know how, but it'll be better to have just one rating.
The API will contain displayed ratings. Probably, a good idea is to add one more field (say, "trueRating") to the API.

I think divisions should be assigned based on displayed ratings. It would be completely non-transparent to assign divisions by true ratings but show displayed ratings. I agree that this will create a few cases where a participant takes part in Div. 3 but shouldn't because of their true rating. But I think such cases will not be very frequent, and the "trusted participants" policy will hide them in most cases.
Do you plan on somehow gating the rating gains when the true rating is higher than the division cutoff? It seems to me that someone could reach 2100/2200 by only participating in Div. 4/Div. 3 contests.
Waiting for "How I went from 0 to 3000 in just 2 weeks".
What happened to Vepifanov's rating?
Any news about the API and the "trueRating" field? It would be very useful for many extensions, etc...

Sorry to bump this again, but any chance you could comment on the state of the "trueRating" field? As is, it's pretty difficult for rating predictor extensions, which are used by a significant portion of the community, to accurately evaluate projected rating changes.

MikeMirzayanov Sorry for necroposting, but is true rating ever going to come to the Codeforces API? As prediction extensions can't provide accurate rating changes without this field, it would be a very useful change.
It should be a quite easy fix; I'm going to update CF-Predictor as soon as we get the "trueRating" field in the API response.
Good to know, many thanks for maintaining this tool!
I don't think that is sufficient, because now the displayed rating depends on both the true rating and the number of past rated contests. It is not convenient to get the latter from the API for users in bulk.

Edit: The number of past rated contests can be calculated from the difference between the real and displayed ratings, so it would, in fact, be sufficient.
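For illustration, a small sketch of that observation (my own code, using the promotion schedule from the post):

```python
# true - displayed after k = 0..6 rated contests, per the promotion schedule
# 500, 350, 250, 150, 100, 50 described in the post.
PENALTY = [1400, 900, 550, 300, 150, 50, 0]

def rated_contests(true_rating, displayed_rating):
    diff = true_rating - displayed_rating
    if diff == 0:
        return 6  # at least 6; the ratings have merged, so the count no longer matters
    return PENALTY.index(diff)

print(rated_contests(1540, 990))  # 2 contests: a penalty of 550 remains
```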
If new participants are discouraged at the start by low rating points, then a negative delta in their first few contests will push their displayed rating below 500, which will be much more discouraging.

I suggest setting the initial rating to 500, in practice as well as during calculations. Instead, let's balance rating inflation/deflation so that it is low for the first few contests and then gradually reaches a stable value with more contests. I hope you get the general idea.
Bruh, 500 is way too low. Some people start on CF after already having experience, and at least I don't wanna do 10 contests just to reach my actual starting rating. Even people with zero CP experience but some programming experience are usually much higher than 500.

Also, I think starting with a super low rating discourages you from caring at all: on AtCoder I tend not to care about the contests, since I haven't done many, so my rating usually goes up no matter how much I mess around.
Yeah, I get your point. But you will also see that there are very few fake/alt accounts on AtCoder.

Anyway, your arguments seem fair too; there will be some tradeoff :/
But I think your rating will remain low for 6 rounds only: 500, 350, 250, 150, ... will be added each round for free. New people will have to tolerate a low rating for 6 rounds only.
I am not arguing against Mike's system, as it combats this by having the calculation rating be higher, so it quickly converges. I was arguing against having the calculation rating also start at 500, as was suggested.
No, it will be $$$500 + d_1$$$, so it will appear to be near 500. Of course, accounts that haven't participated will show 0.
Did you even read it...
"The rating of the new account will be equal to 0 for display (but it will be equal to 1400 when calculating rating changes)."
It is a good idea indeed.
Create a new account and vote again
There must be some sort of glitch, I can't see your comment!
Make a new account and see again
While reading the rules I thought I was reading some Codeforces problem A statement and needed to output the final rating of a newly registered account.
And 251 is the answer
What about old users who have participated in fewer than six contests but at least one? Does their rating change?
I think if a participant is not rated yet, the new rules will be applied. If a participant took part in at least one rated contest, the old rules apply.
Thanks, Mike, for the clarification.
You can see my rating graph; it is a clear example of harsh rating losses for newcomers. It was a palindrome at the beginning!
What if you see my rating graph?
I can see the pain of a struggler from your graph...
Hi
Same with my graph
Same with me LMAO.
Check mine...
Really happy to see this, you're right, I was also tempted to make another account, but I didn't lose hope and today I am 1500+ :D
This should have happened a long time ago!
Codeforces should look to Atcoder for inspiration for its rating system.
Connect the Codeforces account with the respective country's UID (unique identity) number, like the Aadhaar number in India. This might reduce the number of fake accounts.
I'd say this is a little bit dystopian.
Yes, force each account to also provide a credit card / bank account number to Mike to "guarantee no fake accounts".

Yes, force each account to also pay money to Mike to "guarantee no fake accounts" XD
Strongly disagree... I don't think it would be a good idea, because many students don't have any credit card, and the need to add a credit/debit card would decrease the number of newcomers too. By adding a credit card requirement you would demotivate many newcomers from even getting a taste of competitive programming.
It is a joke.
Inb4 your subsidy got credited into Putin's swiss account.
I am 15, so I don't have a UID. Literally, a lot of OIers don't have a UID. So you would force all OIers not to participate and practice on Codeforces? That is very rude!
Yeah, I kinda agree with that.
First of all, thanks for this step!!

I have a question in mind. What is the approximate maximum value of $$$d_1+500+d_2+350+\ldots+d_6+50$$$? If this value is always less than $$$1400$$$, then there is no problem.

But suppose this value exceeds $$$1400$$$ in the 4th round ($$$d_1+d_2+\ldots+d_4$$$), and suppose the participant's rating is $$$1500$$$. Then he will still be treated as $$$1400$$$-rated, which will cause serious rating instability.
I think they will be rated according to their $$$1400+d_1+d_2+\ldots$$$ rating, and not their shown rating, until 6 rounds have passed.
What about the rating graph?

It will show displayed ratings. So for new participants, most graphs will generally show an increase at the start.
After these changes are done.
It would make sense to research the prior state of the art before creating one's own implementation.

I think he probably did, and is just saying that so it doesn't count as copying, lol.
Low novelty, doesn't compare to previous state of the art baselines, strong reject.
Will that new person participate in rated contests based on their displayed rating or based on the real rating?

If the former, one can get a really high rating from participating in Div. 3 and Div. 2, which is not fair.
It seems (from Mike's response to my message above) that eligibility will be determined by displayed rating.
So, this is exactly what may cause the problem. If you have a big real rating and a small displayed rating, then when you win you'll get your real rating change plus the additional boost to your displayed rating, so you will end up higher than before.
Suggestion: would it also be possible to make the first few contests (say, 3) unrated for new users? This would help new users actually get accustomed to the style of contests here and would decrease the number of fake accounts.
Suppose a smart person decides to join CF, gets top ranks, and then sees that he needs to write 3 unrated contests to get a rating. He would never come back.
Virtual contests...?
Like Div. 1 + Div. 2 rounds, will we see Div. 2 + Div. 3 or Div. 3 + Div. 4 rounds in the future? It looks like the frequency of Div. 4 contests is going to increase.
Can anyone briefly explain what Mike has written? I'm too stupid to understand the technical intricacies involved
Every contest you gain a little bias (and the sum of those biases over 6 contests is 1400), but you participate as a 1400-rated player anyway. Essentially, after 6 rounds it acts as if you started at a 1400 rating, and all the rating fluctuations from those first 6 contests (calculated as if you had 1400 to begin with) apply after that too.
You need to take a third grade math class first.
The upcoming Div. 2 round is the last chance for us new reds to create as many alts as we can in order to take part in the following Div. 1 contests with ease.
orz
+1, about 2-3 weeks and no D1 ╯︿╰
There will be one absurd thing about this, though: if you don't solve any problem in your first contest, your displayed rating will still increase by around 350. I believe that would look absurd.

I believe it would make more sense to keep the displayed rating locked at 0 until a user has solved at least one problem in an official round (and then add all of the pending "promotions" at once, or maybe delay the promotions).
I don't really see the problem here.
If you don't attempt any problem, it doesn't count as a rated participation anyway.
If you do attempt and don't manage to solve anything, you'll likely get a huge negative delta, which will balance out the displayed rating bias increase.
I recall how in the past it wasn't unheard of to have your TopCoder rating increased after scoring 0 :)
While the situation that you described may not sound right intuitively, I don't see a good solution that wouldn't add unnecessary complexity to the system.
I think 6 contests to get to your true rating sounds fair, and more is definitely unnecessary. AtCoder's system is a bit unfortunate in that the rating seems to climb really slowly, which means mine still hasn't converged after 13 contests (I'm 100 points lower than my contest performance would indicate). It's mildly annoying and I'd prefer to avoid that.
Just for fun, last week I made a comparison of rating vs. percentile across different competition platforms. Notice how AtCoder's median is 165 (yes, mid-grey), which doesn't look reasonable to me.

(The X-axis is exponential; the data might not be 100% accurate, and CodeChef includes all users, not only active ones.)
But I suggest the initial imaginary rating be 1500 instead of 1400. As you said, new accounts will contribute less to the overall rating, so existing Specialists and Experts will find it more difficult to gain rating: someone who reached 1700 before the start of this rule had it much easier than someone starting now.
Will this also be applied to those who have already registered but have participated in fewer than six contests? Or will it apply only to those who create an account after this is implemented?
Your question is answered here.
Rating is just a number. The main rating is your knowledge. (But you still should not look at my rating history.)
How it's done on AtCoder:
$$$rating_{shown} = rating_{true} - f(competitions)$$$, where $$$f(x)$$$ is a magic function that starts at $$$1200$$$ and tends to $$$0$$$ as $$$x$$$ tends to infinity.

What you did is equivalent, but with $$$f(1) = 900$$$, $$$f(2) = 550$$$, ..., $$$f(7) = 0$$$.
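To check the equivalence, here is a quick sketch (my own illustration, derived from the promotion schedule in the post, not AtCoder's actual function):

```python
# f(k) = 1400 minus the promotions granted after k rated contests, so that
# displayed = true - f(contests), matching the comment above.
PROMOTIONS = [500, 350, 250, 150, 100, 50]

def f(k):
    return 1400 - sum(PROMOTIONS[:min(k, 6)])

print([f(k) for k in range(1, 8)])  # [900, 550, 300, 150, 50, 0, 0]
```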
Is that magic function not disclosed?
It is; follow the link "AtCoder's Rating System" on atcoder.jp to find all the details.

@Mike, I liked your idea of calculating the rating. One thing that I would like to share with you guys: if a guy with a rating of 1599 participates in Div. 3 and goes to 1700+, a guy with 1600 could have done the same (not comparing anyone). Could crossing a division be capped, something like "min(Div X cutoff + 50, new rating)"? P.S. Just a thought. (No personal issues.)
We may see a marked increase in participation in the upcoming Div. 2 round, as people who are confident enough might not want to wait 5-6 rounds to get to their deserved rating. Also, it's better to start at 1500 than 1400 if you feel you stand somewhere in between.
Great initiative!
This may be quite unrelated to this post, but I think this adds another reason to suggest this now.
I think we need an up-to-date version of this 9-year-old FAQ. A big portion of that post contains old information which doesn't correspond to how the system works nowadays. It was already badly out of date when I first joined CF, and now the system has even more complicated features like Div. 3 & 4 and this modified rating system. Even though I really like this new rating system in general, it may make newcomers a lot more confused about their rating changes if they can't find this information easily.
Maybe it would make more sense (in that the system might be cleaner) to rethink the entire rating system (that includes round divisions etc) rather than to add features to the existing system.
I think any radical change will make many veterans unhappy.
It is a really good idea. It would definitely stop the formation of multiple accounts and would also be a nice boost to the confidence of new participants.
If I have a displayed rating of less than 1600, but a true rating of more than 1600 ($$$1400 + d_1 + d_2 + \ldots$$$), is Div. 3 rated for me?
So basically now you can get to Master just by hitting top scores in 3 Div. 3s in a row. Or maybe Div. 4s (why on earth is that even a thing, when it's what Div. 3 was meant to be?). The whole system just keeps getting more and more broken, lol.
If someone's actual rating becomes more than 1600, just give them their actual rating instead of waiting for 6 contests. The current system will just increase inflation, with 2200 and 2300 users/alts participating in Div. 2s and becoming GMs using Div. 2s only.
Very good step.
Assigning contests based on displayed rating? This is opening up a whole new can of worms here. If it ain't broke, don't fix it.
It sounds perfectly correct. This will definitely reduce the motivation to create new accounts, which people do when their rating drops too much. Best of luck to participants in the next contest rounds. Wishing everyone a high rating!
Yeah, but can you wait for like 3 more Div. 2 contests? Gotta farm those filthy rating points and get yellow before the deflation hits.
Oh no, new accounts can become Master in just 3 rounds then.

I don't know if it's a good thing; in my opinion it's more likely to be, but time will tell.

Hoping for high ratings for newcomers.
Flakire can get Master in only 2 rounds. So orz Flakire.
In fact, you can find a legendary grandmaster in 10 rounds.
Just look at zyy. @WZYYN
"Apparently, a similar idea is used on AtCoder, but I don't know the details."
That's a great idea to give new accounts more confidence. For me, it's really terrible if you work really hard but your rating just drops.
I like the way it works on AtCoder; there, my rating went down for the first time only after 7 contests. Assuming it will work similarly here, it is a good change.

There's still the problem that the top 10 of Div. 3 contests make a huge jump in rating, but that is another story.

And I very much appreciate that new accounts become less attractive.
Hey... I am a school student and I just got introduced to Codeforces. I took the above Div. 3 challenge yesterday and managed to solve 2 problems; after that my laptop unfortunately shut down. However, both submitted problems were accepted on the first tests as well as after the open hacking and system testing phase. But I haven't got any update regarding my performance or rating. I am new to this; please tell me what I am supposed to do.
Just be patient, the ratings will be updated soon. It takes some time.
If you can't be patient and you're using Chrome, add the Codeforces Enhancer extension. It will (among other things) give you an estimate of your rating change in the standings table.
People with new accounts are becoming Master and Candidate Master. That does not seem fair, IMO. This could encourage people to make new accounts.
Where can I get comprehensive information about how the ratings are calculated on Codeforces, by the way?
MikeMirzayanov I think there are a few bugs in the new rating system, as many people who did their first contest yesterday have become Candidate Master with just 2-3 problems solved.

Also, for some reason the initial ratings are set to zero. Starters should begin with a rating of 1400, right?
As per this blog, your initial rating starts from 1400, but you will see it displayed from 0. It's explained properly in the blog, I think.
Yes, you are right, I misunderstood it. Thanks.
I have observed that the problems in CF Round #644 (Div. 3) were quite easy. Except for the last one, all problems had 2000+ successful submissions during the contest. Due to this, the ratings of participants in the 1400-1600 range exploded. On top of this, new accounts became Masters just by solving 2 problems last night. LOLzz.

This might be due to the new rating system. However, it completely defeats the purpose of the new rating mechanism, which was to reduce the creation of new accounts.

I propose two points after seeing what happened:

- Withdraw the new rating system.
- Make the previous round unrated for participants who were in the 1400-1600 range before the contest, as the problems were really only suitable for Div. 4 contests.

Comment if you agree/disagree with my proposal.
Maybe it's a good choice, but we may need time to adapt to it.
bruh wtf im mad bro sir i lost my exopert im sad bro
Mac I schadenfreude in your face.
MikeMirzayanov I think there are still a few problems; look at Droping_points's rating. They solved all the problems, but not enough rating was added.

In the past, people would deliberately drop points and then get a high rank in Div. 2/Div. 3 to gain a high rating; maybe the rating they gain now is not as high as it used to be. But I think it would be good to increase the maximum single-contest gain, maybe to 500 or 400? And maybe different ratings should have different maximum gains: we should let someone at a high level reach the rating they are worth sooner. On AtCoder you can get more than 1000 points in your first contest.
It appears to me that the rating changes for users who were rated before the previous Div. 3 round have been a bit unfair, as is evident in the case of Droping_points. @mikemirzayanov, it would be better if you could make the rating calculation rules a bit clearer: apparently someone with a negative initial rating before the round got a delta of +300 despite finishing 3rd in the standings table, whereas some unrated user who came last in the standings table also got a delta of +300, which seems a bit harsh to me.
I have competed in seven contests to date, and in each one my rating has only fallen, even though I've solved at least two problems in each contest (apart from one). It's quite discouraging, TBH. Still waiting for even a slight increase in my rating.
I'm a bit confused by the rules, apparently. Will the ratings of other users decrease if a new account starts its calculation from zero?

It starts from 1400, but yes, it will cause a decrease in rating for others.
Ah... I don't really like this...
I think with the fast-growing number of users, even 1400 will still make the rating inflate at a super high rate. For most newbies, I think a starting number of 1000 or less is more appropriate (it doesn't affect the displayed rating anyway).
Guys, I was curious about one thing, but I don't want to spam Codeforces with a new blog, so I'm asking here... I see some high-rated guys from my country being online (almost) continuously. They aren't submitting anything most of the time, not even in practice (except on a few days). So what are they doing? Is this some kind of browser connection bug? I have no interest in their lives; I just want to satisfy my curiosity.
It's possible that they are doing mashup contests; since those are private, others can't see their submissions.
Just remove div3 and div4.
Lmao he speaks the truth here.
I wish Codeforces would stick to its own formula.

I like Codeforces the most, so I disagree with following someone else's formula.

There is no need to change the formula.
Secretly adding negative bias to the rating balance every contest.
What is the point of this update? As far as I remember, CF uses some kind of Elo rating system (one of Elo, Glicko, Glicko-2, etc.), where the starting rating is just an additive constant which does not affect any computations (1500 is used simply because it looks prettier than 0). So, in fact, this update adds 100 additional points to all old profiles, and therefore breaks the whole rating system, because these numbers mean nothing now (literally nothing: +100 points for reds is not the same as +100 points for blues). Moreover, it does not even solve the problem of one-contest profiles; it just moves it 100 points lower.

Btw, AtCoder uses its own crazy rating system, which is definitely not from the Elo family, so none of the above applies to it.

Implementation aside, I agree that the current rating system is not the best for CF's purposes and maybe should be changed, but that requires a lot more research.
The starting rating matters, because it influences whether the total rating mass of active users is decreasing or increasing. Thus, lowering the starting rating is a step to combat rating inflation.
(If most of the new accounts are overvalued in rating compared to the old accounts, then the new equilibrium ends up with higher old ratings.)
No, it does not. That is how Elo works: $$$-100$$$ to all new users is EQUIVALENT to $$$+100$$$ to all old users. If you decrease the starting rating, then all ratings are decreased by the same constant. If you decrease the starting rating only for new users, then all ratings are still decreased, but the system is now unconverged. That's because in Elo the starting rating stands for the average player rating; 1500 is just a nice number kept as a legacy.

With the same success we could change the factor 400 (which stands for variance) in the recalculation formulas. It wouldn't change anything. Moreover, after some time, when a sufficient number of 1400-start users have been created, we will have the same problems, just 100 rating points lower.

As a direct consequence, the new rating numbers will mean nothing. For the old rating I could say that a 2300 player should beat a 2100 player with probability 75% and that a 1600 player is roughly in the top 33%, but now that is impossible. It destroys all the good properties of Elo!
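For reference, those figures follow from the standard Elo expected-score formula with a scale factor of 400; a minimal sketch (assuming CF uses the textbook formula):

```python
# Standard Elo expected score: P(A beats B) with a scale factor of 400.
def win_probability(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(round(win_probability(2300, 2100), 3))  # ~0.76, i.e. roughly 75%
```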
That's why I think that 1400 is a stupid idea.
P.S. The second reason to hate CF (after the killing of the rating system) is the "Cancel" button sitting in the usual place of the "Send" button.
What? One of us is misunderstanding the new rating update. I thought the update essentially just makes the new starting rating 1400 instead of 1500.

The accepted answer here says something about the effect of starting rating values...
Yes, I also think that the update changes the starting rating to 1400 instead of 1500 (for new users).

The accepted answer is also correct and perfect. But what the answer says is two things:

- When you add something to the starting rating globally (for all players, which is not our case), it just adds this constant to all ratings and does not affect anything else.
- When you merge the ratings of a skilled group of players with average players, it is a good idea to add some constant to the ratings of the skilled group to make Elo converge faster.

The second one is also not our case, since there is no reason why users who registered after May 25 should be less skilled than those who did so before. The idea of the 1400 update, as I understand it, is to deal with rating inflation caused by one-contest players, but this is exactly the case where it won't help, unless we repeat it every year. Then it would probably help maintain a reasonable number of LGMs without changing division borders, but then we would be unable to estimate anything from these rating numbers (e.g., winning probability), which is worse IMO.

A better way to deal with the same problem would be, e.g., to calculate rating updates using only the players with at least $$$x$$$ contests.
This is wrong in general. It is true only in the special case where the ratio of old users to new users is close to 0, that is, when many more new users arrive than were already in the pool.
Actually, the rating assigned to new users should be a good guess of their average rating. If you guess it well, you will notice no inflation and no deflation. That's basically what dpaleka said.

Let's have a look at a basic example that should make it all clear. At least for me it did :)

We have 3 players: A1 with rating 220, and B2 and B3 rated 100. Their average rating is 140 (420 / 3). Now let's assume a new user arrives (who is of skill level B), and we assign him the incorrect value 140. What will happen after a longer period:

A1 = 230, B2 = B3 = B4 = 110. Their sum is 560 (420 + 140), and the difference between A and B is 120, as before. This is because of Elo's properties: the sum must stay the same, and the difference in rating reflects the probability of A beating B.

So what has happened? Our mistake in judgement (40 rating points) has been evenly distributed among all players, resulting in an inflation of 10 points.

So it's hard to judge whether 1400 is a good change. If the CF owners evaluated all new users over the last 6 months and noticed that their average rating after 6 contests equals 1400, then it's great: it will mean a stable rating system as long as the average rating of newcomers stays at 1400. The greater the mistake in judgement, the greater the inflation/deflation that will occur.
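A toy simulation of this example (my own sketch; the K-factor and game count are arbitrary choices) drifts to exactly that equilibrium:

```python
import random

# One strong player (A1) and three equally weak ones; the newcomer B4 was
# seeded 40 points too high. Outcomes are decided by true skill, updates by
# current rating, using the standard zero-sum Elo rule.
random.seed(0)
K, SCALE = 4, 400
true_skill = {"A1": 220, "B2": 100, "B3": 100, "B4": 100}
rating = {"A1": 220, "B2": 100, "B3": 100, "B4": 140}

def expected(ra, rb):
    return 1 / (1 + 10 ** ((rb - ra) / SCALE))

for _ in range(500_000):
    a, b = random.sample(sorted(rating), 2)
    outcome = 1 if random.random() < expected(true_skill[a], true_skill[b]) else 0
    delta = K * (outcome - expected(rating[a], rating[b]))
    rating[a] += delta
    rating[b] -= delta  # zero-sum: the total (560) never changes

print({p: round(r) for p, r in rating.items()})
# roughly {'A1': 230, 'B2': 110, 'B3': 110, 'B4': 110}, up to random noise
```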
When I started competitive programming I suffered from the same issue: I feared my rating would drop, so I didn't participate in contests and solved the problems later instead. Although I now understand that rating is just a depiction of your skills and there was no point in doing what I did, I really appreciate this step; it will really help in making sure that ultra-beginners don't get demotivated at the very start.
What's wrong with your graph? https://i.ibb.co/xXLvvPQ/Capture.png
I was trying to figure out the same; it's happening with my account only :(
mine is broken too.....
Just do what we have in chess: no rating for the first 10 contests, and after the 10th contest your rating = your average performance over those 10 contests.

Before that, when $$$N < 10$$$ contests have been played, show an approximate rating = the average performance over those $$$N$$$ contests in the profile.

Performance is "if you had this rating, it wouldn't change after the contest".
So if I get first place in some contest, my performance will approach infinity?
Guys, we need to update the CF predictor, because it will show the wrong rating.
Where have the comments gone?
"_BeginnerCoder replied to your comment."

Let me restate. The following comment of _BeginnerCoder, "Can anyone briefly explain what Mike has written? I'm too stupid to understand the technical intricacies involved", has disappeared along with all three of its descendants. Does anybody know why?
Yeah, it was deleted by Codeforces. Now we both will get notifications that include a warning to BNBR or else face account suspension. Walking on thin ice, I tell ya!
Don't think it is a good idea!!!

Separating the real rating and the displayed rating does not seem right to me. You could just drop the initial rating to 1000 or 0 or whatever; that would solve the problem.

The current system is much better.
When I started Codeforces, I had almost no idea about CP and knew just some basics of C, and I was in the same position of rapid rating loss. But I think those initial rating drops made me work harder and improve myself. Had it been the new rating system, I might not have given my best, because I HAD NOTHING TO LOSE: even if I performed badly, my rating would increase initially.

I don't mean to disregard the new method by any means :D
Can anyone tell me why we new users are not assigned a 0 rating? Then if someone solves a single problem, he will get motivated.
Apparently "/ratings?order=BY_RATING_CHANGE_DESC" isn't interesting anymore.
Great initiative though.
Why is there no strict action against plagiarism?
Good idea.
For a (not that applicable) comparison, AOE2:DE uses several ranked ladders such that if you have a good rating in one, it's easy to get a good rating in another within a few games, and it requires you to play 10 games before you're visibly ranked; before that, only you can see your hypothetical rating.
FYI, I finally made a decent write-up of my contest rating algorithm (at least the first 8 pages of it): https://github.com/EbTech/EloR/blob/master/paper/EloR.pdf (2021 update: https://arxiv.org/abs/2101.00400)
Here's how everyone on Codeforces ranks according to it, including the performance and rating change of their most recent participation. Max ratings are in parentheses: https://raw.githubusercontent.com/EbTech/EloR/master/data/codeforces/CFratings.txt
The displayed rating is $$$1500 - 2(\sigma - 100)$$$, where $$$\sigma$$$ starts at $$$350$$$ and approaches $$$100$$$ as members become more experienced. So the displayed rating is initially $$$1000$$$. That's a "penalty" of $$$500$$$, which shrinks to about $$$207.9$$$, $$$120.3$$$, $$$76.6$$$, $$$50.9$$$, $$$34.6$$$, $$$23.8$$$, $$$16.5$$$, $$$11.5$$$, $$$8.1$$$, and $$$5.7$$$ after competing ten times.
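A sketch of that display rule as I read it (generalizing the $$$1500$$$ to the member's current rating $$$\mu$$$ is my assumption):

```python
# Displayed rating = mu - 2 * (sigma - 100): new members (sigma = 350) show
# 500 below their rating; the penalty vanishes as sigma approaches 100.
def displayed(mu, sigma):
    return mu - 2 * (sigma - 100)

print(displayed(1500, 350))  # 1000 for a brand-new member
print(displayed(1500, 100))  # 1500 once sigma has fully converged
```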
My system is a rigorous generalization of Elo/Glicko to the case of large numbers of simultaneous participants. It reduces inflation (but doesn't eliminate it), reduces the influence of new members, limits the size of rating jumps for experienced contestants, and produces (I believe) a more reasonable distribution of ratings (see the table here for a comparison). The math does all this naturally, without special hacks.
As a bonus, I recently optimized the implementation so that the entire history of Codeforces can be simulated on my laptop in 25 minutes. I haven't run many experiments or detailed comparisons, but I'd gladly assist anyone who wants to try!
I made some random checks, and I think your rating inflates as badly as the current CF one. Have you tried to model CF competitions and run computational experiments?

P.S. Oh sorry, I read only about every second paragraph. Start with the following: assume every user's score is Gauss($$$x$$$, $$$x$$$)-distributed for some $$$x$$$, and every round is participated in by half of the users. How many rounds will you need to stabilize the ratings? I think it's around 50 or 100 for the current CF with 1000 users.
I will translate for you some notes about the origin of inflation that were made in Russian.
The current CF system might have less inflation due to recent anti-inflationary measures, which I considered to be somewhat of a hack. Compared to the CF system without these measures, Elo-R seems to have less inflation based on looking at the rise of top-rated members over the years... but I admit this is not a rigorous experiment.
I suppose it would be helpful to create a testbed with various rating algorithms, so that we can easily compare them along each of the criteria you mention. I can't spend too much time on this project while working alone, but I might slowly try some of these things. It's open-source if others want to chime in with their own experiments. And since my system has a theoretical framework, we can change its assumptions to better match reality if needed.
Thanks for the discussion and experiment ideas!
For a very simple experiment, I just tried running my algorithm with an initial $$$\sigma$$$ of $$$250$$$ instead of $$$350$$$. This changed the mean true rating of all Codeforces users from $$$1311$$$ (when $$$\sigma=350$$$) to $$$1415$$$ (when $$$\sigma=250$$$). That's a big difference! Most of this seems to be due to the weakest members not having converged, since they participate less. Nonetheless, the top ratings also increased, by 14 to 19 points.
I suspect that two things are happening:
- When new users join, $$$1500$$$ is usually an overestimate of their skill, so they lose more than the system expects, giving everyone else extra rating points. This is an inflationary pressure.
- As users compete more, they usually improve and become better than their rating, so they win more than the system expects, taking away rating points from others. This is a deflationary pressure.
By using a high starting $$$\sigma$$$, we can drastically reduce the first effect, altering the balance between inflation and deflation.
Edit: with $$$\sigma=100$$$ (where it would stay fixed), we get a mean Codeforces rating of $$$1511$$$. The top ratings... are actually less than they were with $$$\sigma=350$$$! I suspect the population hasn't converged yet. A high starting $$$\sigma$$$ accelerates convergence of the population rating distribution by allowing it to spread right away, from Round #1. With low $$$\sigma$$$, my system might become even slower than others since it puts less weight on surprising performances (outliers).
What is mean rating? As far as I understand (I haven't read your paper yet, sorry), a generalisation of Elo to the case of large numbers of simultaneous participants should still be zero-sum, i.e. maintain a constant mean rating. And if I remember and understand correctly, the thing you are reinventing is TrueSkill (apart from the displayed-rating feature, which is nice, but not so important now).

And, as some kind of bonus, TrueSkill can be generalised to the case of team competitions (many thanks to S. Nikolenko for pointing it out to me), which is sometimes our case (and I do not see how you would do that).
I didn't find an implementation of this rating system and haven't tested it on CF history, but I think I'll try to do that next week if there is interest.
Maybe I should write an explicit comparison with TrueSkill, after studying the Gaussian density filtering algorithm more closely to understand its properties. The models are similar, but I'll point out differences:
- My model uses logistic distributions instead of Gaussians: a major theme of my paper is to look at the implications of this.
- TrueSkill has a model for draws, which I think is inaccurate for programming contests. Elo-R doesn't model draws, but instead treats them as half a win + half a loss, similar to Elo. Actually, if we model draws as nearly-equal performances, the Elo-R math gives a result equivalent to 1 win + 1 loss, but it seemed strange to me to give draws double the weight...
- I appeal to the limit of large numbers of contestants to allow for an efficient and easy computation of performance. As a result, I get formulas that are more Elo-like and human-interpretable. Many properties of Elo-R can be deduced just by looking at the formulas, even without the theoretical model! (The formulas give reasonable results for small numbers of contestants too, although the theory no longer justifies this.)
Mean rating is the average "true rating" among all users, including inactive ones.
My system is most directly inspired by Glicko. It can be made more Elo-like by keeping $$$\sigma$$$ constant, as happens to be the case in my $$$\sigma=100$$$ experiment. In that case, the mean rating stayed very close to $$$1500$$$, drifting $$$11$$$ points off due to the approximations involved. I believe a system which preserves the mean will be inflationary: as Codeforces becomes more popular, it's likely to welcome a lot of weaker members, so we should want the average rating to decrease. Perhaps the top should be allowed to increase very slowly, as the state of the art improves.
I believe Elo-R can be extended to team matches. Thinking about it very briefly, we could compute performance scores for each team, by modeling team performance as a sum or max of random draws for each member (which amounts to one random draw per team, with a different distribution). Then, knowing the team performance, we have to reverse-engineer a measurement of each team member's skill.
In short: I spent the last $$$x$$$ hours studying the improved variant of TrueSkill, and I think it is exactly what CF needs.

This is not the complete list of reasons why I think so, but at least the most important ones:
- TBH, I don't see why logistic distributions are better than Gaussians. IMO, the main reasoning behind the Gaussians in all Elo-like models is the CLT, and I can't find any similar argument for logistic distributions.
- The model for draws in TrueSkill is perfect for programming contests! Consider players' points as a rough measure of performance, and you get exactly this model of ties. The case of multiway ties (which is one of the problems of the original TrueSkill) was fixed in its generalised version from the second link. And I can't say that ties are uncommon on CF.
- Rating inflation. What we exactly need from the model is the correspondence between a rating number and a percentile (like: if your rating is at least $$$x$$$, then you are in the top $$$y\%$$$), which is exactly what TrueSkill provides and what seems to be impossible in your model (correct me if I am wrong; I think so because of the non-constant mean rating). Of course, any model of this kind will have some rating inflation, but we don't need exactly ten LGMs; we want the number of LGMs to be proportional to the total number of players. Moreover, when we say that the current rating system has problems with inflation, we generally mean that the average Div. 1 user's rating is growing too much because new users' first contests influence the rating a lot more than they should. But in theory, if we don't have a multiple-account problem (which the visible-rating feature should take care of), any model with non-constant $$$\sigma$$$ should do the trick.
Surely, we need to test and compare our models to discuss it further, so I'll try to implement all this stuff, and post my results next week.
Cool. It's definitely worth doing more tests! The improved TrueSkill from St. Petersburg looks like a good alternative. Nonetheless, I want to reply to your latest points:
- What exactly is your CLT-based argument? Are skills and performances the sum of very small contributions? Top players seem more spread out than a normal distribution would predict, and all players seem to have more days that are very good or very bad compared to normal draws. The logistic causes my model to put less weight on these unusual performances. We could try to measure the true distribution.
- The problem with ties is that there is no fixed $$$\epsilon$$$ that works. The most obvious example is the massive ties at $$$0$$$ points, whereas performances drawn from logistic or normal distributions would have no such bias. Maybe that's ok in practice: without human-interpretable formulas, it's hard for me to guess, just as it was hard to foresee problems with the original TrueSkill model of ties.
- In Elo-like systems, rating differences are based on win probabilities, not percentiles. As far as I can tell, TrueSkill doesn't enforce percentiles either: it merely suggests them in the prior, just as Elo-R does.
- Fixing the average rating reduces our control over the inflationary and deflationary forces I mentioned. But now I have a question. It seems to me that fixing the average rating is inconsistent with non-constant $$$\sigma$$$, because users with large $$$\sigma$$$ would gain/lose more points than their opponents lose/gain. Are you saying that TrueSkill manages to have both, and how?
- Yes, something like that, at least at the top level: your skill is like the sum of problems you know how to solve, and your performance is roughly the sum of solved problems minus the mistakes you made. I think good and bad days happen because of the way we compete. You have almost no choice between problems, so it is pure luck whether you get problems you like. To handle this effect you would need some recommender system as a rating system, which is definitely overkill for our case. But anyway, I don't think it is a good idea to change the rating system just to handle additional CF-specific noise.
- If you want to be very precise, then you can choose $$$\varepsilon$$$ as a function of score, but I think that for practical purposes $$$\varepsilon = 0.736$$$ should be fine (and maybe the case of a $$$0$$$ score should be handled separately).
- True, I meant percentiles in the Bayesian sense, not where you are in the actual rating distribution. For all Elo-based rating systems you can compute your expected percentile from your rating (e.g., if your rating is 1500, then you are in the top $$$\frac12$$$). It can differ from the percentile in the actual ratings: about 60% of ranked lichess users (lichess uses Glicko-2) have a blitz rating lower than 1500.
- Forget about this part; I messed up. The mean rating should be constant if we compute ratings with proper weights, and I believe that is also true for your model.
We need to make this the new rating system
I spent some time learning Rust and implementing "TrueSkill from SPb" and now I have a couple of questions about your rating system.
I spent some time reading your code and noticed that you have only exact formulas: no approximations or loops until convergence. It makes me think that what you do is not Bayesian inference but rather something different. I think so because the posterior distribution should not be expressible in terms of logistic distributions and therefore should be approximated somehow (just as TrueSkill does). So, if I am right, why does your rating system converge fast enough? And if I am not right, how did you do this?

Also, if I understood correctly, you don't add extra noise to players' performances as is usually done. Instead, you have a lower bound on a player's sigma. Why are these approaches equivalent?

Meanwhile, I found some bugs in the pseudocode from the article, so the implementation will take more time (the algorithm is still correct; I just need to derive some formulas from scratch). I think I will finish and post the first results next week. Also, I want to use your CF parser, if you don't mind.
Sure, use whatever you need!
I derive the formulas in the paper: in short, I cheated in a few places to make the formulas tractable. As a result, I have human-interpretable formulas that approximately follow the Bayesian theory. In other words, my system can be interpreted in two ways: the Bayesian way, or as an exponentially weighted average of recent performances with reduced weight on outliers.
I do add noise. The "lower bound" you're seeing is simply the limiting value of $$$\sigma$$$ if the noise and measurement error per contest are constant. Actually, the noising calculation is where I had the toughest time sticking to the Bayesian derivation. In the end, I came up with four methods, each with different pros and cons which I'd be happy to discuss further.
What is the complexity of your code? My code processes all the contests from the set in your repo in less than two minutes (if already cached), and I haven't optimized it yet. TBH, it currently outputs garbage instead of results, but I know where the problem is, and I think it doesn't affect the performance.
Wow 2 minutes! I'd like to see. If each contestant recalls $$$M$$$ rounds of history, then a round with $$$N$$$ contestants should take $$$O(N^2\log\frac{1}{\epsilon} + NM\log^2\frac{1}{\epsilon})$$$ using binary searches, with the log factors reduced if Newton's method is used instead.
The $$$NM$$$ term only applies to certain versions of my "noising step", but the $$$N^2$$$ term seems common to all the programming contest algorithms that compute pairwise win probabilities.
Well, I can't prove it, but it seems like my code processes a round with $$$N$$$ contestants in $$$\mathcal{O}(N \log \frac{N}{\varepsilon})$$$ time. It is not a really fair bound, since each multiway tie and each team currently take quadratic time, but that is not so important when running on CF history, and it could definitely be optimized to linear time.
I don't want to post code that does not work, but if you don't want to wait until I fix everything and make a post, you can find it on my GitHub (the username is the same).
Cool, it looks like TrueSkill doesn't need to compute the $$$N^2$$$ pairwise win probabilities! You can probably make it even faster using Rayon to parallelize as I did.
Today I reverted my code to use my simplest noising method. While the fancier methods are interesting for a variety of reasons, the simple one is more straightforwardly comparable with the current system on Codeforces.
Basically, it approximates the posterior, which is the most intractable part, by a Gaussian, while still using logistic distributions in other places. This method has no $$$NM$$$ term nor a saved history, so it satisfies the same monotonicity properties as Codeforces. It also results in the simplest versions of my formulas (see page 8 of the paper as of today). The rating distribution might converge more slowly with this method, but it seems to end up in the same place.
TrueSkill computes win probabilities only for pairs of players on neighbouring places (up to the multiway tie case).
Btw, I finished the algorithm, and now it runs in less than a minute, but the results are somehow strange. I can't find the problem (it's either the parameters or bugs), so I am curious how you tested your rating system.
Also, I am refactoring my code and since I am fairly new to Rust, I'd be very grateful for any suggestions to reduce the boilerplate and improve readability.
Hey I saw your new blog post, great work! Sorry that I haven't had much chance to examine your code or engage in the discussion. I moved on July 1 and the internet company still hasn't set up WiFi where I live :( so I'll be on and off the grid for just a little while longer...
The idea sounds very interesting, as the rating graph will be increasing for new participants, which will boost their morale at the beginning instead of the initial pressure of starting at 1500 straight away. It will discourage the idea of multiple accounts too.
MikeMirzayanov, roughly the first 1000 people with the maximum $$$\Delta$$$ are new accounts (in the $$$\texttt{RATING CHANGES}$$$ tab), which is annoying. Can you please exclude accounts with fewer than 6 rated contests from the $$$\texttt{RATING CHANGES}$$$ tab?
Or just remove this "new system", which hides the real rating like some third-rate computer game would.

I understand lowering the default rating from 1500 because a lot of people are below it, but not this.
My bad, I started late! Starting from zero :(
Not at all.
What will be the effect on my rating if I participate in a contest but can't submit even a single solution?

If you were unable to submit anything, or you submitted but failed on the sample tests, your rating will be unaffected; the participation won't be counted.
I would like to ask why my rating started from 366.
It started from zero; it reached 366 after your first contest.
Yeah, but why do some people start from 1200 or 1400?

And I don't know why someone is downvoting me for asking a question.
Since this is ratism
Older accounts' initial rating is 1500, though it isn't displayed. After their first contest, if they do better than 1500, their rating increases from there; otherwise it comes down to 1400 or 1200, like you mentioned. If a new account starts with a rating of 1200 or 1400, they must have done extremely well in their first contest, but I still don't think reaching 1200+ in the first contest with a new account is possible, TBH.
Salam & Greetings!
I am a newcomer on Codeforces. I am very glad to register here and learn something about programming with a big community. I need some advice from the experts here...

How do I grow my rating on Codeforces?

How do I build up my problem-solving skills day by day?

How should I approach solving a problem as a beginner?

How do I grow my programming knowledge, and my data structures and algorithms knowledge?

How do I keep up long-term motivation for competitive programming and problem-solving?
Just go to the PROBLEMSET section, sort the problems in increasing order of difficulty, and grind through 10-15 problems first to get familiar with the site; the rest you will understand gradually with time.
It would be nice to have only positive rating changes on Codeforces, so that rating-phobic people don't stop coding.
What are you high on?