Блог пользователя EbTech

Автор EbTech, история, 2 года назад, По-английски

Readers of my Codeforces blog may recall that inutard and I have been experimenting with rating systems for quite some time. Last year, we demonstrated how to break the Topcoder system, and published our findings at the World Wide Web 2021 research conference. There, we derived a new Bayesian rating system from first principles. This system was specifically motivated by sport programming, though it may in theory be applied to any sport that ranks lots of contestants. Recently, it was adopted by the Canadian contest judge, DMOJ.

Now, in collaboration with the CodeChef admins, we are pleased to announce that upon the completion of July Lunchtime, the Elo-MMR rating system goes live on CodeChef.com! While recognizing that any rating system migration will be disruptive, we hope that you’ll find the advantages worthwhile. Let’s briefly go over them. Elo-MMR is:

  1. Principled: using rigorous, peer-reviewed, Bayesian derivations.

  2. Incentive-compatible: it’s never beneficial to score fewer points.

  3. Robust: players will never lose too much rating for a single bad day.

  4. Open: the algorithm and its implementation are available under open licenses.

  5. Fast: on a modern PC, CodeChef’s entire history is processed in under 30 minutes; using close approximations, we’ve further reduced it to under 30 seconds. In addition, Elo-MMR offers:

  6. A better rating distribution: CodeChef and Codeforces have a high spread at the top: for instance, tourist / gennady.korotkevich has a 1000+ point lead over CodeChef’s other top users, whereas Elo-MMR brings the gap below 150 points. Conversely, the old systems provide less spread in the low-to-median percentiles, compared to the wider numerical range that Elo-MMR allocates for the majority of users to progress through.

  7. Conservative ratings / newcomer boost: Similarly to Microsoft’s TrueSkill, the publicly displayed rating will be a high-confidence lower bound on the player’s actual skill. As a result, rather than starting at the median and moving down, newcomers will start near the bottom and move up as the system becomes more confident in their skills. This provides a sense of progression and achievement.

  8. Faster convergence for experts: Compared to the previous rating system, Elo-MMR awards a much higher increase to users who do well in their first CodeChef rounds.

It’s worth noting that no special hacks were taken to ensure these properties: they emerge naturally from the mathematical derivation. For details, please see the latest revision of our research article.

To demonstrate the difference in rating distributions, let’s plot them for CodeChef users, using both the old and new (Elo-MMR) rating systems. The latter’s parameters were chosen in such a way as to make the two rating scales approximately comparable. We see that although the original CodeChef ratings take up a wider range overall, gennady.korotkevich alone occupies a big chunk at the upper end! Elo-MMR, on the other hand, spreads the bottom 80% of the population over a wider range.

Plot

So, how are we handling the migration to Elo-MMR ratings? One option we considered is to simply take the current ratings and apply Elo-MMR updates from now on. This is not ideal, since rating systems can take a long time to converge: for example, consider the rating distribution of any programming contest site in its first year of operation. At the opposite extreme, we can retroactively recompute all contests in CodeChef’s history using Elo-MMR. DMOJ took this approach; it better leverages the site’s history, but the resulting transition is very sharp.

To smooth it out, CodeChef retroactively computed everyone’s current Elo-MMR ratings in the backend, but will continue to show the original CodeChef rating. Every time you compete, your backend Elo-MMR (MM) rating will be updated, and then your public CodeChef (CC) rating will be pulled closer to your MM rating according to the formula:

$$$\mathrm{CC}_\mathrm{after} = \mathrm{MM}_\mathrm{after} + 0.75 (1 - \frac 1n) (\mathrm{CC}_\mathrm{before} - \mathrm{MM}_\mathrm{before}),$$$

where $$$n$$$ is the number of rounds in which you’ve taken part. Thus, new players, who do not yet have a solidified CC rating, are transitioned into the new system more rapidly. The most experienced players are pulled one-fourth of the way each time they compete.

Having examined the population, we might also ask how much the ratings of specific individuals change between the two systems. To find out, we first eliminate the noisiest data: players who have been caught cheating, and newcomers with at most 10 lifetime contest participations. For all remaining users, we compare their original CodeChef rating to their retroactively computed Elo-MMR ratings. The statistics are summarized as follows:

  • World champion gennady.korotkevich is the main outlier, with $$$CC - MM = 1079$$$. Thus, over the next 10 or so contests that it will take for the new ratings to come into effect, he is expected to lose about $$$1000$$$ points. Thanks Gennady, for understanding and being fine with this!

  • Less than 800 additional users have $$$CC - MM > 200$$$, and none of these are over $$$500$$$. Nonetheless, these users’ ratings will most likely decrease over their next few contests.

  • Around 8000 users have $$$MM - CC > 200$$$, the highest of them being $$$630$$$. These players will experience a rating boost over their next few contests.

  • All other users have $$$|MM - CC| \le 200$$$, so they will experience minimal disruptions.

For the next few months, the day after every rated contest, we will update everyone's hidden Elo-MMR rating in this drive (warning: it might be too large to view online). This way, interested users can investigate their "true" rating trajectories. Within a year, CodeChef will retire the migration plan and simply assign all users their full Elo-MMR rating.

We welcome any questions and will do our best to answer them all!

  • Проголосовать: нравится
  • +188
  • Проголосовать: не нравится

»
2 года назад, # |
  Проголосовать: нравится +6 Проголосовать: не нравится

Auto comment: topic has been updated by EbTech (previous revision, new revision, compare).

»
2 года назад, # |
  Проголосовать: нравится +17 Проголосовать: не нравится
»
2 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Do you know of any more platforms that are/going to implement this in the future?

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    A couple of smaller organizations (outside of sport programming) expressed an intent to do so, but let's wait and see!

»
2 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

epic but please first remove codechef cheaters...

»
2 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

We see in these distribution charts kind of bumps at the beginning of color sections, similar on atcoder. This is, there are usually more users a bit over a color change border than below.

I understand that these bumps are social phaenomenia. Is there any kind of mathematical tool to handle these, like an automatic decrease over time while not partitipating?

  • »
    »
    2 года назад, # ^ |
    Rev. 4   Проголосовать: нравится +6 Проголосовать: не нравится

    We'd have to study the bumps more carefully before making an informed decision. For example, the bumps at 900 and 1000 might simply be due to the starting rating. CodeChef used to give players a starting rating of 1500, but in February changed it to 1000. In the Elo-MMR chart, we started newcomers at $$$\mu=1600,\ \sigma=280$$$, but changed this to $$$\mu=1500$$$ on the same date as CodeChef's change. Since $$$\sigma$$$ is set to asymptotically converge to $$$80$$$, this means a player's starting rating after applying a 3-sigma penalty is one of:

    $$$1600 - 3\cdot(280 - 80) = 1000,$$$
    $$$1500 - 3\cdot(280 - 80) = 900.$$$

    The Glicko rating system implemented your idea of a time-based decay by gradually increasing $\sigma$. This technique extends naturally to Elo-MMR if you want, though we're not doing it here.

»
2 года назад, # |
  Проголосовать: нравится +6 Проголосовать: не нравится

<<< Robust: players will never lose too much rating for a single bad day.
Are you sure this makes sense? It's not going to be so much fun for me because of this.

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится +31 Проголосовать: не нравится

    Its going to be so much fun for me because of this :)

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится +2 Проголосовать: не нравится

    I think the same. Robust also means you don't get much if you score high. I managed to get rank 6/2318 in Div2 (best contest I ever had) and the system increased my MMR-rating only by 67.

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      The system has memory so, if you perform at that level a second time in the near future, you'll see a similar or maybe even bigger increase.

»
2 года назад, # |
Rev. 2   Проголосовать: нравится +14 Проголосовать: не нравится

How will you fix my rating if I don't participate? Pretty much the case of all div1 participants except a few hundred.

Spoiler
  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    That would complicate the user interface. For the majority of users, the new rating will be slightly higher, so it's worthwhile to compete to get your new rating. If the admins wish to do so, it's still possible to instantly converge the ratings of inactive users after a year.

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      I think we shouldn't show rating transition deltas on the current graph. Negative delta after good performance or positive delta after bad doesn't make sense on the graph.

»
2 года назад, # |
  Проголосовать: нравится +26 Проголосовать: не нравится

In your perfect world, how would organizations that run different rating systems migrate their system to Elo-MMR?

It's not clear to me that having a smoothing factor is desirable, especially if it's understood that the old ratings might not make sense for various reasons. This is why DMOJ decided to just rewrite all history to use the new ratings — it was the simplest way to perform the migration even if it was the most disruptive in terms of user-visible impact. Would other organizations conclude that they need some sort of smoothing factor now?

  • »
    »
    2 года назад, # ^ |
    Rev. 2   Проголосовать: нравится +18 Проголосовать: не нравится

    What DMOJ did seems fine to me. We sketched a fancier migration plan which might be optimal in terms of user experience, but is a bit annoying to maintain! The idea is as to keep two databases in the backend:

    1. The user-visible ratings which are initialized as-is, but with some $$$\sigma$$$ value so that they can be interpreted as Elo-MMR ratings.

    2. The hidden ratings which are retroactively computed with Elo-MMR.

    From then on, both databases will receive Elo-MMR updates simultaneously. The trick is that your new performance computations in both databases 1 and 2 will be based on only database 2 ratings. Since ratings are basically weighted averages of past performances, a player's two ratings will converge to each other after enough rounds.

    Unlike the simpler smoothing, this method ensures that your rating goes up (down) when you perform above (below) your rating. On the other hand, you now have two full-fledged Elo-MMR databases that interact.

»
2 года назад, # |
  Проголосовать: нравится +11 Проголосовать: не нравится

Just saw this being used at the recent codechef contest, and I hope no other organization adapts this

»
2 года назад, # |
  Проголосовать: нравится +27 Проголосовать: не нравится

Many people show negative reaction to this change, but I like this change because I got $$$+ \Delta$$$ from this change there are no more too huge s**tty $$$- \Delta$$$. Look at me. There are much less falls with new graph. It seems the graph shows more real skill change (for at least 7*). Great job!

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится +8 Проголосовать: не нравится

    That's what I highlighted in my twitter thread as well.
    Even tho I lost a few internet points. My graph is just constant for the last 3 years
    New accounts or people who stopped participating after hitting one point of X rating / Y stars have drastic changes in their rating.

    Anyways at present, it's the best platform for reds. They cannot lose ratings even if they deliberately want to.

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится +26 Проголосовать: не нравится

    Well what about this profile:

    Look at the rating graph

    https://www.codechef.com/users/cthau

    Becoming Red by giving div4 and div3 contests!

    Pretty sure there are many loopholes in this current rating system.

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Maybe recent too many division system is unfriendly with the current rating system. Can we bound the rating change for div2 or lower or merge some divisions?

      • »
        »
        »
        »
        2 года назад, # ^ |
        Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

        I believe that would be the best option: Merging some divisions. Also some upper bound should be introduced for every division. Similar to what codeforces does.

        The introduction of new rating system + too many divisions would lead to a rise of new alt accounts.

        • »
          »
          »
          »
          »
          2 года назад, # ^ |
            Проголосовать: нравится +2 Проголосовать: не нравится

          What about the rating boost which is present there initially. Many will create new account to get higher ratings in very first few contests.

          They must remove or decrease the rating boost. I don't know what to call it but i hope you understand

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится +6 Проголосовать: не нравится

    Well, you do realize that besides removing huge -ve delta, they also removed huge +ve deltas too? And well, you benefited from it, good for you. But there are people who have been inactive for years and just woke up from the dead to realize that the rating threshold they worked so hard to get before they quit CP due to real life demands just got all their hard work washed away due to some sadass admin(s) deciding that the system they had for years suddenly need to be replaced? And they did not even make any provision to accommodate inactive people AT ALL? Yes, the old graphs remain, but the rating adjustment is shown even there, making it a sad ugly graph, especially for people who have been inactive for multiple years.

    Codechef has stopped organizing their regular contests which they were able to do fairly regularly for almost a decade before unacademy stepped in. No contests, no rating, editorials are paid, forums are dead, wth man. I have no idea whose decision it was, unacademy's or some narcissistic admins, but whoever you are, you killed codechef man. You killed our dear chef, fuck off.

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится +12 Проголосовать: не нравится

      TBH why we need to protect past participants? It's natural thing to penalize inactive participants because we can't guarantee their current skill(though the penalize seems not intended). The current rating should show the current skill.
      And why we need unsuitable huge $$$+\Delta$$$? Though it's happy to get them, but huge skill improvement should be needed before them.

      By the way, I can agree these days of CC is bad. Without regular Cook-off or LT (of course 7* rated!), the rating is really worthless.

      • »
        »
        »
        »
        2 года назад, # ^ |
        Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

        What about those who started competing last year or so and stagnated around 1700 to 1900 range for months, never stopped. Its going to be really difficult to hit 5 stars since it takes in to account past performances

        Rank ~40: 1916 -> 1928 (40+ contests participated)

        Rank 210: 1916 -> 1954 (<10 contests participated)

        such stuff is possible with the new system which does not make sense

        • »
          »
          »
          »
          »
          2 года назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          It's not the problem of current rating, but problem of all rating systems with hysteresis.
          For example, in atcoder, if the result of first contest is 4000perf in AGC, the participants' rating will be 2800 (though currently newcomer cannot participate AGC), but random blue or yellow cannot reach to red with one time of 4000perf.
          CF ratings determined from only old rating so this kind of problem isn't occurred BTW current codeforces rating contains too huge swing, but old and new CC rating (and most of other ratings?) contains this problem.

      • »
        »
        »
        »
        2 года назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        Well, why not? They proved themselves during the time they were active, why the need to re-prove their skill again? I'm talking about cases where they were say 5, 6, or 7 star rated and suddenly after years, they got reduced to 2, sometimes even 3 levels. This isn't really fair IMO. I understand such a huge change affects everyone, some positively some negatively, but CC could've at least kept the old graphs untouched? Or maybe kept some other provision to view the old profile entirely? I don't really know. Look at these cases for example: 1, 2. I'm pretty sure there are multiple such cases.

        Also, in case you thought that the problem of huge deltas was solved, look here. Got red from div-3 contest, wtf?!

        Overall, I believe (and I am just an individual, I don't really know how the community feels about it), that this rating migration was a failed experiment. Didn't do any good to the site, just further buried it in its grave.

        Overall, CC is, pardon my french, fucked. Hopefully it isn't beyond redemption, I spent a significant part of my teenage years on the site. I'd love to see it come back to life again, but that doesn't seem quite probable looking at the way things are going. :(

        • »
          »
          »
          »
          »
          2 года назад, # ^ |
          Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

          why the need to re-prove their skill again?

          Simple reason. Because people gets smarter and smarter. Let's do some old VC at CF. The ranklist with you shows better results than you expected.

          Got red with div3 is a problem. The simplest solution is cap the rating change at lower division.

          • »
            »
            »
            »
            »
            »
            2 года назад, # ^ |
              Проголосовать: нравится +1 Проголосовать: не нравится

            Uh, I'm not exactly sure by what you mean by "people get smarter and smarter". If you mean to say that skill level of a person consistently improves with time, I'm sorry but I don't think you're correct. Human intelligence definitely peaks after a certain point, and so does skill level in some physical/mental sport. Someone who's say middle-aged now, definitely won't be able to perform with the same skill that they did when they were in their teens. And people peak after a certain point of time. This rating system does punish people who left at their peak, and I'm afraid it does so rather too harshly. If you meant something else by that, would you please elaborate?

            • »
              »
              »
              »
              »
              »
              »
              2 года назад, # ^ |
                Проголосовать: нравится 0 Проголосовать: не нравится

              I mean getting rating $$$X$$$ now is harder than getting rating $$$X$$$ some years ago (without rating inflation). So, comparing current ratings and past ratings are originally nonsense.

              • »
                »
                »
                »
                »
                »
                »
                »
                2 года назад, # ^ |
                  Проголосовать: нравится +1 Проголосовать: не нравится

                Uh, I'm not sure that's true either. Do you have any proof for this? In fact, I think it's quite opposite — getting x rating now is easier than getting x rating few years ago, but I may be wrong too.

          • »
            »
            »
            »
            »
            »
            2 года назад, # ^ |
            Rev. 2   Проголосовать: нравится +15 Проголосовать: не нравится

            you don't get it do you? Let's say I stopped competitive programming, because I lost interest or some other reason. Obviously, if I return my rating will be worse because I haven't done it for a while, but the fact that I left shouldn't ruin all my old achievements I was able to get when I was still active. It's like taking a championship title from a player, because they've retired and there are better players now. It doesn't make sense.

»
2 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Why did you swap lyrically for ksun48 in the graphs?

»
2 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Can someone explain how a 7 star participant be eligible and get positive delta from div3 and div2 contests for example this user https://www.codechef.com/users/paupau. It doesn’t make sense that if someone has become 6 star according to the new rating system then div2 and div3 contests still contributing to the ratings further.

»
2 года назад, # |
  Проголосовать: нравится +28 Проголосовать: не нравится

Hey all, I should point out that we're seeing not only differences between the two ratings systems, but also differences in the settings of their parameters.

A high starting $$$\sigma$$$ parameter is what enables new players' ratings to change so fast. Topcoder does this too. Usually these changes are manageable; the edge case of users reaching ridiculous ratings after only competing in Div 3, occurred because their division placement was based on a different system (the old one). In other words, the user competed in Div 3 when really, according to the new system, they should already be in Div 1. Now that the migration is done, this will never happen again.

Another change is that the update weight parameter was decreased, because the admins want ratings to be a more reliable indicator of long-term performance. If you feel that your recent hot streak has been erased, don't fret. Elo-MMR will magnify your next few wins in support of your streak. This momentum effect allows stability to coexist with faster updates when a true jump in skill occurs. For example, you can see ksun48's's new ratings accelerate in the last half of 2020, converging almost as fast as the old ratings. (With a higher update weight, Elo-MMR can converge even faster.)