neal's blog

By neal, 4 years ago, In English

Let's look at two problems from the last round, Round 657 (Div. 2):

Problem D 1379D - New Passenger Trams currently has 367 solvers and had 77 solvers in the official contest (among rated participants).

Problem E 1379E - Inverse Genealogy currently has 44 solvers and had 0 solvers in the official contest.

Meanwhile both problems have the same difficulty rating of 2400. How does that make any sense?

  • Vote: I like it
  • +186
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it +61 Vote: I do not like it

On top of that, problem F1 1379F1 - Chess Strikes Back (easy version) currently has 97 solvers and had 10 solvers in the official contest, which is still clearly easier than problem E. But it has a difficulty rating of 2700.

»
4 years ago, # |
  Vote: I like it +55 Vote: I do not like it

Another problem that is certainly not worthy of a 2900 rating: 1372E — Omkar and Last Floor

»
4 years ago, # |
Rev. 8   Vote: I like it -47 Vote: I do not like it

I also have examples of the opposite case: problems https://mirror.codeforces.com/contest/1385/problem/E and https://mirror.codeforces.com/contest/1385/problem/F are rated 2000 and 2300, which doesn't make sense. E is a small variation on a classic problem, and F is a little hard but really not that hard. They should be more like 1700-1800 and 2000-2100.

Edit:

Wow I'm sorry if I offended anybody, I didn't mean to diminish anyone's achievement!

»
4 years ago, # |
  Vote: I like it +18 Vote: I do not like it

Good news: 1397D and E are now 2300 and 2800, per your suggestion.

»
4 years ago, # |
  Vote: I like it +70 Vote: I do not like it

Thanks. Occasionally some heuristics don't work well. In such cases, I need to change the ratings manually.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Could you also fix this one? https://mirror.codeforces.com/problemset/problem/86/D

    Its rating is 2900, and like... that shouldn't be right, should it?

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +9 Vote: I do not like it

      Yes, of course. It should be around 2000-2100.

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        IMHO this explains very well why a 2700+ rating is correct for 86D.

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          But bro, in the current situation I don't think that it is a non-standard thing.

          • »
            »
            »
            »
            »
            »
            4 years ago, # ^ |
            Rev. 2   Vote: I like it 0 Vote: I do not like it

            In that case, you should ask Mike to take into account upsolved submissions as well, i.e. people's ratings at the time they upsolved it. click click2

            The problem rating is correct as far as the spirit of the rating formula is concerned, and it shouldn't be changed just for the sake of it.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +39 Vote: I do not like it

    Can you explain what the heuristics are, or what the overall system is? I used to think problem difficulties were calculated by a single formula with a simple interpretation.

    Thanks!

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      I guess the mystery is somewhere here, in UPD2 "coefficients".

      • »
        »
        »
        »
        4 years ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        Yeah, the original blog announcing problem difficulties (https://mirror.codeforces.com/blog/entry/62865) said they are calibrated so that if your rating is $$$R$$$ and the problem rating is $$$r$$$, your probability of solving it during an official round is $$$f(R - r)$$$ for some function $$$f$$$ with $$$f(0) = 0.5$$$, much like Elo-style ratings between two players.

        But I don't really understand where they come from, like... is it just based on fitting to the ratings of the participants during the official contest? And if so, is it pre-contest ratings, post-contest ratings, or per-contest "performance ratings"?

        Plus, as you said, it seems from Mike's comments like there are probably some ad-hoc heuristics/hacks on top of this basic formula, but we don't know what they are.
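        The announcement never specifies $$$f$$$, so here is a minimal sketch under the assumption that it resembles the classic Elo logistic with a 400-point scale (a guess for illustration, not the published formula):

```python
def solve_probability(R: float, r: float) -> float:
    # Estimated probability that a participant rated R solves a problem
    # rated r during an official round. The logistic with a 400-point
    # scale is the standard Elo choice and only an assumption about
    # Codeforces' actual f; it does satisfy f(0) = 0.5 as stated.
    return 1.0 / (1.0 + 10.0 ** ((r - R) / 400.0))
```

        With this particular curve, a participant rated 400 points above the problem would solve it about 91% of the time.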

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +54 Vote: I do not like it

      I can't easily explain all the details. In a perfect world, a problem's rating would be the rating of an opponent such that your probability of beating them equals your probability of solving the problem. But in the real world the data is dirty: suppose tourist enters a Div. 3 round but problem A is too boring for him. Statistically he didn't solve it, which would give a great boost to the problem's rating. I tried counting only official submissions, but, for example, for hard Div. 3 problems official submissions give less information than unofficial ones. So my current way of calculating problem ratings is full of weights, coefficients, and heuristics. You can try it yourself using the API, but I don't think there is a silver bullet that calculates ratings much better. I think about 98% of the ratings are now quite good, and the rest can be tuned manually.
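      In the ideal-world setting described here, one toy way to back out a problem rating from contest data is to choose $$$r$$$ so that the expected number of solvers, summing Elo-style solve probabilities over the official participants, matches the observed count. This is purely a hypothetical sketch of the basic idea, not the real system, which adds the weights and heuristics mentioned above:

```python
def fit_problem_rating(ratings, solved, lo=0.0, hi=4000.0):
    """ratings: participant ratings; solved: parallel list of bools.

    Finds r such that the expected solver count under an assumed Elo
    logistic equals the observed count. Entirely illustrative.
    """
    def expected_solvers(r):
        return sum(1.0 / (1.0 + 10.0 ** ((r - R) / 400.0)) for R in ratings)

    target = sum(solved)
    # expected_solvers(r) strictly decreases as r grows, so binary search.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_solvers(mid) > target:
            lo = mid  # problem rated too low: predicted too many solvers
        else:
            hi = mid
    return (lo + hi) / 2.0
```

      The dirty-data problems described above (strong participants skipping easy problems, unofficial submissions) are exactly what this naive fit gets wrong.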

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +16 Vote: I do not like it

        Thanks! I was mostly curious for my own understanding, rather than trying to suggest I could come up with a better system. It would be nice to know the underlying system that originated these ratings while looking through the problemset.

        Is it common for highly-rated participants to skip easy problems in low divisions? Anecdotally, when I look at Div3 results, I see GMs and IGMs at the top of the rankings, usually having done the problems in order. I guess I don't see the GMs/IGMs who do the problems out of order though, since they aren't at the top of the rankings. :P

        I was also curious whether the "rating" of a participant in a contest is taken to be their pre-contest, post-contest, or per-contest performance rating.

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +11 Vote: I do not like it

        Can you just open-source the formula (just like you did with the rating formula) and allow other interested people to do a PhD on it?

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    1384B2 - Koa and the Beach (Hard Version) had 307 in-contest solves in Div. 2, 1384D - GameGame had 144, and 1384B1 - Koa and the Beach (Easy Version) had 846. But currently their difficulties are 2200, 1900, and 1900. Please fix them.

»
4 years ago, # |
Rev. 2   Vote: I like it +22 Vote: I do not like it

What about these: 1183E (easy version) is rated 2000 while 1183H (hard version) is rated 1900?

»
4 years ago, # |
  Vote: I like it +59 Vote: I do not like it
  • »
    »
    4 years ago, # ^ |
      Vote: I like it +24 Vote: I do not like it

    I think there's a big difference between "unrated" and "rated", more so than between "rated low" and "rated high".

    Unrated accounts typically belong to people who either

    • don't participate in contests, so they may not really have enough experience with Codeforces to meaningfully talk about contest ratings or logistics, or
    • made a second account to post things. If even the author doesn't think their post is good enough to want it associated with their primary account, how likely is it that the post is actually good?
    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Also, the new account was probably made so that whatever it posts cannot be traced to the original one, for example, when it is used to post "Reveal how xxx cheats" stuff.