neal's blog

By neal, 4 years ago, In English

Let's look at two problems from the last round, Round 657 (Div. 2):

Problem D 1379D - New Passenger Trams currently has 367 solvers and had 77 solvers in the official contest (among rated participants).

Problem E 1379E - Inverse Genealogy currently has 44 solvers and had 0 solvers in the official contest.

Meanwhile both problems have the same difficulty rating of 2400. How does that make any sense?

  • Vote: I like it
  • +186
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it +61 Vote: I do not like it

On top of that, problem F1 1379F1 - Chess Strikes Back (easy version) currently has 97 solvers and had 10 solvers in the official contest. It is still clearly easier than problem E, yet it has a difficulty rating of 2700.

»
4 years ago, # |
  Vote: I like it +55 Vote: I do not like it

Another problem that is certainly not worthy of a 2900 rating: 1372E - Omkar and Last Floor

»
4 years ago, # |
Rev. 8   Vote: I like it -47 Vote: I do not like it

I also have examples of the opposite case. Problems https://mirror.codeforces.com/contest/1385/problem/E and https://mirror.codeforces.com/contest/1385/problem/F are rated 2000 and 2300, which doesn't make sense. E is a small variation on a classic problem, and F is a little hard, but really not that hard. They should be more like 1700-1800 and 2000-2100.

Edit:

Wow, I'm sorry if I offended anybody; I didn't mean to diminish anyone's achievement!

»
4 years ago, # |
  Vote: I like it +18 Vote: I do not like it

Good news: 1379D and E are now rated 2300 and 2800, per your suggestion.

»
4 years ago, # |
  Vote: I like it +70 Vote: I do not like it

Thanks. Occasionally some heuristics don't work well. In such cases, I need to adjust the ratings manually.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Could you also fix this one https://mirror.codeforces.com/problemset/problem/86/D

Its rating is 2900, and... that can't be right, can it?

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +9 Vote: I do not like it

Yes, of course. It should be around 2000-2100.

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

Imho this explains very well why a 2700+ rating is correct for 86D.

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

But bro, in the current situation I don't think it's a non-standard thing.

          • »
            »
            »
            »
            »
            »
            4 years ago, # ^ |
            Rev. 2   Vote: I like it 0 Vote: I do not like it

In that case, you should ask Mike to take upsolved submissions into account as well, i.e. the ratings people had when they upsolved it. click click2

The problem rating is correct as far as the spirit of the rating formula is concerned, and it shouldn't be changed just for the sake of it.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +39 Vote: I do not like it

    Can you explain what the heuristics are, or what the overall system is? I used to think problem difficulties were calculated by a single formula with a simple interpretation.

    Thanks!

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      I guess the mystery is somewhere here, in UPD2 "coefficients".

      • »
        »
        »
        »
        4 years ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        Yeah, the original blog announcing problem difficulties (https://mirror.codeforces.com/blog/entry/62865) said they are calibrated so that if your rating is $$$R$$$ and the problem rating is $$$r$$$, your probability of solving it during an official round is $$$f(R - r)$$$ for some function $$$f$$$ with $$$f(0) = 0.5$$$, similar to Elo and similar ratings for two players.
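        As a purely illustrative sketch of that calibration: Codeforces has never published $$$f$$$, so the following assumes the standard Elo logistic curve with a 400-point scale (an assumption, not the real formula).

        ```python
        # Hypothetical sketch: assumes f is the standard Elo logistic with a
        # 400-point scale, which the announcement blog never actually specifies.

        def solve_probability(participant_rating: float, problem_rating: float) -> float:
            """P(solve) for a participant facing a problem, Elo-style.

            The blog's calibration claim f(0) = 0.5 holds here: an
            equally-rated participant solves the problem half the time.
            """
            return 1.0 / (1.0 + 10 ** ((problem_rating - participant_rating) / 400.0))

        print(solve_probability(2400, 2400))  # equal ratings -> 0.5
        print(solve_probability(2800, 2400))  # +400 advantage -> ~0.909
        ```

        Under this model a 400-point rating gap gives roughly 10:1 odds of solving, exactly as in two-player Elo.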

        But I don't really understand where they come from, like... is it just based on fitting to the ratings of the participants during the official contest? And if so, is it pre-contest ratings, post-contest ratings, or per-contest "performance ratings"?

        Plus, as you said, it seems from Mike's comments like there are probably some ad-hoc heuristics/hacks on top of this basic formula, but we don't know what they are.

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +54 Vote: I do not like it

      I can't easily explain all the details. In a perfect world, a problem's rating would be the rating of an opponent such that your probability of beating them equals your probability of solving the problem. But in the real world, the data is dirty: suppose tourist enters a Div3 round but problem A is too boring for him. Statistically he didn't solve it, which would give a great boost to the problem's rating. I tried counting only official submissions, but for hard Div3 problems, for example, official submissions give less information than unofficial ones. So my current way of calculating problem ratings is full of weights, coefficients, and heuristics. You can try it yourself using the API, but I don't think there is a silver bullet that calculates ratings much better. I think about 98% of ratings are now quite good, and the rest can be tuned manually.

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +16 Vote: I do not like it

        Thanks! I was mostly curious for my own understanding, rather than suggesting I could come up with a better system. It would be nice, when looking through the problemset, to know the underlying system that produced these ratings.

        Is it common for highly-rated participants to skip easy problems in low divisions? Anecdotally, when I look at Div3 results, I see GMs and IGMs at the top of the rankings, usually having done the problems in order. I guess I don't see the GMs/IGMs who do the problems out of order though, since they aren't at the top of the rankings. :P

        I was also curious whether the "rating" of a participant in a contest is taken to be their pre-contest rating, post-contest rating, or per-contest performance rating?

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +11 Vote: I do not like it

        Can you just open-source the formula (like you did with the rating formula) and let other interested people do a PhD on it?

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    1384B2 - Koa and the Beach (Hard Version) had 307 in-contest solves in Div2, 1384D - GameGame had 144, and 1384B1 - Koa and the Beach (Easy Version) had 846. But currently their difficulties are 2200, 1900, and 1900. Please fix them.

»
4 years ago, # |
Rev. 2   Vote: I like it +22 Vote: I do not like it

What about these: 1183E (Easy Version) is rated 2000 while 1183H (Hard Version) is rated 1900?

»
4 years ago, # |
  Vote: I like it +59 Vote: I do not like it
  • »
    »
    4 years ago, # ^ |
      Vote: I like it +24 Vote: I do not like it

    I think there's a big difference between "unrated" and "rated", moreso than "rated low" and "rated high".

    Unrated accounts typically belong to people who either

    • don't participate in contests, so they may not have enough experience with Codeforces to meaningfully talk about contest ratings or logistics, or
    • made a second account to post things. If even the author doesn't think their post is good enough to associate with their primary account, how likely is it that the post is actually good?
    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Also, the new account was probably made so that whatever it posts can't be traced to the original one, for example when it's used to post "Reveal how xxx cheats" stuff.