Comments on World Finals Baku's Problemset Distribution

UPD: I have changed the title to a more moderate version. Once again, I would like to emphasize that I am only arguing that the medal distribution should have been better, and that many things could have been done to prevent such a close cut-off line this year. I am sorry to anyone who felt offended. Let's keep this as a discussion and an exchange of views.

Now that the results are released, let's congratulate all teams that made it to the World Finals, and especially those who have won medals and the trophy.

Disclaimer 1: I respect all efforts from judges, and this blog is purely based on my observations and is not intended to be personal. It is not easy to make a problemset and your work is greatly appreciated.

Disclaimer 2: I haven't read the problemset, which means I know nothing about the problems so far. Therefore, feel free to consider this as nonsense.

Disclaimer 3: I don't know anything about the pipeline in the World Finals' problem setting.

OK, I only want to emphasize one phenomenon from this contest: Rank 4 (Gold) has the same number of solved problems as Rank 17 (no medal). What's more, these teams all solved almost the same subset of problems (very few solved B instead of E).

The last time I saw this was in last year's NAC (North America Championship): https://nac.icpc.global/scoreboard/2024/. FYI, Rank 7 has the same number of solves as every team down to Rank 21, with almost the same subset of problems.

What's wrong with this, then? In NAC, teams compete for approximately 15 slots to advance to the World Finals. As a result, penalties decided whether a team got a slot. In this year's World Finals, the situation became even worse! The penalty decided whether a team got a medal, and a team with a small enough penalty could even reach the top 4 for a gold medal!

Here are the strong words: contestants' efforts deserve a better problemset, in the sense of the distribution of solves.

I understand that penalties are part of the game, and in fact a really important part. There were many times in history when the penalty decided the trophy. But I am afraid that this is the first time (or one of the first few times) that such a bad distribution of problem solves has occurred in World Finals history.
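To make the tie-break concrete, here is a minimal sketch of how an ICPC-style scoreboard orders teams: first by number of solved problems, then by penalty. It assumes the usual convention that the penalty is the sum of the accepted-submission minutes plus 20 minutes per rejected attempt on problems that were eventually solved. The names `Team`, `rank` and `tie_groups` are hypothetical, and the sample teams are made up, not taken from the real scoreboard.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Team:
    name: str
    # problem letter -> (minute of the accepted submission, rejected attempts before it)
    solved: dict

    @property
    def solves(self) -> int:
        return len(self.solved)

    @property
    def penalty(self) -> int:
        # Assumed convention: 20 penalty minutes per rejected attempt on a solved problem.
        return sum(minute + 20 * rejected for minute, rejected in self.solved.values())


def rank(teams):
    """Order teams by solves (descending), then by penalty (ascending)."""
    return sorted(teams, key=lambda t: (-t.solves, t.penalty))


def tie_groups(teams):
    """Map each solve count to the team names sharing it, in ranked order."""
    ranked = rank(teams)
    return {k: [t.name for t in g] for k, g in groupby(ranked, key=lambda t: t.solves)}


# Made-up example: two teams solve the same two problems, so only the penalty separates them.
example = [
    Team("Alpha", {"D": (10, 0), "F": (30, 1)}),
    Team("Beta", {"D": (12, 0), "F": (25, 0)}),
    Team("Gamma", {"D": (5, 0)}),
]
for team in rank(example):
    print(team.name, team.solves, team.penalty)
```

When one group in `tie_groups` spans both the medal zone and the no-medal zone, the whole medal cut-off is decided by the second sort key alone, which is the situation described above.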

What does this mean? It means that the judges failed to predict where the cut-offs would fall for this problemset. Maybe they did work on predictions and tried to make the contest balanced. However, they didn't succeed, for one of the following two reasons:

Not enough problems to select in the pool.
Unaware of how participants' abilities are nowadays.

Either reason is worrying. As far as I know, most judges have not been active in the community for a long time. This makes it hard for them to understand the "fashion" of problems nowadays, and to know the competitive teams in each season. This was not a big issue in most past years. However, if we consider the contest a serious selection of the world's best 12 teams, then they should do better and learn from this year's experience. Maybe a more serious investigation is needed. Maybe more judges are needed to bring in fresh blood.

Or maybe the problem setting pipeline should involve serious testing. In the Chinese ICPC regional setting, we invite reliable teams to test the contest in advance to see whether the actual distribution matches our prediction. I once mentioned this to NAC judges; they told me that it is unlikely to happen here since there is a risk of leaking.

No matter what really happened, I do feel that this is a lesson for the ICPC World Finals. More effort should be made to better separate teams, and any such effort would be really appreciated. If any judge sees this, feel free to share your thoughts.

Comments (28)

QwQcOrZ

Agree. I don't think anyone would be happy to see this happen — either as an audience or as a participant.

kostka

-51

Is this a hidden advert for the Universal Cup? ;)

+17

Well, a lot of contests come from China. And for two semifinals and finals, we did whatever we could and tried to give a good cut-off as well as a good experience for everyone from all tiers to compete/watch. We think we did a good job while having room to improve :)

Um_nik

-59

I'm sorry, these are the two contests that have better scoreboards in your opinion? Ucup problems are much better, but scoreboards?

Well, that's why we have room to improve, I suppose. My comment on these UCUP-originated contests is mostly about the fact that we did something (testing, replacing problems, setting different difficulty layers, and verifying whether the result was as expected...) to try to make it better. Whether the result is satisfactory is controversial, as always.

RanRankeainie

+66

Why are these two contests' scoreboards not good? The teams in the Ucup final are all powerful participants, so the problems should be hard enough. Also, for many i, the set of problems solved by rank i is not a subset of those solved by rank i-1.

gemini_test

-46

I feel so sorry for you. You deserved to win. Especially after qualifying for Peking (harder than winning ICPC) and having amazing contest results on UCUP/QOJ while training.

-75

ksun48, Radewoosh, and Petr did the mirror of ICPC 2025 and ended up solving 9 problems after 5 hours.

Given that 17 university teams solved at least 9 problems, and 3 teams solved more than a team of some of the best... ICPC 2025 was not a good contest. If you had the ICPC teams compete in the Universal Cup Finals, there would be a very clear difference in skill.

+11

That is such a stupid thing to say, and also completely unrelated to the comment you replied to.

RDDCCD

+83

The initially downloaded interactor for I was incorrect (although the one used for judging was correct). The World Finals also had a long delay in replying to clarifications: it took around 20 minutes to get a reply after we sent the clarification saying that the interactor might be wrong.

gojira

Now that the results are released

Where mate, https://worldfinals.icpc.global/scoreboard/2025/ is still frozen

Can spy on it by watching the closing ceremony livestream

Damn, had to watch half the stream to maintain suspense :D

The live broadcast kept saying that DFL+IJABK should be enough for a medal, and HE for a better medal. I'm guessing they significantly overestimated problem H.

Misjudging the difficulty happens often. But if we can arrange testing with several teams or individuals of different levels, the information should help improve the evaluation.

onufryw

FYI, the live broadcast is produced by analysts, who are a different team from judges (who prepare the problemset). The judges (correctly) believed that the four most difficult problems are BECG; we did not, I think, significantly overestimate H.

tfg

Our team solved H as our second problem, around 40 minutes into the contest, in the mirror. The screenshot of that stat being shown was kinda funny; I thought it was an edited image at first.

KiKoS

~~As a tester, I enjoyed the round~~

As a contestant, I wouldn't say that the contest was significantly different from the ones held in recent years. What was different, though, is the distribution of team skill. There are still few teams who can solve 10+ problems from the set in 5 hours (congrats SpbSU!), but IMO there were exceptionally many teams who could solve 10+ in about 6 hours. Given less time, some of them scored 9 and some of them scored 8. And it doesn't seem possible to fairly split a group of 20-30 teams that have really similar, high skill.

Compared to last year's set, I would say 8 problems in 2025 (40th place) is worth at least 7 problems in 2024 (23rd place), and 9 problems in 2025 (17th place) is worth at least 8 problems in 2024 (11th place). And this is an optimistic estimate from me, as I personally think the 2024 WF had a slightly easier problemset.

I would assume 50th WF will be more difficult to compensate for that.

Personally, we had a rather bad, but not horrible, performance and ended up 77th, which is about twice as bad as our worst virtual result of 40th place.

jeroenodb

+29

I felt this World Finals was different, and this was mainly due to the easy end of the problemset. It was definitely still not easy to implement your first 6-7 problems accurately and fast, but for the best teams they were not huge obstacles either. In other World Finals, I think some teams would have gotten stuck in implementation hell on the easy problems; this year many teams made it out of the easy problems. But those problems still take a significant chunk of time to implement, which hurts the separation at the top because there is less time left for the hard problems. So the fact that the problems were slightly easier to implement, which on its own is not bad, made for a worse balance.

+10

I'd agree with that; but I'm willing to call it a feature, not a bug :) One of the things I really wanted to achieve with this year's set (and I think it worked) is to actually get the top teams to work on the "really hard" problems. And the winners actually solved G, which I think would've gone unsolved on a typical WF, and we also had pretty reasonable submissions for C.

And the fact we didn't really have implementation hell problems (well, maybe H a bit) makes the problemset nicer. I'm really not sure if, looking back, I'd have added, say, another E-class problem (which would've likely increased the differentiation, but stopped teams from actually attacking C/G).

I'm a judge at the WF.

I agree that the fact that teams ranked 4-17 had the same number of problems is unfortunate. I don't consider it the disaster the original post seems to think it is. There are a lot of "balance" things going on in the contest, even at the "top end". I'm happy that the top team was decided by a problem, not by time; I'm happy that teams actually made serious attempts at both of the very hard problems in the set. Would I prefer a slightly different result (e.g., one of the teams that currently had 9 problems getting C, and winning a gold this way; and maybe several fewer teams solving E)? Sure. But is this a "serious mistake that the judges need to learn from"? I don't think so. In the end, with around 12 problems, all equally scored, there is going to be a noticeable chance that penalties matter quite a lot.

I'll also make a few extra comments about "most judges not being active in the community". First, that's true. You're more than welcome to submit problems to the Finals, which is how you become a judge; and I'm writing this without sarcasm — seriously, I'd be super happy to see more problems from more active contestants; and then having these contestants influence the final set.

Second; with each finals we run a check where before the finals, when we know the final problemset, each judge attempts to predict the "percentage of teams to solve" for each problem. I haven't done formal analysis of the results, but I don't think that the judges who are more active contestants do noticeably better on the predictions. My claim is "predicting this is just hard"; I haven't seen people be super-successful at this either at Code Jam (which I ran for a few years) or at the WF.

Third, you call for "more efforts" to be made to separate teams. You call out two concrete proposals: "Bring in fresh blood" (I'm not sure that would help), and "pre-testing" (as you mentioned, that's unlikely due to security concerns, although I agree I'd like to see that). So, I'm not sure what proper efforts you actually have in mind.

gawry

+28

What is the procedure for submitting a problem to the WF?

There's a call for problems, not out yet for 2026 (the call for problems for 2024 went out on October 30th, and the deadline was end of January). Reach out to problems@icpc.global if you're interested in getting the call for problems, or I can forward it to you.

Generally, you send a problem statement, a description of the solution and a coded up solution to problems@icpc.global; they should be PGP-encrypted (the key will be a part of the call for problems), and generally anonymized (that is, the files you send will be dropped into a repo, and the files themselves shouldn't contain information about who's the author).

Thanks! I wasn't able to find this information on the ICPC's website.

I find it somewhat surprising that you would invite chenjb to submit their problems to the WF while the call for problems isn't publicly available.

I think that if someone wants to submit problems, getting their hands on the call for problems is easy. I mean, just send a message to me :)

(I think that improving the procedure of the call for problems is one of the things that we could definitely improve; I see no reason for the call for problems not to be public)

ko_osaga

I think the rank 4-17 situation is not nice, and I did write about how increasing the problem quality would additionally benefit this situation.

Having said that, I also agree that this is not a disaster, and here's the hot take that I've been thinking about for a long time: I think these scoreboards, or "speedforces" in general, are underrated. For me, it feels like something that people should learn to like. I think it's an (uncomfortably) accurate measure of skill, and teams that are quick will generally excel on harder problems, to the extent that it could be more accurate than actually having a single hard problem. I would've disliked the scoreboard much more if B had been the deciding factor.

By the way, I'll be holding a Codeforces round soon, so anyone afraid of speedforces should take this as a warning :)

chenjb's blog