How to expand the class of problems in CP without losing objectivity + simplifying organization

#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

Who am I? I took up ACM ICPC in 2006 as a student, went to the semi-finals 5 times, trained the teams of a provincial university, held several camps for schoolchildren from the zero level and above, and participated in the organization of regional competitions many times (ContestSfedu aka Championship of the South of the Russia aka GP of Azov Sea — Opencup). In recent years, I have been pursuing a career, so I do not actively participate in the sports program, but from time to time I continue to help in holding competitions in Taganrog.

Why am I speaking out? More and more often I find myself thinking that competition programming does not keep up with the rapidly changing world and it becomes more difficult to find arguments for students to do it. It's probably easier with schoolchildren, but it's only a matter of time. Do those who organize ICPC, school Olympiads, etc. understand this problem? — I don't know, because I'm not actively involved in the activities of the community. It is likely that everything that I will say next has already been discussed many times here or in the corridors of IFMO, and may already be implemented on some platform unknown to me (I did not conduct research). And suddenly not?

** What problems do I see in the competetion programming **:

Problems of polynomial complexity are finite. Every year it becomes more difficult to come up with original problems, especially for competitions with a low level of participants. Sportsprog is becoming more and more like chess in the sense that, due to the finiteness of tasks, in the foreseeable future, AI will cope with it better than humans, and in human competitions, not knowledge will come to the fore, but the ability to isolate meaning from a confusing text of conditions and apply standard knowledge (and even libraries) in a stressful situation in a limited time. Online is at risk of suffering from the dominance of cheaters. The outside world (employers, sponsors, etc.) will treat the tasks being solved every year with greater skepticism, since in real life the degree of uncertainty is an order of magnitude higher. I will add that due to the desire of the jury to accept solutions in different languages, optimizations in the spirit of “speed up by 2 times” or “remove the logarithm” are practically meaningless in the sports program today, despite the fact that in life they have a noticeable economic effect.
It is very expensive to do tasks of high quality. Preparing a task, even if it is streamlined, takes a huge amount of time, and therefore money. First of all, this concerns test coverage, because for the organizer there is nothing worse than accept a deliberately wrong or non-optimal decision. It becomes more and more difficult to hold local offline competitions, because it is impossible to find resources to prepare original problems, and the old ones kill objectivity and lead to worse experience than in high-quality online contests. Due to the lack of local competitions, diversity is reduced, social bonds between participants are not formed, and participants, up to reaching the top level, cannot really try themselves as authors without being hated for insufficiently high-quality tasks.
All attempts to expand the class of problems run into the problem of objective evaluation. The jury cannot rate one heuristic above another, because there is a set of tests on which the result will be the opposite, and the participants do not know the tests and their distribution. The only successful case of expanding the classical set of problems was the emergence of interactive problems, but this class is also very limited in size and imposes the same requirements on the completeness of the solution and tests.

What will be great to achieve:

Contests should be interesting. Applying standard algorithms over and over again is not very interesting, so it is necessary to increase the influence of intuition and the creative aspect. Sponsors should be interested in the fact that the problems meet the requirements of today, and ideally they can easily bring their real problem to the contest and get new solutions and tests using the collective mind. Spectators should be interested, because the contest lasts for a reasonable time, and the result is not entirely predictable until the very end, ideally if it can also be visualized.
There should be a lot of contests and their holding should not require large expenses for preparation. Ideally, the formula should allow any provincial university or school to prepare an original competition in a week or two with the help of teachers or students of not the highest level.
Contests must be objective. The influence of the jury on the result should be minimized. Participants must be on an equal footing and compete primarily with each other. The random factor should be as small as possible.

What I suggest:

Remove all restrictions on tasks solved at contests. NP-complete problems, problems with imprecise solutions, complex games, etc. — everything can be used as tasks, because this is a huge amount of new knowledge and skills that are no less useful for participants than those currently used in the CP.
Do not require the jury to prepare complete sets of tests, ideally correct or various incorrect solutions for problems in a bunch of languages. Ideally, there should be enough statement, samples to understand the format and the simplest pretests to cut off deliberately crooked solutions (rather to check compliance with the format than to check correctness).
Participants during the contest can upload the solution and upload the generator (or interactor). Decision required, generator — optional. When a new solution appears, it is tested on all tests generated by the participants. The result of the execution is compared according to some criterion with other participants, as a result of which the points are redistributed. When a new generator appears from a participant, his old tests are canceled along with the points received for them, and all solutions are tested on new tests, which leads to a new redistribution of points.

That's all :) In my opinion, it is precisely such a simple scheme that can work, just as the ACM ICPC scheme once worked, where a plus sign is given regardless of the complexity of the task.

How to count results?

The most similar scheme to ACM ICPC — it is to give all tasks the same price in points and distribute them in percentage among the participants. For good measure, let's assume that 10 participants solved the problem (that is, their solutions passed the jury's pretests to cut off the complete trash and save system resources), and only 5 of them filled in their generators, each of which created 20 tests. Then we have 100 tests in the system, on each of which we will run all 10 solutions, i.e. each test yields 1 percent. Suppose on one of the tests the participant found the solution better than everyone else, then he will receive 1% for it. And on another test, all 10 participants found the same solution — they will get 0.1% for it. If two participants found a solution better than the other 8, then they will receive 0.5% each. Well, etc. Obviously, the sum of points will always equal 100%. At the same time, due to the presence of both large and small tests, weak participants will be able to claim points until the very end of the competition. Various degenerate cases like "one participant passed the pretests and didn't flood the generator" also do not interfere with summing up — we give him 100% for the task.

Now a few details, why I think that with such a scheme of conduct, you can objectively calculate the result for any task:

To begin with, we apply this scheme to existing ACM problems that have an exact solution. Obviously, the participants will be distributed in the same way as they are distributed in school contests, except for cases when some mistakes are not covered by any tests. Well… is it part of the tactic, is it to waste time on the generator in an attempt to take points from rivals or to solve other problems? But now it will not be possible to accuse the jury of incomplete coverage of tests, and each team will have a person with developed testing skills (this format will be the most interesting in team competition).
Let's apply the scheme to some NP-complete problem or a problem without an exact solution. Obviously, the output of the system will be a bunch of tests for different heuristics, and the solutions will largely be random. But it will no longer be possible to accuse the jury of preferring some kind of heuristic, because you yourself could write a generator that will break your rivals. Will this bunch of heuristics have any value? Certainly! Indeed, in sum, they form a program that covers them all, and in a reasonable execution time. The higher the level of the contest participants, the better both the solutions and the tests will be.
Let's apply the scheme to some game, for example, checkers. On the one hand, there will be a decision (interactor) with moves for white, on the other hand, a decision of the opponent with moves for black. Obviously, according to the results of the launch of each with each, a completely objective table of results will be obtained, regardless of whether the competition is held among younger students or among top participants. And although in neither case the result will be comparable with specialized programs, it does not interfere with the contest, and for preparation you only need to describe the rules of the game, the data format and write a checker to check the correctness of the move.

A few small nuances that will complement the overall picture:

A solution run multiple times on a fixed set of tests (as well as a generator) should produce the same result, i.e. the participant must use a fixed seed when randomizing. Otherwise, the results become unpredictable, and therefore biased.
It is necessary to force the participants to create in the generators not only large tests, but also small ones. The easiest way is to require the generator to create all tests at once, and in the condition to describe the criteria for them. For example, to require that some of them be small.
So that the scoreboard can be displayed online, and the participants do not overwhelm the checking system with endless testing, the number or time between sending generators should be limited, at least until the monitor is frozen. Obviously, after freezing, you can test selectively (for example, only the last attempts).
Online contests must use room splitting. Otherwise, the number of tests will be too large. The final result can be calculated already among the top participants from the rooms. For offline events, the generation of 10-20 tests by each will not be a big problem, because the number of tasks per contest will also be reduced due to the need to write generators.
It is necessary to think about how to take into account the time in the parcels and whether it should be taken into account at all. Obviously, under such rules, participants cannot completely stop working on a task at any point in time, because at any moment tests may appear on which their solution will lose effectiveness, or an idea will appear how to break other people's solutions. Most likely, a huge number of tactics will appear, how to waste the time of the contest, which will definitely increase the entertainment.

Show or not show tests to participants, whether to assign some bonuses for the "first solution" or "good generator", with what accuracy to compare the result, whether to take into account time and memory, whether to limit multithreading, whether to require sorts to be stuffed into one file or to allow already execute anything in an isolated container — All these are minor nuances that do not change the overall picture.

What is required to get started?

The Codeforces needs to be improved, and given the existing infrastructure of hacks, educational rounds, a testing ground, etc. it doesn't look very big. It is quite possible that the existing API already allows such a contest to be held and everything will rest only on the visualization of the results.

In general, if the idea is good — like, participate in the discussion — maybe it will be of interest to other organizers. Without mass application, it will not be of interest, therefore I am writing here. If the idea is old — share a link to the implementation, especially interested in the opportunity to hold your own contests, trainings, etc. It's clear that there are hundreds of contests in the world with inaccurate solutions, but I'm talking about reforms in the classic competition program, where there are long-established traditions.

Rev.	By	When	Δ	Comment
en4	Seryi	2023-04-25 15:00:49	0	(published)
ru5	Seryi	2023-04-25 14:53:34	0	(опубликовано)
en3	Seryi	2023-04-25 14:53:20	4
en2	Seryi	2023-04-25 14:52:46	17
ru4	Seryi	2023-04-25 14:47:14	1	Мелкая правка: 'получат 0.01% за него' -> 'получат 0.1% за него'
en1	Seryi	2023-04-25 14:46:29	13038	Initial revision for English translation (saved to drafts)
ru3	Seryi	2023-04-25 14:26:21	2952	Мелкая правка: ' позволяетт провести' -> ' позволяет провести'
ru2	Seryi	2023-04-25 14:16:11	448
ru1	Seryi	2023-04-25 14:09:29	11484	Первая редакция (сохранено в черновиках)

History