While I very much enjoy Codeforces rounds overall, FSTs are by far my least favorite part of the Codeforces contest format. There are two reasons I dislike contests with weak pretests. The first is stylistic: I don't care very much about penalizing people who don't rigorously prove their solutions before submitting (which is the main justification for having weak pretests), and I think it's essentially impossible to FST these solutions without more frequently FSTing contestants who make minor mistakes in their code. (The vast majority of my FSTs are due to this sort of minor glitch rather than an error in the underlying idea.)
The second is that FSTs introduce an element of hidden information to Codeforces rounds. During a contest, you have no way of knowing whether the authors have written strong pretests. If you act as though the pretests are weak, you will be at a disadvantage if pretests are actually strong (because if you had known pretests were strong, you could have e.g. submitted ideas that are difficult to prove but easy to code, or submitted code without spending time checking for silly mistakes like int vs long long). If you act as though the pretests are strong, you will be at a disadvantage if pretests turn out to be weak (because you didn't thoroughly check your solutions before submitting them). When pretests are weak, there is also ambiguity over which errors pretests will catch, and it's not obvious to me why some mistakes deserve to be penalized more than others by FSTing. Thus, the current system (pretests are usually strong, but sometimes pretests are unintentionally weak and occasionally an author will write weak pretests on purpose) introduces a degree of variance that I believe unnecessarily decreases the competitive value of Codeforces rounds.
In this post, I offer four proposals to improve the state of pretests on Codeforces. The first proposal serves to eliminate the hidden information problem described above. The remaining three focus on minimizing FSTs for authors who seek to write strong pretests. (These three proposals are already commonly implemented, and my purpose in posting this blog is to try to make them universal norms.) All of these proposals are supported by concrete examples of problems that I believe they would have improved.
Proposal One
If an author intentionally writes weak pretests, they should announce this decision in the contest announcement or in the statements of the affected problems.
Case study: Round 869, Div. 1 E. In this problem, the author chose to include only six pretests in order to discourage people from incorrectly guessing the solution idea and attempting proofs by AC. The issue is that (a) many minor implementation bugs seem to have been caught in the crossfire and (b) contestants had no way of knowing that pretests would be weak (only one large test was included, so unlike most problems around this difficulty level, typical implementation errors common in large test cases did not fail pretests).
I disagree with the stylistic choice to use weak pretests here, but I think it's a matter of judgment and I respect the author's right to decide to make pretests weak. However, to eliminate the hidden information problem, contestants should be told that pretests are weak in advance, especially because most contestants have come to assume that pretests for late problems in Div. 1 are strong.
I can think of a couple of ways to implement this, but my favorite is to add a note at the end of the problem statement reading "Note that the pretests for this problem are intentionally made weak. There are six pretests, including the sample test case." This lets contestants know that they are responsible for carefully confirming the validity of their solutions before submitting their code. This isn't a full solution to the hidden information problem, as there's still ambiguity over which specific errors the pretests are designed to catch, so I would oppose weak pretests even if this change were implemented. Still, I think it would significantly reduce the damage weak pretests do to the contest experience.
Sidenote: This proposal, including the language of the note above, was inspired by maroonrk's comment at this link.
The remaining proposals are specific to authors who intend to write strong pretests. (I think this applies to the strong majority of CF setters.)
Proposal Two
If a problem is expected to receive under 100 solutions, the pretests should be the same as the system tests. No exceptions.
Case study: Round 896, Div. 1 D. In this problem, there were 41 pretests, which I assume were meant to be strong (for example, I accidentally declared the variable storing the sum of the input degrees as an int rather than a long long, and pretests caught this error, even though it should only matter through undefined behavior or when the sum of the input degrees is not equal to $$$2N$$$ but evaluates to $$$2N$$$ due to overflow). However, several people FSTed when it turned out that there were 37 additional system tests not included in pretests. Given that this problem received only about 25 solutions in-contest, running the full test suite on every submission would not have created a significant load for the contest servers, so these FSTs could easily have been avoided by setting pretests equal to systests.
More generally, running a large number of test cases is not especially expensive when a problem receives very few solutions throughout the contest, so unless pretests are intentionally made weak, there is no reason for them to not coincide with the system tests.
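As an aside on the int vs long long slip from the case study above, here is a minimal sketch of how a degree sum that is not $$$2N$$$ can still compare equal to $$$2N$$$ in a 32-bit variable. The helper names and the degree values are hypothetical and chosen only for illustration (they do not reflect the actual constraints of the problem), and the wraparound is simulated: in C++, signed overflow is undefined behavior, and wrapping is merely what commonly happens in practice.

```python
def wrap_int32(x: int) -> int:
    """Reduce x to the value a 32-bit two's-complement int would hold."""
    return (x + 2**31) % 2**32 - 2**31


def degree_sum_looks_valid(degrees, n, use_32_bit):
    """Return True if the accumulated degree sum compares equal to 2 * n.

    With use_32_bit=True the running total is wrapped after every addition,
    imitating an `int` accumulator; otherwise the exact sum is used,
    imitating a `long long` accumulator that never overflows here.
    """
    total = 0
    for d in degrees:
        total += d
        if use_32_bit:
            total = wrap_int32(total)
    return total == 2 * n


# Hypothetical input: the true sum is 2**32 + 8, not 2 * n = 8.
degrees = [2**31, 2**31, 4, 4]
print(degree_sum_looks_valid(degrees, 4, use_32_bit=True))
print(degree_sum_looks_valid(degrees, 4, use_32_bit=False))
```

The wrapped accumulator accepts the input while the exact one rejects it, so only a test built specifically around this kind of sum would reliably separate the two versions.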
Proposal Three
If pretests do not include all system tests, then multitesting should be used. If possible, pretests should include all possible small cases. Exceptions can be made for e.g. Div. 2 As where only 2-3 pretests are allowed and it is impossible to include all small cases, the larger edge cases, and the maximal cases (though this is fairly rare because most modern D2As have small inputs and large numbers of test cases).
I'm ambivalent on whether exceptions should be allowed for e.g. problems where multitesting makes the complexity analysis more complex. However, any exceptions should require justifications approved by the contest coordinator (and if possible, these problems should just have pretests = systests).
Case study: CodeTON Round 5 D. In this problem, I FSTed due to a minor bug in my logic that would have been caught on a fair number of small cases (if I recall correctly, my solution fails most cases where there exists a zero-length edge incident to $$$N$$$). There wasn't any meaningful reason the problem didn't use multitesting, and incorporating multitests would have prevented this and similar FSTs.
Multitesting is a very powerful tool for preventing FSTs, and authors should use it (and take full advantage by including all small cases) unless they have a good reason not to.
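To make "including all small cases" concrete, here is a rough sketch of how a single multitest file could enumerate every small input. It assumes a hypothetical problem whose input is an array of positive integers, with a test count on the first line followed by the length and the values of each array; the function names and the format are illustrative assumptions, not taken from any of the problems discussed here.

```python
from itertools import product


def all_small_arrays(max_n, max_value):
    """Yield every array of length 1..max_n with values in 1..max_value."""
    for n in range(1, max_n + 1):
        for values in product(range(1, max_value + 1), repeat=n):
            yield list(values)


def build_multitest(cases):
    """Format the cases as one multitest input: t, then n and the array."""
    lines = [str(len(cases))]
    for array in cases:
        lines.append(str(len(array)))
        lines.append(" ".join(map(str, array)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cases = list(all_small_arrays(max_n=3, max_value=3))
    print(build_multitest(cases), end="")
```

With `max_n=3` and `max_value=3` this produces 3 + 9 + 27 = 39 cases in one pretest. The number of cases grows exponentially in `max_n`, so the bounds have to be picked to fit the problem's limit on the number of test cases.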
Proposal Four
Assuming multiple meaningfully distinct test case generators are used, at least one test from each generator should be included in pretests. Authors should prepare solutions meant to fail on each category of test case and should ensure that each solution actually does fail pretests.
Case study: Global Round 19 E. In this problem, the answer is immediately NO whenever $$$n$$$ is not a Fibonacci number. No such cases were included in pretests, however, and my solution FSTed because it did not return immediately after outputting NO. A generator for non-Fibonacci values of $$$n$$$ was in place, but none of its tests made it into pretests (I think this was the result of last-minute communication issues during the preparation of the round).
I'm pretty sure that following this proposal is standard practice already. Aside from rounds where pretests were intentionally weak, I can't recall any authors intentionally not following this practice. The point of this proposal is not to substantially change the process of test case generation but to make sure that this condition is checked prior to the start of each round.
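A pre-round check along these lines could be automated with very little code. The sketch below is only an illustration under assumed data structures: a mapping from test name to the generator that produced it, and a mapping from each intentionally wrong solution to the set of tests it fails. All names in it are hypothetical and do not correspond to any real Polygon or Codeforces API.

```python
def uncovered_generators(test_to_generator, pretest_names):
    """Return the generators that have no test among the pretests."""
    all_generators = set(test_to_generator.values())
    covered = {test_to_generator[name] for name in pretest_names}
    return sorted(all_generators - covered)


def surviving_wrong_solutions(failed_tests_by_solution, pretest_names):
    """Return the wrong solutions that do not fail any pretest."""
    pretests = set(pretest_names)
    return sorted(
        solution
        for solution, failed in failed_tests_by_solution.items()
        if not (failed & pretests)
    )


# Hypothetical round data: test "03" comes from a generator with no pretest.
test_to_generator = {
    "01": "sample",
    "02": "random_fib",
    "03": "non_fib",
    "04": "random_fib",
}
pretest_names = ["01", "02"]
failed_tests_by_solution = {
    "no_early_return": {"03"},
    "too_slow": {"02", "04"},
}

print(uncovered_generators(test_to_generator, pretest_names))
print(surviving_wrong_solutions(failed_tests_by_solution, pretest_names))
```

If either list is non-empty, the pretest set should be revisited before the round starts. In this made-up data, both checks point at the same gap: the `non_fib` generator is missing from pretests, and the solution meant to fail on its tests passes pretests.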
Thanks for reading! If you have feedback, please feel free to write a comment; I'd be happy to hear input from people with more problem preparation experience than me.