PeruvianCartel's blog

By PeruvianCartel, 6 months ago, In English

As you guys may have noticed, in Div 4 contests on Codeforces the system testing phase and the queue usually take forever (a submission can get stuck in the queue for up to 30 minutes, and system testing sometimes lasts literally half a day). So why don't problem setters just... not do that?

What I mean is: why do authors deliberately make both the time limit and the constraints so large? I'm genuinely confused, since Div 4 is well known for its slow queue, so at some point someone should have thought about enforcing guidelines on setting constraints for optimal judging speed in "crowded" contests such as Div 3 and Div 4.

To see what I mean, let's take a look at this problem: 1980C — Sofia and the Lost Operations. It has a time limit of 2 seconds, and the sum of $$$n, m$$$ over all test cases is at most $$$2 \cdot 10^5$$$. There's actually no reason not to halve both the time limit and the constraint, since it's not like we are going to let an $$$O(n \log n)$$$ algorithm pass while failing $$$O(n^{1.5})$$$ or $$$O(n^{1.75})$$$ ones. Doing that changes nothing about the problem itself, and there are numerous merits in return: a faster queue, faster judging, happier participants, etc. So... what is stopping authors from setting the constraints to {$$$1$$$ second, $$$n \leq 10^5$$$}, or even as low as {$$$0.5$$$ seconds, $$$n \leq 5 \cdot 10^4$$$}? I respect problem setters' right to set the constraints of their problems to particular numbers, but I simply think there are things that can be done to make every party happy.
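
To put rough numbers on that (back-of-envelope arithmetic of mine, counting $$$n \log_2 n$$$ elementary operations): at $$$n = 2 \cdot 10^5$$$ an $$$O(n \log n)$$$ solution does about $$$2 \cdot 10^5 \cdot 17.6 \approx 3.5 \cdot 10^6$$$ operations, while at $$$n = 10^5$$$ it does about $$$10^5 \cdot 16.6 \approx 1.7 \cdot 10^6$$$ — slightly less than half. So with the time limit halved as well, the intended solution keeps marginally more headroom than before, and the problem is effectively unchanged.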

Again, this doesn't mean that every problem's constraints should be lowered. For example, this problem: 1985H2 — Maximize the Largest Component (Hard) works best with its current constraints, as going any lower would mean that a well implemented, simple brute force with complexity $$$O(n \cdot m \cdot \min(n, m))$$$ would be nearly indistinguishable from an intended, but badly implemented, solution.

Vote: +33

»
6 months ago, # |
  Vote: +70

You are forgetting that easy problems must be solvable in languages other than C++, even without fast input/output, while still not being $$$O(n^2)$$$-able in C++. Reading $$$10^5$$$ integers is still huge: it can push slower languages over 1 second of execution time and/or let an $$$O(n^2)$$$ C++ solution pass.
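
For C++ specifically, the "fast input/output" being referenced is usually just the setup below (a minimal sketch; the summing `main` is made up for illustration):

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    // Untie C++ streams from C stdio and from each other; on inputs of
    // 1e5..2e5 integers this alone cuts reading time by a large constant factor.
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n;
    cin >> n;
    long long sum = 0;
    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;   // in easy problems, the bulk of the runtime is right here
        sum += x;
    }
    cout << sum << '\n';
}
```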

The number of tests affects the queue more, and we minimize it in the first problems; the queue in the first minutes is just inevitable.

As for the long system testing, it is simply caused by tons of submissions multiplied by tons of different hacks that break the same mistakes.

  • »
    »
    6 months ago, # ^ |
      Vote: -49

    Can you set different time limits for different languages? Like 1s for C/C++, 2s for Python/Java, etc.?

    • »
      »
      »
      6 months ago, # ^ |
        Vote: 0

      That's not fair; anyone can learn C++, it's not like a secret or prohibited language.

  • »
    »
    6 months ago, # ^ |
      Vote: 0

    How does that work, though? Most problems on Codeforces have intended solutions that run in $$$O(n)$$$ or $$$O(n \log n)$$$, so I thought halving both the constraint and the time limit wouldn't change the problem? Intuitively, reading $$$2 \cdot 10^5$$$ numbers should take... $$$2$$$ times longer than reading $$$10^5$$$ numbers, regardless of what programming language the program was written in, right? Or do slower languages take a substantial amount of time just to execute a blank program, so that halving both the constraint and the time limit isn't going to cut it?

    • »
      »
      »
      6 months ago, # ^ |
        Vote: 0

      This happened to me. I didn't have fast I/O, and Sofia and the Lost Operations took 1.5s to get AC during the contest and 1.8s after, so lower constraints would've killed me even though my solution was nothing special.

»
6 months ago, # |
  Vote: +39

Hacks pretty much seem to be the issue. I mean, Div. 3 C has 156 tests after all hacks are added? That is quite a large number for a problem with 9294 solves, given that official guidelines suggest having at most around 15 tests for such problems. As it stands, adding 130 different hacks that exploit the same thing in different ways doesn't help at all.

  • »
    »
    6 months ago, # ^ |
    Rev. 3   Vote: 0

    Now that I'm thinking about it, yeah. I think the ways to fix this are to either shorten the hacking phase (I don't think this would really work) or just make stronger pretests. For example, in this problem, 1985F — Final Boss, the authors unfortunately didn't consider the binary search solution, so they didn't include test cases that cause overflow in the pretests. As a result, there were a lot of hacks in the contest (I remember seeing one guy with +100 hacks, though I'm not really sure), which means either the authors have to hand-pick which hack tests to include, or let all the tests in, which results in an absurdly long system testing phase. Hacking unordered_map and string hashes is also a common theme in Div 3 and Div 4 contests, so problem setters could consider including some anti-unordered_map and anti-common-hash tests in the pretests (the usual participant-side defense is sketched below). Of course, the decision whether to include strong pretests is the author's, but I think avoiding a huge number of submissions clogging the system test, in problems where thousands of "passed pretests" submissions are expected, is more beneficial than punishing beginners for implementation errors or for not proving their solutions.
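
    For context, the participant-side defense that such tests target is the widely circulated randomized custom hash. A minimal sketch of the common pattern (the struct name and the toy `main` are illustrative):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Randomized hash: splitmix64 mixing plus a per-run random offset, so a
// hack test prepared in advance cannot force unordered_map collisions.
struct custom_hash {
    static uint64_t splitmix64(uint64_t x) {
        // reference splitmix64 mixing constants
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    size_t operator()(uint64_t x) const {
        // seeded once per run from the clock, so every run hashes differently
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

int main() {
    unordered_map<long long, int, custom_hash> cnt;
    cnt[42]++;   // used exactly like a plain unordered_map
    cout << cnt[42] << '\n';
}
```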

»
6 months ago, # |
Rev. 2   Vote: +3

I am not an author. I do notice system slowness in Div 4 contests. I am not sure of the exact reason, but I think the larger number of participants and the larger number of submissions (due to easier problems) are the main causes. For example:

  • the most recent div 4 contest has about 46k participants and O(100000) accepted submissions;
  • the most recent div 2 contest has 36k participants and O(50000) accepted submissions;

So the system handles roughly double the traffic in a Div 4 contest compared to a Div 2 contest, which I think is the main reason for the relative slowness.

It is unclear to me how much reducing the time limit, like in 1980C from 2 to 1 second, would help, as many submissions already take less than 1 second, and 2 seconds is not a big number. Also, reducing the time limit to the bare minimum indicated by the optimal solution is not cost-free:

  • the relative overhead due to I/O or language choice is amplified;
  • the tolerance for alternative, sub-optimal solutions becomes narrower;
  • there is more discouragement for beginners in Div 4 contests.

The real optimization, in my opinion, is to improve the system's capacity and performance, so it can deliver similar or better response times under double the traffic, and to prepare for increased traffic in the future as CF becomes more popular.

  • »
    »
    6 months ago, # ^ |
      Vote: +36

    O(100000) is constant time so that shouldn't take too long

    • »
      »
      »
      6 months ago, # ^ |
        Vote: 0

      O(100000) stands for "on the order of 100000" accepted submissions.

  • »
    »
    6 months ago, # ^ |
      Vote: 0

    That's not what I meant in the blog. I meant cutting both the problem's constraint and the time limit in half: instead of setting {$$$2$$$ seconds, $$$n \leq 2 \cdot 10^5$$$}, why don't authors just set {$$$1$$$ second, $$$n \leq 10^5$$$}, assuming it doesn't affect the difficulty of the problem?

    • »
      »
      »
      6 months ago, # ^ |
        Vote: 0

      I got that. I mentioned a few other overheads and factors involved, including I/O (input and output) and initialization delay. For example, even if $$$n$$$ is only 10, the system may take O(300ms) to evaluate the submission regardless, and this overhead can vary a lot across languages. We already know that I/O delay matters, sometimes much more in one language than in another, which explains all the tricks people use to make I/O faster at the cost of low-level code with poor readability. The desire to set a small/tight time limit may unnecessarily create unfair hardship for a specific set of languages. For example, I have encountered a problem with such a tight limit that it was impossible to get Accepted in Java with the exact same algorithm and similar code.

      • »
        »
        »
        »
        6 months ago, # ^ |
          Vote: 0

        Oh. So in slower languages, even initializing the program already takes a couple hundred milliseconds, so halving the time limit wouldn't be exactly fair.

    • »
      »
      »
      6 months ago, # ^ |
        Vote: 0

      Even now there are a lot of $$$\mathcal{O}(n^2)$$$ solutions with tiny constants that run in 2 seconds when $$$n = 2 \cdot 10^5$$$, so halving both $$$n$$$ and the TL would let even more such solutions pass. For example, when we have a string s of length $$$n$$$, spamming s.erase(s.begin()) will easily pass in many problems (see the sketch below). Using a TL like 0.5 seconds, though, becomes significantly unfriendly to slower languages with high startup overhead.
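
      A minimal sketch of that pattern (my own illustration; whether it actually fits 2 seconds depends on the judge hardware and compiler):

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    // Erasing the first character shifts the whole remainder left, so this
    // loop does ~n^2/2 byte moves — quadratic, but with a memmove-level
    // constant, which is why it can still squeeze under a 2s limit at n = 2e5.
    int n = 200000;
    string s(n, 'a');
    long long total = 0;
    while (!s.empty()) {
        total += s.front();   // "use" the first character
        s.erase(s.begin());   // O(n) shift per call => O(n^2) overall
    }
    cout << total << '\n';
}
```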

      • »
        »
        »
        »
        6 months ago, # ^ |
          Vote: 0

        I think, though, that some problems are just too generous with the TL — like when an inefficient solution with the intended time complexity won't exceed 300 ms even in the slowest languages. Maybe they could just decrease the TL to 1 second there.

»
6 months ago, # |
  Vote: +7

since it's not like we are going to let an $$$O(n \log n)$$$ algorithm pass while failing $$$O(n^{1.5})$$$

These two might be very close for $$$n$$$ around $$$10^5$$$. The main reason for a bigger $$$n$$$ is indeed failing $$$O(n^{1.5})$$$ or very fast $$$O(n^2)$$$. The constant factor is extremely important. For example, the following code takes around 3s, and it's impossible to distinguish it from an intended solution in $$$O(n \cdot \log^2 n)$$$:

n^2 for n=10^5
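
(The code itself is collapsed in the original comment; what follows is a hedged reconstruction of what such a snippet plausibly looks like — a bare double loop with a tiny per-iteration cost. Actual timing varies with compiler and judge.)

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    // ~1e10 trivial iterations: an O(n^2) loop at n = 1e5 whose constant is
    // so small that its runtime lands in the same few-second range as a
    // heavy O(n log^2 n) solution at the same n.
    int n = 100000;
    long long sum = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += i ^ j;   // tiny cost per iteration
    cout << sum << '\n';
}
```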

What others pointed out is that CF must spend some extra time initializing solutions in some languages. The time limit could actually be 100ms in many problems, but CF might still need more time than that to run a test.

I believe that div3-4 rounds are supposed to be easy to prepare. An author doesn't need to be that experienced and doesn't need to worry much about the strength of tests. I guess that there's weaker oversight from Mike and experienced div1 coordinators. In div1+2, they would tell the author to limit the number of tests in easy problems, and make those tests strong at the same time. This might even lead to swapping a problem. The same should be applied to div3 and div4.

I agree with adamant that hacks are to blame for the long wait after the contest.