After making several submissions earlier today to 1463F - Max Correct Set that all involved calling clock(), I think I've found some major inconsistencies in clock timing on Codeforces.
I had a solution that was correct but borderline on runtime, because it needs to generate enough data to extrapolate from (via Berlekamp-Massey). So I made it iterate until 3.8 seconds, and looking through the submission list, it appears I wasn't the only one with this idea. Unfortunately I got WA on several submissions: 101579005, 101581695, 101583625. The strange thing was that each test case that resulted in WA didn't actually run for the full 3.8s, but only for about half of that. When I took the same code and ran it in custom invocation with the same input, it used the full 3.8 seconds and got the right answer every time.
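For concreteness, the pattern looks roughly like this (a minimal sketch; generate_next_term and extrapolate_berlekamp_massey are hypothetical placeholders, not the actual submission code):

#include <cstdint>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// Hypothetical placeholders standing in for the real contest code.
int64_t generate_next_term();
int64_t extrapolate_berlekamp_massey(const vector<int64_t> &terms, int64_t n);

int main() {
    long double clock_begin = clock();
    vector<int64_t> terms;

    // Generate data until ~3.8s of the 4-second limit has elapsed.
    while ((clock() - clock_begin) / CLOCKS_PER_SEC < 3.8)
        terms.push_back(generate_next_term());

    int64_t n;
    cin >> n;
    cout << extrapolate_berlekamp_massey(terms, n) << '\n';
}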
So what's going on here? To try to figure this out, I set up a simpler version of some timed code, since the submissions above are quite complicated. Here's what I've got now. It uses several different methods of timing: clock(), wall time, and std::chrono::steady_clock. It also calculates a sum via a loop as a proxy for how much work is actually being done.
#include <chrono>
#include <cstdint>
#include <ctime>
#include <iostream>
#include <sys/time.h>
using namespace std;

// Wall-clock time in seconds, via gettimeofday.
long double wall_time() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return tv.tv_sec + tv.tv_usec * 1e-6L;
}

const double TIME_CUTOFF = 1.0;
const int ITERATIONS = 1e6;

int main() {
    long double clock_begin = clock();
    long double wall_begin = wall_time();
    auto chrono_begin = chrono::steady_clock::now();
    int64_t sum = 0;

    // Do batches of work until clock() reports TIME_CUTOFF seconds.
    // sum acts as a proxy for how much work actually got done.
    while ((clock() - clock_begin) / CLOCKS_PER_SEC < TIME_CUTOFF) {
        for (int iter = 0; iter < ITERATIONS; iter++)
            sum += iter % 3;
    }

    cout << "clock: " << (clock() - clock_begin) / CLOCKS_PER_SEC << 's' << endl;
    cout << "wall: " << wall_time() - wall_begin << 's' << endl;
    cout << "chrono: " << 1e-6L * chrono::duration_cast<chrono::microseconds>(chrono::steady_clock::now() - chrono_begin).count() << 's' << endl;
    cout << "sum = " << sum << endl;
}
Most of the time you get an output you would expect, like either of the two outputs below:
clock: 1.006s
wall: 1.0058s
chrono: 1.00603s
sum = 1335998664
=====
Used: 1014 ms, 12 KB
clock: 1.008s
wall: 1.0082s
chrono: 0.979013s
sum = 1311998688
=====
Used: 998 ms, 4 KB
Very close to 1 second of runtime and about 1.3 billion loop iterations (computers are pretty fast!). But around 1 out of every 10 runs, you end up with something like this instead:
clock: 1.015s
wall: 1.0152s
chrono: 0.986539s
sum = 656999343
=====
Used: 483 ms, 4 KB
Or this:
clock: 1.01s
wall: 1.0108s
chrono: 0.981722s
sum = 653999346
=====
Used: 514 ms, 4 KB
(You can try this out for yourself in custom invocation. Make sure you change the input on the right side each time, because Codeforces caches the output when given the same input.)
This is very weird -- clock(), wall time, and chrono::steady_clock all agree here that the runtime has been 1 second, but Codeforces only reports about half of that. On top of that, the sum that's printed is also right around half, so the code is actually only doing half as much work. What's happening?
My best hypothesis right now (warning: lots of speculation here; also shoutout to AnandOza for discussing) is that a small number of the Codeforces judges are older machines that run about 2x slower than the main machines. As a result, CF implicitly doubles the time limit on these machines, so when 1 second of actual CPU time has passed, you have only used "0.5 seconds" of your time limit, and you've only done half the work you'd expect to be able to do in 1 second. This lines up neatly with the "Attention" notices on old problems stating that reported runtimes are doubled (e.g., see the top-right of 1C - Ancient Berland Circus).
Is there any workaround for this? Or at least a way to tell which of the two situations we're in? I think there are legitimate use cases here, such as Berlekamp-Massey as above, randomized algorithms for constructive problems, or randomized algorithms that reduce the probability of error with each iteration. It would be unfortunate if these solutions had no consistent way to measure time and ended up failing because of this random, unpredictable factor.
Update: I'm actually not able to reproduce the 500 ms behavior in custom invocation now -- although I did get one really weird output of 748 ms (?).
I also resubmitted the exact same code as one of the WA submissions above, and this time I got AC: 101605801. My best guess is that the older machines are only brought in sparingly to help out during periods of high load (i.e., during contests) and have been turned off now that they're no longer needed. Come to think of it, there was a long queue at some point during the contest this morning, so it seems to add up.
Also, another discussion thread which may be related to this: https://mirror.codeforces.com/blog/entry/74157?#comment-583127
Thanks, I'll look into it today.
clock() returns wall time on Windows. So what is the issue with lower values of sum? It just means that a single CPU core is shared between more than one process. It is a possible situation; that's why CPU time (not wall time) is used to catch a time limit.
"clock() returns wall time on Windows." Wow, I didn't know that. cppreference mentions nothing about this. I was only able to confirm it in Microsoft's documentation:
"Remarks
The clock function tells how much wall-clock time has passed since the CRT initialization during process start. Note that this function does not strictly conform to ISO C, which specifies net CPU time as the return value. To obtain CPU times, use the Win32 GetProcessTimes function."
After some digging, this looks like the right way to go instead:
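Here's a minimal sketch of a cpu_time() function along these lines, built on GetProcessTimes (it sums both user and kernel time, per the fix described further below):

#include <cstdint>
#include <ctime>

#ifdef _WIN32
#include <windows.h>
#endif

// CPU time in seconds. On Windows, clock() measures wall time, so query
// the process's user + kernel time via GetProcessTimes instead (FILETIME
// counts 100-nanosecond intervals). Elsewhere, clock() is already CPU time.
long double cpu_time() {
#ifdef _WIN32
    FILETIME creation, exit_time, kernel, user;
    GetProcessTimes(GetCurrentProcess(), &creation, &exit_time, &kernel, &user);
    int64_t user_ticks = (int64_t(user.dwHighDateTime) << 32) | user.dwLowDateTime;
    int64_t kernel_ticks = (int64_t(kernel.dwHighDateTime) << 32) | kernel.dwLowDateTime;
    return (user_ticks + kernel_ticks) * 1e-7L;
#else
    return (long double) clock() / CLOCKS_PER_SEC;
#endif
}

With this, a timed loop compares against the CPU time the process has actually consumed, which is what the judge's time limit measures.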
After multiple runs I was finally able to capture this result:
So we have a working fix! Unfortunately it seems like this means our speculation was way off? Very peculiar that there were so many 2x coincidences. In any case, thanks MikeMirzayanov for digging into this.
Actually, after testing with some actual submissions, it seems there is still a problem. Using the cpu_time() function from above, even with a time limit set as low as 3.0 seconds, this submission gets TLE on test 4: 101679987. After some further investigation I realized we need to include both user time and kernel time, so we end up with this submission, which finally gets AC: 101680098.
I've updated the code above with the fixed version.