When I was doing the problem C. Pokémon Arena in the last Div. 1 round, I submitted my solution and got a TLE on pretest 20.
- GNU C++20 (64), TLE 3000ms 248951083.
I was not suspecting anything but my code, I thought there may be some degenerate issue, undefined behavior or constant issue, but nothing really found. After some unsuccessful attempts, I was able to pass the pretest using fast IO. It's then I found something weird: it only takes a few hundred milliseconds, which is contradictory to my intuition: cin
with sync_with_stdio(false)
is fairly fast and it should not take up so much more (at least 10x) time.
After the contest, I submitted exactly the same code with different language. You know what?
- GNU C++17, AC 280ms 248983571.
It's not only me, and some other participants also encountered such issue. For example:
- GNU C++20 (64), TLE 248946386 by fallleaves01.
Here's snippet for the key code:
int n, m;
cin >> n >> m;
vector a(n, vector (m, 0));
vector c(n, 0);
for (int i = 0; i < n; i++) cin >> c[i];
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
cin >> a[i][j];
}
}
But there are also some successful cin
submissions using GNU C++20 (64). After investigation by (including but not limited to) Sugar_fan, Boboge, -skyline-, the key components of TLE are:
- The language must be GNU C++20 (64).
vector
must be ofint
. If the elements arelong long
, it passed. 249050206- The definition of 2D
vector
must be before readingc
. If swap these two lines, it passed. 249050425
So here's the thing. It could not even be simply interpreted as some branch mispredictions or cache misses, it seems something is completely broken, and we still don't understand what is actually wrong.
Auto comment: topic has been updated by -14 (previous revision, new revision, compare).
GNU C++17(20) also fails. 249014238 Maybe it's related with 64bit platform. int is 32bit. There may be some bugs in the complier dealing with this case, I think.
It seems that the issue is related to the Windows system or codeforces judger. I can't reveal the problem on my Linux(Ubuntu) computer.
Probably, it is caught by cache misses. You can swap indices by which you access data in the
a
array (vector a(m, vector (n, 0));
->vector a(n, vector (m, 0));
,a[x][i]
->a[i][x]
and so on) and check if it gets passed.It's not the issue. My reason is: replacing
cin
intoscanf
leads to AC. 249050013249053153
Yes, it works. But why I said it's not the issue:
cin
intoscanf
passed. So swapping indices is not necessary.int
tolong long
(which is bad for cache) as mentioned in the main text. I am not surprised if there are many more changes to pass, but that's not the real issue which should be supported by ablation experiments.249055299 should give some ideas ;)
That's interesting. I had never thought of the vector capacity is not a power of two when constructing it with size. This may contribute to the slowdown in an indirect (but still mysterious) way. Incorporating the
long long
observation, the bug trigger must include a weird specific vector size (in this case, $$$40000 \times 10$$$).But to my understanding of cache, using power of two as the low dimension size leads to more cache conflicts. So currently my main suspection is: there may be some allocator issues in STL, so that
cin
and 2Dvector
slowed down each other.Maybe there is difference between
std::ios_base::sync_with_stdio(false);
andstd::ios::sync_with_stdio(false);
? I used first one and had no issue on 64-bit compiler: https://mirror.codeforces.com/contest/1936/submission/248927650 You used second one and caught TLEIt seems not the issue. My code still got TLE. 249055450
249054815 without
sync_with_stdio(false)
at all.All of the
sync_with_stdio
functions are calling the same exact function.sync_with_stdio
is a static member function inherited from theios_base
class. I presonally prefer to call it usingcin.sync_with_stdio
, but you could call it in all kind of ways. It doesn't matter how you do it.C++ is a weird programming language where bugs can be elusive, only appearing seemingly at random. Small seemingly unimportant changes to the code can make them appear or dissapear.
Just from reading this, it sounds like you likely have UB (undefined behaviour), possibly due to overflow.
Claiming that fast IO is the culprit is far too hasty of a conclusion. I would need to be 100% sure that no UB is present before drawing any conclusion like that.
I am 99.9% sure it's not my code that UB.
See 249078517. On test case 20, it should be equivalent to something like this, which I don't think an UB can exist.
I did some digging. I took multiple different C++ submissions from different users and modified their IO. Then by submitting to codeforces GYM I'm able to test the problem with a higher TL.
First thing, I've only been able to trigger the slowness bug using cin, and not using scanf. I've also noticed that it doesn't not seem matter if you use cin.tie and/or sync_with_stdio. In all cases I've tested, the slowness bug triggered independently of this.
One way to trigger the slowness bug is by reading input like this (It takes 4.1s on both TC 20 and 21):
The following also takes 4.1s (on both TC 20 and 21):
While reading input like this gives AC (typically < 0.2s):
It is completely unreasonable that reading $$$4 \cdot 10^5$$$ integers would take 4 s. I have no clue what is actually going wrong here.
Here is an update on this. I tested doing
where
and it still TLEs on TC20 249166970. This is absurd. It gets TLE simply by reading $$$4 \cdot 10^5$$$ values into
a[0][0]
.Where do you read
c[i]
thoughIn my code I simply read 4e5 ints. That's all my code is doing on TC20.
Ah, you don't do anything after this, got it
To add on, I was easily able to reproduce this myself by swapping two lines in my submission: 249017883.
Hmm, maybe there is a bug in the compiler or libstdc++: 249085796.
hey
I've set up test20 as a separate problem, you can add it by using
50437e8d91037c0fdbe5c65a
in the mashup, or you can download hereI hope this helps in any way.
After deleting all calculations, it uses 4000ms for inputs
Observed some strange content, here's test code
It seems like it's just uniformly slow for no reason at all, There isn't any suspicious lag.
Just FYI, it seems both TC20 and TC21 triggers the bug. It is not only TC20.
Reading input in 4000ms :clown
I've figured out how to trigger this slowness bug on basically any problem on CF!
vector<vector<int>> TLE(40000, vector<int>(7));
somewhere in global space.For example take Tourist's solution 248948695 to problem 1936-D - Bitwise Paradox. With the vector of death added to the code 249181295, it gets TLE on TC5 (taking 5s). While without the vector of death, Tourist's submission takes 155 ms on TC5.
When you compile the code locally asm and submit it directly (a cheat trick known to many), it will still result in a TLE, but locally there is no problem.
My guess is that there is a problem with memory allocation in codeforces 64 bit.
Here is a stand alone example with the slowdown (credit to kostia244 for showing me this)
The vector of death makes this run about 100 times slower.
Maybe, it's time to invite MikeMirzayanov to have a look at the problem. It's almost not reproducible on any other environment, so only administrators can further investigate the issue.
original comment: replacing " " with to_string(' ') take 290ms as pointed out below this is because ' ' is 32 lol
Reproduced it for 64 bit integers using size 4 for inner vector
AFAIU, it was intended to generate a string containing
n
integers to pass it to istringstream. You broke this becauseto_string(' ')
is justto_string(32)
.I'm having strong flashbacks by comparing this blog and https://mirror.codeforces.com/blog/entry/108168 ...
It feels kinda frightening when you realize that your correct solution can get TLE due to some absolutely unexpected stuff D:
what about
basic_string
orvalarray
?How do you install GNU compiler on Windows? I have the MinGW compiler from CodeBlocks (added to PATH) and I can't even compile with -c++17 using it.
I installed mingw64 compiler from this (don't follow any link, download from this page).
Now my g++ version is 13.2.0 and it supports c++20. I took this test and compiled with
g++ main.cpp -std=c++20 -o main.exe
. It executes in about 600 ms on both c++20 and c++17. I don't see strange behaviour.Did someone experienced this locally?
I installed specifically version 11.2.0, used compilation flags from here:
g++ -Wall -Wextra -Wconversion -static -DONLINE_JUDGE -Wl,--stack=268435456 -O2 -std=c++20 %1
and used this program from one of the comments above.
Still, no strange behaviour. It executes in about 850 ms.
Wait, I just noticed strange thing on Codeforces.
I just ran the exact this code in custom test. It shows time 4851 ms always and actually, the time between the moment I press button
Run
and log appears is about 1.5 seconds.Seems the problem is in judge system, not the compiled program.
I think that is simply due to codeforces caching the result of the invocation. You need to change the code/input slightly to make codeforces actually run your code.
Indeed.
However, the question remains: could someone experience this locally, or it is the issue of only mysterious codeforces compiler.
A while back i faced a similar issue: https://mirror.codeforces.com/blog/entry/99038
Maybe they are related
I have new conjecture. Look at this submission: https://mirror.codeforces.com/contest/1936/submission/249331767 It is same as the one that gets TLE, but with fast allocator. So I suppose that each invocation
std::cin
allocates and free memory which is slow due to "vector of death" allocating many small chunks of memory. Andscanf
seems to not allocate or free memory.