When I was solving the problem 674G — Choosing Ads, I got a TLE on #10 as result. The submission id is 348821830.
I tried to generate a data and run my code on my Ubuntu, and it took 5.80 seconds. (The python code to generate the data is at the end of this blog.)
I was once told that O2 can make the code run faster, so I tried to compile the code with option -O2 (the full command was like g++ ./t.cpp -o ./t -O2), it just took 0.73s. I added #pragma GCC optimize(2) into the code, and then recompiled it without option -O2 (the full command was like g++ ./t.cpp -o ./t), however it took more than 5 seconds. (I used the command /usr/bin/time -v ./t < ./t.in > ./t.out to measure the time.)
As you can see the two ways above has a huge difference.
I'm wondering why this phenomenon happened.
Is there some problem in my code or the way I compile it? or it can just happen on my Ubuntu?
Thanks a lot.
#!/bin/python3
from random import *
n=150000
m=150000
k=20
print(n,m,k)
for _ in range(n):
print(randint(1,6),end=' ')
print()
for _ in range(m):
o=randint(1,2)
l=randint(1,n)
r=randint(1,n)
v=randint(1,6)
if l>r:
l,r=r,l
if o==1:
print(o,l,r,v)
else:
print(o,l,r)
When I was testing on my Ubuntu, all tests generated by this code will cause the phenomenon above.
Update
I've put #pragma GCC optimize(2) to the first line, then the time reduced to 3.13 seconds, still far more than -O2.
Update #2
I've passed it. I changed the map and vector to simple arrays like a[150005], which seems to be a good way to improve speed. On the local test, it took just about 0.5 second. But why there was such a huge difference on pragma and -O2?








Auto comment: topic has been updated by RDFZchenyy (previous revision, new revision, compare).
Auto comment: topic has been updated by RDFZchenyy (previous revision, new revision, compare).
Auto comment: topic has been updated by RDFZchenyy (previous revision, new revision, compare).
Sorry for poor English ...
It seems to do less optimizations; based on the docs, it targets functions separately (which naturally can't optimize as much since there is less for the optimizer to work with at a time). Someone on stack overflow also found that it doesn't perform inlining (which can lead to a huge difference in performance, particularly when the standard library is involved). Note that, while they are specifically talking about O3, this also applies to O2 since O2 normally allows for inlining.
Thanks a lot!
I'd like to share a interesting evidence about inlining, which may support your opinion.
After searching online, I found a blog that told me to use option '-fopt-info-optimized' to show what was optimized by the compiler, after using it and observing the output, I found that the compiler inlined a lot of thing in std::map and std::vector.
The possible reason from my point of view is as follow. When I was debugging my code, my friend 10kdogmore told me that if I use the option '-E', I can view the code "after compiled". However if I compile the file and run, it didn't become faster. So we thought that only cpp code can't run faster. After discussion, we thinks that a thing called "Inline assembly" (maybe, I don't know how to call it.) was done by the compiler.
Sorry for poor English again.
So, the conclusion is that the optimize pragma applies to each function individually, without having the context of the entire program, whereas
-O2can reason about the program as a whole.Some examples to motivate this conclusion:
square(int)function does repeated sum, andmainfunction doescall square(int).#pragma GCC optimize("O2"): https://godbolt.org/z/aKEoEeeoGsquare(int)function recognizes the repeated sum and optimizes it into animulinstruction, butmainfunction still doescall square(int). (Ignore thetestandcmovleinstructions. Those are related to signed int overflow. You can verify this by changing the types frominttounsignedand these instructions are pruned.)-O2: https://godbolt.org/z/5xeod5Gx9square(int)function recognizes the repeated sum and optimizes it into animulinstruction, andmainfunction does not docall square(int), it inlines the content of thesquare(int)function instead.This is a counter example for the assumption that pragma optimize and
-O2are equivalent. There are a lot of other optimizations that can be done as well if the compiler is allowed the context of the whole program. Function inlining is in itself a very big deal because there are a lot of instances where very small functions are passed as objects, for example, comparators likestd::less,std::greater, etc instd::sort,std::set,std::map, etc.Few things to note:
-Eoption does not show the compiled output. It shows the code after it has been preprocessed. This means#includedirectives copy and paste the content of the header files literally, any macros defined using#defineare evaluated and substituted, conditional compilation blocks such as#ifdef,#ifndef,#else,#if, etc are resolved, etc.#pragmais not a preprocessor directive so it is left as is for the later steps of compilation.__asm__ __volatile__("<instruction>", args...);. Function inlining means copying the function body in-place instead of making a call to the function, which is very different from inline assembly.Also, you don't have to apologize for your English. We are all trying here :)
I can't tell you exactly what's happening but I can tell you my story on O2: on one of the first problems that I solved with fft the difference between with/without -O2 was less than 1s to around 10s. Usually we don't notice it because testing on large data locally doesn't happen that often.
Regardless of competitive programming, it is the best sabotage on Mira