RDFZchenyy's blog

By RDFZchenyy, history, 6 months ago, In English

When I was solving the problem 674G — Choosing Ads, I got a TLE on #10 as result. The submission id is 348821830.

I tried to generate a data and run my code on my Ubuntu, and it took 5.80 seconds. (The python code to generate the data is at the end of this blog.)

I was once told that O2 can make the code run faster, so I tried to compile the code with option -O2 (the full command was like g++ ./t.cpp -o ./t -O2), it just took 0.73s. I added #pragma GCC optimize(2) into the code, and then recompiled it without option -O2 (the full command was like g++ ./t.cpp -o ./t), however it took more than 5 seconds. (I used the command /usr/bin/time -v ./t < ./t.in > ./t.out to measure the time.)

As you can see the two ways above has a huge difference.

I'm wondering why this phenomenon happened.

Is there some problem in my code or the way I compile it? or it can just happen on my Ubuntu?

Thanks a lot.


#!/bin/python3

from random import *

n=150000
m=150000
k=20

print(n,m,k)
for _ in range(n):
    print(randint(1,6),end=' ')
print()
for _ in range(m):
    o=randint(1,2)
    l=randint(1,n)
    r=randint(1,n)
    v=randint(1,6)
    if l>r:
        l,r=r,l
    if o==1:
        print(o,l,r,v)
    else:
        print(o,l,r)

When I was testing on my Ubuntu, all tests generated by this code will cause the phenomenon above.


Update

I've put #pragma GCC optimize(2) to the first line, then the time reduced to 3.13 seconds, still far more than -O2.


Update #2

I've passed it. I changed the map and vector to simple arrays like a[150005], which seems to be a good way to improve speed. On the local test, it took just about 0.5 second. But why there was such a huge difference on pragma and -O2?

  • Vote: I like it
  • +22
  • Vote: I do not like it

»
6 months ago, hide # |
 
Vote: I like it +3 Vote: I do not like it

Auto comment: topic has been updated by RDFZchenyy (previous revision, new revision, compare).

»
6 months ago, hide # |
 
Vote: I like it +3 Vote: I do not like it

Auto comment: topic has been updated by RDFZchenyy (previous revision, new revision, compare).

»
6 months ago, hide # |
 
Vote: I like it +3 Vote: I do not like it

Auto comment: topic has been updated by RDFZchenyy (previous revision, new revision, compare).

»
6 months ago, hide # |
 
Vote: I like it +8 Vote: I do not like it

Sorry for poor English ...

»
6 months ago, hide # |
Rev. 2  
Vote: I like it +15 Vote: I do not like it

It seems to do less optimizations; based on the docs, it targets functions separately (which naturally can't optimize as much since there is less for the optimizer to work with at a time). Someone on stack overflow also found that it doesn't perform inlining (which can lead to a huge difference in performance, particularly when the standard library is involved). Note that, while they are specifically talking about O3, this also applies to O2 since O2 normally allows for inlining.

  • »
    »
    6 months ago, hide # ^ |
     
    Vote: I like it +3 Vote: I do not like it

    Thanks a lot!

    I'd like to share a interesting evidence about inlining, which may support your opinion.

    After searching online, I found a blog that told me to use option '-fopt-info-optimized' to show what was optimized by the compiler, after using it and observing the output, I found that the compiler inlined a lot of thing in std::map and std::vector.

    The possible reason from my point of view is as follow. When I was debugging my code, my friend 10kdogmore told me that if I use the option '-E', I can view the code "after compiled". However if I compile the file and run, it didn't become faster. So we thought that only cpp code can't run faster. After discussion, we thinks that a thing called "Inline assembly" (maybe, I don't know how to call it.) was done by the compiler.

    Sorry for poor English again.

    • »
      »
      »
      6 months ago, hide # ^ |
       
      Vote: I like it +12 Vote: I do not like it

      So, the conclusion is that the optimize pragma applies to each function individually, without having the context of the entire program, whereas -O2 can reason about the program as a whole.

      Each function that is defined after this point is treated as if it had been declared with one optimize(string) attribute for each string argument.

      Some examples to motivate this conclusion:

      1. No optimizations: https://godbolt.org/z/noY38bGzd square(int) function does repeated sum, and main function does call square(int).
      2. #pragma GCC optimize("O2"): https://godbolt.org/z/aKEoEeeoG square(int) function recognizes the repeated sum and optimizes it into an imul instruction, but main function still does call square(int). (Ignore the test and cmovle instructions. Those are related to signed int overflow. You can verify this by changing the types from int to unsigned and these instructions are pruned.)
      3. -O2: https://godbolt.org/z/5xeod5Gx9 square(int) function recognizes the repeated sum and optimizes it into an imul instruction, and main function does not do call square(int), it inlines the content of the square(int) function instead.

      This is a counter example for the assumption that pragma optimize and -O2 are equivalent. There are a lot of other optimizations that can be done as well if the compiler is allowed the context of the whole program. Function inlining is in itself a very big deal because there are a lot of instances where very small functions are passed as objects, for example, comparators like std::less, std::greater, etc in std::sort, std::set, std::map, etc.

      Few things to note:

      1. -E option does not show the compiled output. It shows the code after it has been preprocessed. This means #include directives copy and paste the content of the header files literally, any macros defined using #define are evaluated and substituted, conditional compilation blocks such as #ifdef, #ifndef, #else, #if, etc are resolved, etc. #pragma is not a preprocessor directive so it is left as is for the later steps of compilation.
      2. Inline assembly is a term used for writing literal assembly instructions in the program. It looks like __asm__ __volatile__("<instruction>", args...);. Function inlining means copying the function body in-place instead of making a call to the function, which is very different from inline assembly.

      Also, you don't have to apologize for your English. We are all trying here :)

»
6 months ago, hide # |
 
Vote: I like it +7 Vote: I do not like it

I can't tell you exactly what's happening but I can tell you my story on O2: on one of the first problems that I solved with fft the difference between with/without -O2 was less than 1s to around 10s. Usually we don't notice it because testing on large data locally doesn't happen that often.

»
6 months ago, hide # |
 
Vote: I like it +29 Vote: I do not like it

Regardless of competitive programming, it is the best sabotage on Mira