kevlu8's blog

By kevlu8, 3 months ago, In English

GCC has many optimization pragmas that can be placed at the top of a source file. In general, you would expect them to speed up your code by the same amount as the equivalent command-line argument, but this is not always the case.

Theoretically, you would expect

#pragma GCC optimize("O3")

to optimize your code the same way as

g++ main.cpp -O3

But it doesn't! Let's take a look at an example program:

#pragma GCC optimize("O3")
#include <bits/stdc++.h>
using namespace std;

#define SZ 10000005

int arr[SZ] = {};

int main() {
    iota(arr, arr+SZ, 1);
    int tgt = 19999473;
    unordered_set<int> s(arr, arr+SZ);
    for (int i = 0; i < SZ; i++) {
        if (s.count(tgt-arr[i])) {
            cout << arr[i] << ' ' << tgt-arr[i] << '\n';
            break;
        }
    }
}

This is a simple, well-known solution to the Two-Sum problem using a hash set.

Here's a table showing the runtime of the program with and without the pragma (running on a Ryzen 7 7700X, compiled with no other arguments, mean of 5 trials):

Optimization             Time (s)
None, without pragma     ~1.79
None, with pragma        ~0.98
-O3, without pragma      ~0.36
-O3, with pragma         ~0.36
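If you want to collect numbers like these yourself, here is a minimal sketch of an in-process timer. It assumes wall-clock time from std::chrono::steady_clock is a good enough measure; mean_seconds is a hypothetical helper name, and this is not necessarily how the timings above were taken.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not from the original post): runs `work` `trials`
// times and returns the mean wall-clock time per run, in seconds.
double mean_seconds(const std::function<void()>& work, int trials = 5) {
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int t = 0; t < trials; ++t) {
        auto start = clock::now();
        work();
        auto stop = clock::now();
        // duration<double> converts the tick count to fractional seconds.
        total += std::chrono::duration<double>(stop - start).count();
    }
    return total / trials;
}
```

You would wrap the body of main in a lambda and pass it to mean_seconds. Timing the whole process from the shell works just as well and also includes startup cost.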

As you can see, the pragma on its own (~0.98 s) does much worse than the -O3 flag (~0.36 s), even though they should be equivalent. Why is this?

Looking at the generated assembly, we can see that the code produced with the -O3 command-line argument does not contain any occurrences of unordered_set, whilst the code produced with the pragma contains loads of them. What does this mean?

This tells us that -O3 performs more optimization (specifically, inlining) than the pragma. This is further demonstrated by the following example:

#pragma GCC optimize("O3")
static int return5() {
    return 5;
}
int main() {
    return return5();
}

With the pragma, the generated assembly code (simplified) is:

main:
    jmp return5
return5:
    mov eax, 5
    ret

With -O3, the generated assembly code is:

main:
    mov eax, 5
    ret

Anyone can see that return5 should be inlined. It's even a static function! But the pragma doesn't inline it, whilst -O3 does. Even after adding inline-functions,inline-small-functions,inline-functions-called-once to the pragma, the call is still not inlined. Only after adding __attribute__((always_inline)) to the function does it finally get inlined. Why this is the case is beyond me. Although this is a very minor example, the first example shows that these kinds of small improvements matter more and more as the program gets more complex.
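For reference, here is a sketch of what the always_inline version might look like. The inline keyword is my addition (GCC tends to warn about always_inline functions that are not declared inline), and call_return5 is a hypothetical stand-in for main so the snippet can be dropped into a larger file.

```cpp
#pragma GCC optimize("O3")

// always_inline asks GCC to inline this function regardless of its usual
// inlining heuristics. The `inline` keyword is an addition to the original.
__attribute__((always_inline)) static inline int return5() {
    return 5;
}

// Hypothetical caller standing in for main() from the example above.
int call_return5() {
    return return5();
}
```

This only fixes functions you wrote yourself. It does nothing for standard library code such as unordered_set, which is where the first example loses its time.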

There are probably many more examples of optimizations that -O3 does that the pragma doesn't, but the most important thing to take away from this is that the pragma is not quite equivalent to -O3.

Most of the time, there is no reason to use #pragma GCC optimize("O3") over -O3, because you can just modify your compile-time command-line arguments. The main place where the pragma is necessary is competitive programming, since most judges compile with -O2 and sometimes you're able to squeeze into the time limit by using O3 and avx2.

What can we do with this information? Not much, really. Just be aware that the pragma is not equivalent to -O3, and that you should use -O3 over the pragma whenever possible. However, in situations where specifying -O3 in the command line is not possible, the pragma is a passable alternative.

One final note: make sure that if you use the pragma, you use it at the top of the file, before any includes. If you use it in the middle of the file, it will only apply to the code after the pragma.
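To make the scoping concrete, here is a small sketch using GCC's push_options and pop_options pragmas, which save and restore the current option set. The function names are made up for illustration, and the comments describe what is requested of the compiler, not a guarantee about the generated code.

```cpp
// Defined before the pragma: compiled with whatever the command line says.
long long sum_default(const int* a, int n) {
    long long s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

#pragma GCC push_options
#pragma GCC optimize("O3")
// Defined after the pragma: O3 is requested for this function.
long long sum_o3(const int* a, int n) {
    long long s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}
#pragma GCC pop_options

// Defined after pop_options: back to the command-line settings.
long long sum_restored(const int* a, int n) {
    long long s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}
```

All three functions compute the same result. Only the optimization options in effect when each one is defined differ.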

Thanks for reading my first blog post! I hope you enjoyed it!
