The following code produces strange results when compiled with GCC at different optimization levels:
N = 200
- gcc source.cpp -> 0.440 s
- gcc -O2 source.cpp -> 2.750 s (-O, -O1 and -O2 give the same result)
- gcc -Os source.cpp -> 0.223 s
N = 500
- gcc source.cpp -> 3.931 s
- gcc -O2 source.cpp -> 42.142 s
- gcc -Os source.cpp -> 2.704 s
Somehow, optimizing for speed slows the code down significantly, while optimizing for size speeds it up :-)
Could anybody compile and test the code on their machines? Or, possibly, explain why this happens?
- gcc source.cpp -> 0.545s
- gcc -O2 source.cpp -> 0.421s
- gcc -Os source.cpp -> 0.466s
For N=500, the results are as follows:
- g++ -> 0.899 s
- g++ -O2 -> 4.733 s
- g++ -Os -> 0.413 s
- g++ -O2 -fno-tree-ter -> 0.390 s
One would think that the -ftree-ter optimization is broken. However, it seems to be enabled at -Os as well. In fact, on my system the only difference in optimizations between -O2 and -Os is -finline-functions. I tried enabling it together with -Os, but it had no effect.
Here's the relevant part of the man page:
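-ftree-ter
    Perform temporary expression replacement during the SSA->normal phase. Single use/single def temporaries are replaced at their use location with their defining expression. This results in non-GIMPLE code, but gives the expanders much more complex trees to work on resulting in better RTL generation. This is enabled by default at -O and higher.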
This is probably the key. It seems that this causes the strings to be copied before they are compared. As you can see, the slowdown of plain -O2 does not look like a constant factor; it seems to grow asymptotically. I will check this when I get home.
[Update] What I said about asymptotics was nonsense.
In the -O2 build, the strings are compared with the instruction repz cmpsb, while in the other cases the code calls the system function memcmp.
I found a description of this issue here. Quote: "in the -O0 case, GCC relies on the implementation of memcmp supplied with the C library. In the -O2 case, GCC instead uses its built-in implementation of memcmp. The built-in function uses the special IA-32 instruction repz cmpsb, which is known to be slow on modern hardware."
Apparently, disabling built-ins with -fno-builtin should fix the issue as well.
There is also a GCC Bugzilla report about this issue.