7 years ago

+8

Yes, I've used such pragmas when trying to break some CF problem and it worked. You just need to pay attention to the compiler, since MSVC-specific pragmas don't work on GCC and vice versa (they're simply ignored).

In our IOI selection last year, one guy managed to squeeze some points out of one problem I reused from JOI with looser constraints. It didn't give full score, I think, but it was still quite an impressive speedup, and squeezing points like that was in fact one of the expected strategies for those unable to get full score. I imagine most local olympiads don't care, or don't know it matters. Anyway, adding pragmas for AVX and loop unrolling on top of all your code most likely won't slow it down, and -O3 is often unnecessary or sometimes even slower than -O2.

rewhile

11 months ago

+106

This post contains wrong pragma code — please learn to use it correctly!

Sorry for commenting on a 6-year-old blog, but this post is the root cause of 10 pages of incorrect pragma usage on Google: https://google.com/search?q="pragma+GCC+optimization"

The pragma snippet provided in the blog contains a typo: it's supposed to be optimize, NOT optimization. pajenegod has messaged E869120 about this issue, but no changes have been made.

Here's the correct snippet:

#pragma GCC target ("avx2")
#pragma GCC optimize ("O3")
#pragma GCC optimize ("unroll-loops")

Please refer to GCC Optimization Pragmas for more details. I'll quote the relevant part here:

The following does nothing:

#pragma GCC optimize(" unroll-loops")
#pragma gcc optimize("O3")
#pragma GCC optimization("O3")
#pragma optimize(O3)

Yes, these are real-life examples we have seen being used by many competitive programmers, including quite a few LGMs (e.g. 231273727). Perhaps the third one among these came from this post which seems to have popularized the idea of using pragmas, but with a small mistake. Many people are misled to believe that some of the above work, when they actually don't – stop using them.

Also, #pragma GCC optimize("tune=native") and #pragma GCC optimize("arch=native") do nothing, so that's kinda unfortunate.

If ever in doubt about whether your pragmas are correct, turn on most compiler warnings with the command-line option -Wall (or the more specific -Wunknown-pragmas). For example, if you compile the code above with -Wall, you'll get the following output, which tells you that the pragmas are invalid and useless:

foo.cpp:2: warning: ignoring '#pragma gcc optimize' [-Wunknown-pragmas]
    2 | #pragma gcc optimize("O3")
      |
foo.cpp:3: warning: ignoring '#pragma GCC optimization' [-Wunknown-pragmas]
    3 | #pragma GCC optimization("O3")
      |
foo.cpp:4: warning: ignoring '#pragma optimize ' [-Wunknown-pragmas]
    4 | #pragma optimize(O3)
      |
foo.cpp:1:37: warning: bad option '-f unroll-loops' to pragma 'optimize' [-Wpragmas]
    1 | #pragma GCC optimize(" unroll-loops")
      |                                     ^

Try to check if your submissions have similar problems. If yes, then you have probably been using pragmas wrong the entire time — it's time to change your default code.
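A minimal sketch of how the corrected pragmas might sit in a submission. The function name `dot` and the preprocessor guard are illustrative assumptions, not part of the quoted blog; the guard only keeps the file compiling on non-GCC or non-x86 toolchains, where these pragmas would be ignored or rejected. Placing the pragmas after the includes is a choice made here, not a requirement stated in the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: only GCC on x86 understands this exact target string,
// so the pragmas are skipped everywhere else.
#if defined(__GNUC__) && !defined(__clang__) && (defined(__x86_64__) || defined(__i386__))
#pragma GCC target ("avx2")
#pragma GCC optimize ("O3")
#pragma GCC optimize ("unroll-loops")
#endif

// Hypothetical helper: dot product over the common prefix of a and b.
// Every iteration is independent, which gives the optimizer a chance to vectorize.
std::int64_t dot(const std::vector<std::int32_t>& a, const std::vector<std::int32_t>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<std::int64_t>(a[i]) * b[i];
    }
    return total;
}
```

If GCC prints a pragma warning for a file like this under -Wall, the spelling of the pragma is the first thing to check.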

Xellos

11 months ago

+28

Not sure why you're replying to me rather than to the blog. Fortunately since compilers warn about unused pragmas (or attributes which are equivalent here), a programmer that reads documentation rather than blindly copies won't have trouble with this.

adamant

11 months ago

+86

Most likely so that the new comment appears at the top rather than at the bottom, where it is more noticeable for readers.

E869120

11 months ago

+18

Fixed.

dalex

7 years ago

+36

We need MrDindows and dmkozyrev here

dmkozyrev

7 years ago

+70

This is a short example of x8 speed up for 625 000 000 multiplications of complex numbers: original 3759 ms, 396 kb, improved 187 ms, 400 kb.

I solved a lot of problems (one example) using a naive solution in O(n^2) or O(n*q), where q,n <= 200000. Only Ofast, avx, avx2 and fma help a lot (an x8 speedup rather than x1.2-x1.5); the other options are not sufficient. AVX is for packed floats / doubles, AVX2 for packed integral types, FMA for more efficient instructions. When these are enabled, the compiler can generate machine-specific code that works with 256-bit registers by using AVX instructions. But you need to write code in a parallel style, with independent loop iterations.

You can read a guide to vectorization with intel® c++ compilers. I'm using this too in my everyday work.

UPD. At current time it is a part of GCC/clang compilers, but since C++20 it will be part of standard C++ language. Link, experimental::simd

UPD 2. Increasing all constraints to 300-400k would help rule out all such solutions.
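To illustrate what "independent loop iterations" can look like, here is a hedged sketch of a naive range query of the O(n*q) kind mentioned above. The function `count_less` is a hypothetical example, not taken from the linked submissions, and the preprocessor guard is an assumption added so the file still compiles on other toolchains.

```cpp
#include <cstddef>
#include <vector>

// Assumption: the pragmas are only meaningful for GCC on x86.
#if defined(__GNUC__) && !defined(__clang__) && (defined(__x86_64__) || defined(__i386__))
#pragma GCC optimize("Ofast")
#pragma GCC target("avx,avx2,fma")
#endif

// Hypothetical naive query: how many elements of a[l..r) are less than x.
// The comparison result is added directly instead of branching, and no
// iteration reads anything written by another iteration.
int count_less(const std::vector<int>& a, std::size_t l, std::size_t r, int x) {
    const int* p = a.data();
    int count = 0;
    for (std::size_t i = l; i < r; ++i) {
        count += (p[i] < x);
    }
    return count;
}
```

Calling `count_less` once per query gives the O(n*q) total; whether the loop is actually vectorized depends on the compiler and can only be confirmed from its reports or the generated assembly.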

shahidul_brur

7 years ago

0

Your improved 187 ms, 400kb code contains:

#pragma GCC optimize("Ofast")
#pragma GCC target("avx,avx2,fma")

which is a bit different from the 3 lines shown in the blog post:

#pragma GCC target ("avx2")
#pragma GCC optimization ("O3")
#pragma GCC optimization ("unroll-loops")

Which one is better for speeding up the code?

dmkozyrev

7 years ago

0

I think that -Ofast includes all safe and unsafe optimizations for speed. You can check there. I'm using the first one, but it seems that we need to write unroll-loops too.

You can compile your source code with the following flags: -fopt-info, -fopt-info-loop, -fopt-info-loop-missed, -fopt-info-vec, -fopt-info-vec-missed. Link to all options. They report which lines of code failed to be optimized and why.

UPD. I remember that something from the list of optimizations allowed me to make a segment tree 2 times faster, because it removed tail recursion in recursive queries.
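As a rough way to try the -fopt-info flags mentioned above, the sketch below contains one loop with independent iterations and one with a loop-carried dependency. The file name and both function names are hypothetical, and the exact report text varies between GCC versions, so no expected output is shown.

```cpp
// Hypothetical file name: vec_report.cpp. One possible invocation:
//   g++ -O3 -mavx2 -fopt-info-vec -fopt-info-vec-missed -c vec_report.cpp
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on a[i] and b[i].
// Assumes a and b hold at least out.size() elements.
void add_arrays(const std::vector<int>& a, const std::vector<int>& b, std::vector<int>& out) {
    const std::size_t n = out.size();
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Loop-carried dependency: x[i] needs the value of x[i - 1] computed in the
// previous iteration, which is much harder for a vectorizer to handle.
// Unsigned arithmetic is used so that overflow wraps instead of being undefined.
void recurrence(std::vector<unsigned>& x) {
    for (std::size_t i = 1; i < x.size(); ++i) {
        x[i] = x[i - 1] * 31u + x[i];
    }
}
```

Comparing what the report says about the two loops is a quick check of whether a given rewrite made a loop easier for the compiler to vectorize.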

Not-Afraid

6 years ago

0

But queries in a segment tree use the results from l to mid and from mid + 1 to r and then combine them, which is not tail recursion, I think (since the recursive call is not the last thing done in the query function of a segment tree). Correct me if I am wrong.
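The distinction raised here can be sketched on a small sum segment tree. The struct `SumTree` and its method names are hypothetical and written only for illustration: `range_sum` combines two recursive results, so it is not a tail call in the strict sense, while `point_value` returns the result of a single recursive call unchanged. Whether an optimizer turns either of them into a loop is compiler dependent and not something this sketch demonstrates.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal sum segment tree over positions [0, n).
// Node v covers [lo, hi]; its children are 2v and 2v + 1.
struct SumTree {
    std::size_t n;
    std::vector<std::int64_t> sum;

    explicit SumTree(const std::vector<std::int64_t>& values)
        : n(values.size()), sum(4 * (values.empty() ? 1 : values.size()), 0) {
        if (n > 0) build(1, 0, n - 1, values);
    }

    void build(std::size_t v, std::size_t lo, std::size_t hi,
               const std::vector<std::int64_t>& values) {
        if (lo == hi) { sum[v] = values[lo]; return; }
        const std::size_t mid = lo + (hi - lo) / 2;
        build(2 * v, lo, mid, values);
        build(2 * v + 1, mid + 1, hi, values);
        sum[v] = sum[2 * v] + sum[2 * v + 1];
    }

    // Not tail recursive: the addition happens after both calls return.
    std::int64_t range_sum(std::size_t v, std::size_t lo, std::size_t hi,
                           std::size_t l, std::size_t r) const {
        if (r < lo || hi < l) return 0;          // no overlap
        if (l <= lo && hi <= r) return sum[v];   // node fully inside the query
        const std::size_t mid = lo + (hi - lo) / 2;
        return range_sum(2 * v, lo, mid, l, r) + range_sum(2 * v + 1, mid + 1, hi, l, r);
    }

    // Tail recursive: each branch ends in one recursive call whose result is
    // returned without further work.
    std::int64_t point_value(std::size_t v, std::size_t lo, std::size_t hi,
                             std::size_t pos) const {
        if (lo == hi) return sum[v];
        const std::size_t mid = lo + (hi - lo) / 2;
        if (pos <= mid) return point_value(2 * v, lo, mid, pos);
        return point_value(2 * v + 1, mid + 1, hi, pos);
    }
};
```

For a tree built from {5, 3, 8, 6}, `range_sum(1, 0, 3, 1, 2)` adds the values at positions 1 and 2, and `point_value(1, 0, 3, 2)` walks a single root-to-leaf path.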

Coxy_Normus

7 years ago

+9

What do these 3 lines do? And if I put these pragmas in my code, will it speed up?

muffins

7 years ago

+8

Hi, does this work on CMS?

E869120

7 years ago

0

Actually, it does work, at least in JOI 2018/2019 Spring Camp.

muffins

7 years ago

+1

Does anyone know why simpler code gets a bigger speedup? Also, do macros and including bits/stdc++.h instead of, for example, iostream affect the speedup?

MZuenni

7 years ago

+13

Here is another example where a naive solution can get accepted with pragmas: https://mirror.codeforces.com/contest/911/submission/33820899

vectorization of code can give really big speedups...

Qualified

6 years ago

-9

What is vectorization?

andriy.makukha

6 years ago

+10

Seems like targeting AVX2 improves the performance by about 20%: https://mirror.codeforces.com/contest/911/submission/95380976

CopeCope

7 years ago

0

Is there a way to solve today's Div2B 1143B - Nirvana with this optimization?

dalex

7 years ago

0

To everyone who doesn't know what's going on here: it seems that the topic starter doesn't know either, and it looks like some kind of magic to him.

Better refer to dmkozyrev's message above in the comments.

farmersrice

7 years ago

0

Just to let you folks know, last time I checked it didn't work on USACO. L

Heisenbug

7 years ago

0

Codeforces uses 32-bit binaries (although the servers themselves are 64-bit IIRC), so AVX won't work. Although I'm not completely sure that every language other than C++ also runs in 32-bit mode. If someone found a language running with a 64-bit interpreter, there would be an opportunity for some "bitness arbitrage"...

Heisenbug

7 years ago

0

Never mind, I am wrong. You actually can generate AVX instructions on x86.

SkySurfer

7 years ago

+3

Will it work only for naive algorithms? When I just read simple input and write output, the execution time becomes 4x slower, so what is the use of it?

LanceTheDragonTrainer

6 years ago

+14

Honestly, the main cause of the speedup is called vectorization, which the compiler does automatically due to the pragmas. After blindly trying for some time, I realized that auto-vectorization has very very limited use cases (i.e. it doesn't work most of the time).

In fact, to truly know how your code has been optimized by the compiler, you need to get down to the assembly code. If you are afraid of the assembly code, stay away from these optimization pragmas and optimize on the algo level only.
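For readers unsure what vectorization means, the sketch below writes the same loop twice in plain C++: once element by element, and once in blocks of 8, which is roughly the shape a vectorizer aims for when 8 32-bit integers fit into one 256-bit register. The function names and the constant `kLanes` are illustrative assumptions; the blocked version is only a conceptual model and does not force the compiler to emit AVX instructions.

```cpp
#include <cstddef>
#include <vector>

// Scalar form: one element per iteration.
void scale_scalar(std::vector<int>& v, int k) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= k;
    }
}

// Blocked form: the inner loop models one 8-lane operation, and the last
// loop handles the leftover elements that do not fill a whole block.
void scale_blocked(std::vector<int>& v, int k) {
    constexpr std::size_t kLanes = 8;  // 256 bits / 32-bit int
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            v[i + lane] *= k;
        }
    }
    for (; i < n; ++i) {
        v[i] *= k;
    }
}
```

Both functions produce the same result; the blocked shape is only possible because no iteration depends on another, which is why loops with dependencies between iterations often gain nothing from these pragmas.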

600iq

6 years ago

0

This does not work for oj.uz

radeye

6 years ago

+1

Consider using

#pragma GCC target ("native")

To learn about how different compilers do on different architectures with autovectorization, try

https://godbolt.org

johannesk

6 years ago

+3

If you try to compile with this, you get a compiler error along the lines of attribute(target("native")) is unknown.

The correct way to specify it in theory would be #pragma GCC target ("arch=native") or #pragma GCC target ("tune=native").

However, native as architecture isn't recognized in pragmas, see https://stackoverflow.com/a/59846262/1176973. Strangely enough, while tune=native as pragma doesn't trigger an error, it doesn't change the output in any way, whereas -mtune=native as command line argument does.

So all those tune=native's you can see in some submissions or codebooks (e.g. dacin21_codebook) don't do anything.

NikaraBika

6 years ago

+4

I think it's a good brief explanation about what exactly unroll-loops does.

https://code-examples.net/en/q/17133ec

aayush_bhat1999

6 years ago

0

Are there any downsides to using these optimizations?

DT3264

6 years ago

+10

If you don't use them correctly, they could lead to behavior you don't expect.

Radhe_Radhe

6 years ago

+8

How do we identify which one to use? There are a lot of optimizations, so can anyone please explain which one should be used and when? And is there any general optimization option?

Not-Afraid

6 years ago

+5

You can read about them here.

Codeforcer

6 years ago

+8

Most of the online judges seem to ignore pragmas these days.

PedroBigMan

5 years ago

-15

Wow!! And I thought JOI Spring Contest was one of the competitions with tightest time limits...

codechaser

5 years ago

0

E869120

#pragma 'Optimizations' might slow down your code as well!

NOTE: I don't know much about #pragma, but just wanted to share something I found out while using it.

Compare these two submissions:

Submission 1: #128317329

Submission 2: #128317405

The only difference is that the #pragma optimization part is commented out in the accepted submission (#128317405); everything else is the same. The one with the 'optimization' got TLE. ;-;

ssvb

5 years ago

0

Try changing your compiler to G++17 64-bit. AVX and loop unrolling work much better in 64-bit mode simply because of the larger number of available registers. Moreover, 32-bit mode is becoming more obsolete every year: fewer people are testing the quality of the 32-bit code generated by modern compilers, and there may be regressions.

Haidora

11 months ago

+1

Actually, in Syria I recently discovered that pragmas are not supported. The bad part is that I found out myself: no one told me before the contest and no announcement was made, even though I submitted many solutions using them on both days. That is very bad, because the constraints in most problems (TL and ML) are so tight that a solution with the intended complexity may still fail because of extra work in the code, so you need pragmas to pass even non-bruteforce solutions :(. Anyway, is there any other country that does not allow pragmas in its national OI?

AksLolCoding

6 weeks ago

0

These pragmas got me AC on this USACO problem with $O(NQ)$

E869120's blog

This post contains wrong pragma code — please learn to use it correctly!