I experienced something weird while solving 2050F - Maximum modulo equality. Basically, I tried solving it by decomposing the array into $$$\mathcal{O}(\sqrt{n})$$$ blocks. I submitted several versions, each using a different block size, and they passed in 2s during the contest. Surprisingly, in system tests, all of my submissions got TLE on a hack test of mine that had been used to hack other, similar solutions that were much slower in the original tests.
It felt really weird, because the most basic gcd
implementation we learn is something like this:
int G(int a, int b)
{
    return b ? G(b, a % b) : a;
}
Although a single call to this function takes $$$\mathcal{O}(\log{x})$$$ time, where $$$x$$$ is the value of the input, when we repeatedly apply this function $$$k$$$ times, with either $$$a$$$ or $$$b$$$ being the result of a previous call, it should only take $$$\mathcal{O}(k+\log{x})$$$ in total, because the recursion only goes deeper when the result gets smaller, and the result can shrink at most $$$\log{x}$$$ times. Thus, my solution should only take $$$\mathcal{O}((q+\sqrt{n})(\sqrt{n}+\log{x}))$$$ time, which should be fast enough to run in 5s.
However, the results show that this is not the case. Check out these three versions (note that I used using ll = int; so the divisions aren't that slow): the first one uses std::gcd, the second one uses __gcd, which is a gcc extension, and the third one uses the G function I wrote above. The first one got TLE, while the other two took only around 2 seconds. So, why did this happen?
I asked some people about this, and the conclusion was that std::gcd may be using the binary gcd algorithm, which takes time proportional to the number of significant bits of the input even when the result stays large. For example, starting from res = 0, repeatedly calling res = gcd(res, (int)1e9) will keep taking time proportional to $$$\log{10^9}$$$ on every call, while the traditional Euclidean algorithm recurses only once each time. Consequently, my solution with std::gcd was likely $$$\mathcal{O}((q+\sqrt{n})\sqrt{n}\log{x})$$$, which is indeed too slow.
qwerasdfzxcl suggested using b = gcd(a % b, b) instead of b = gcd(a, b), and it does work: 295441081, and it's a lot faster than the other versions! I was told that this is bounded by $$$\mathcal{O}(k+\log^2{x})$$$, but I don't quite understand that bound well, so if anyone has an idea about it, please do comment. The intuition is that, assuming gcd returns $$$b$$$ immediately when $$$a\%b=0$$$, in every other case $$$b$$$ is at least halved by the gcd call, and this can happen only $$$\mathcal{O}(\log{x})$$$ times.
Lessons learned: do not naively use std::gcd when you need to repeatedly take the gcd of its own result. Alternatively, using C++17 works, presumably because std::gcd used the Euclidean algorithm back then. C++23 doesn't work.
Pragmas also help: 295462909.
Oh, useful knowledge!
Looks like it is: std::gcd: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/numeric#L137-L171
__gcd: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algo.h#L1118-L1130
This is insane... It's the first time I've seen a newer C++ version be slower than an older one :) And from the gcd(0, 1e9) case, it looks like this isn't just a random competitive programming quirk but a real worsening of the time complexity. I actually find it rather odd: one can implement binary gcd to run in $$$O(\log \min(a, b))$$$ time, but from your example it apparently doesn't do that. What we use to bound the time complexity of the repeated application of gcd is that Euclid's algorithm runs in $$$O(\log (\min(a, b) / \gcd(a, b)))$$$ time. This, I agree, is an obscure time complexity that only competitive programmers could care about. But the difference between $$$O(\log \min(a, b))$$$ and $$$O(\log \max(a, b))$$$ is actually significant... I would also like to point out that __gcd(x, 0) is undefined behaviour, so this problem is indeed annoying. I guess the only good solution is to always just use your own gcd function. But that's just sad...
Oh, I didn't know that __gcd(x, 0) is UB. I guess I'll just use my own implementation when the TL could be tight :)
Yes, it uses division by zero in this case: https://mirror.codeforces.com/blog/entry/13410?#comment-182865
Why does it always work as it should then?
guys how to be good at c++ the language ?
Are you sure __gcd(x, 0) is UB? I just tested it in the Codeforces custom test and it works; also see this: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algo.h#L1118-L1130
ok, interesting. so does it mean they changed something in c++20?
qwerasdfzxcl's trick is a so-called hybrid approach: reference. Based on the benchmarks from the cited blog, it is the fastest approach.
Why did u use sqrt decomposition ?
Isn't something like segment tree or sparse table much faster ?
If you ask me why... because of my skill issue? :)
The segment tree solution just didn't come to me for some reason at the time. Later in the contest I realized a segment tree would be faster, so I re-wrote the solution with one, and I was actually able to keep my AK even though 6 of my submissions FST'ed.
It's so amazing but how can we use it usefully?