Codeforces and Polygon may be unavailable from December 6, 19:00 (UTC) to December 6, 21:00 (UTC) due to technical maintenance. ×

orz's blog

By orz, history, 11 months ago, In English

Usually Euclid's algorithm, which computes $$$\gcd(a, b)$$$, is implemented as follows:

while b != 0:
  a %= b
  swap(a, b)
return a

Or, in recursive fashion,

if b == 0:
  return a
else:
  return gcd(b % a, b)

While it works in $$$\mathcal O(n)$$$ time (where $$$n$$$ is the maximum of binary lengths of $$$a$$$ and $$$b$$$ — that is, big-Theta of the length of the input), it uses quite an expensive operation of integer division. The fastest known procedure of integer division works in $$$\mathcal O(n \log n)$$$ time, so, if we take into account the time spent on arithmetic operations, the time complexity is $$$\mathcal O{\left(n^2 \log n\right)}$$$. But even if we don't, int64 division is still much slower than such operations as addition, subtraction and binary shifts.

If you didn't know there is an algorithm which doesn't need division at all!

def remove_trailing_zeros(a):
  return a >> count_trailing_zeros(a)

def gcd_of_odd_numbers(a, b):
  if a == b:
    return a
  if a < b:
    swap(a, b)
  return gcd_of_odd_numbers(b, remove_trailing_zeros(a - b))

def gcd(a, b)
  if a == 0:
    return b
  if b == 0:
    return a
  return gcd_of_odd_numbers(remove_trailing_zeros(a), remove_trailing_zeros(b)) << min(count_trailing_zeros(a), count_trailing_zeros(b))

The function count_trailing_zeros(a) finds the maximum $$$k$$$ such that $$$a$$$ is divisible by $$$2^k$$$. The function remove_trailing_zeros(a) divides $$$a$$$ by the maximum power of two that divides $$$a$$$. Both these functions can be easily implemented in $$$\mathcal O(n)$$$ time, if we take into account the complexity of arithmetic operations. gcd_of_odd_numbers(a, b) finds gcd of the two numbers $$$a$$$ and $$$b$$$, given they are both odd. Everything except the recursive call works in $$$\mathcal O(n)$$$ time. Note that the sum of binary lengths of numbers is decremented by at least one from call to call, so there will be only $$$\mathcal O(n)$$$ recursive calls. Therefore, gcd_of_odd_numbers(a, b) works in $$$\mathcal O{\left(n^2\right)}$$$ time. Finally, gcd(a, b) is also obvious to take $$$\mathcal O{\left(n^2\right)}$$$ time.

My question is: why does everyone use the implementation with divisions? Are there some hidden advantages? I didn't compare how much these two take with fixed-length integer types and arbitrary-precision integer types in practice. Did someone in community investigated this question? Did you know about division-less gcd implementation at all? Please let me know in the comments.

  • Vote: I like it
  • +86
  • Vote: I do not like it

»
11 months ago, # |
Rev. 2   Vote: I like it +45 Vote: I do not like it

On this blog, the author analyzed the runtime of the std::gcd algorithm versus his optimized version of a binary GCD algorithm. Answering your question on why everyone uses the modular-version of the algorithm: It's way easier to implement, because it's only 2-4 lines. However, the C++ Standard Library for example uses binary GCD too because of how fast it, so it's not entirely true that everyone uses division.

And no, I don't think that there are any hidden advantages to using a standard modulo implementation, because bitwise operations are usually way faster.

»
11 months ago, # |
Rev. 2   Vote: I like it +14 Vote: I do not like it

I would guess most people (like me) don't think about this and instead use the inbuilt C++ __gcd function instead. I don't know if the O(n^2) gcd you mention has a significant performance increase compared to the inbuilt one. It would be interesting to see any benchmarks.

»
11 months ago, # |
  Vote: I like it +20 Vote: I do not like it

For reference, this is called the binary gcd algorithm, and is the algorithm implemented efficiently in the STL for std::gcd (at least in the libstdc++ version that I use, which is the latest version).

I believe you can take their implementation and get Bézout coefficients the same way you would do in the extended Euclidean algorithm.

»
11 months ago, # |
  Vote: I like it +3 Vote: I do not like it

Did you know about division-less gcd implementation at all?

yes

»
11 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

The fastest known procedure of integer division works in O(nlogn)

Could someone please describe this a bit or share link to resources on how it is implemented?

  • »
    »
    11 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Division has the same asymptotic time complexity with multiplication. And the $$$O(n \log n)$$$ big integer multiplication is explained on this paper.

  • »
    »
    11 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Something like this: if we need to divide $$$a$$$ by $$$b$$$, we divide something like $$$2^{2+|a|+2|b|}$$$ by $$$b$$$ where $$$|b|$$$ is the number of bits in $$$b$$$. This is just finding an inverse number, it can be done with Newton's methon with same asymptotics as multiplication. Then we find $$$a \cdot \left\lfloor\frac{2^{2+|a|+2|b|}}{b}\right\rfloor / 2^{2+|a|+2|b|}$$$ and round this to the lower or the upper integer.

»
11 months ago, # |
  Vote: I like it +10 Vote: I do not like it

Let me add some of my considerations.

Although the time taken by the division of two integers of lengths $$$n$$$ and $$$m$$$ is $$$\mathcal O((n +m) \log m)$$$ as I said in the post, it is not always asymptotically optimal, this algorithm is too slow when $$$n - m$$$ is small. There is another algorithm which is standard school long division, which works in $$$\mathcal O(\max(1, n - m)m)$$$ time. Although it is usually slower than the fast division, it is a parameterized algorithm — that is, it can be fast if some cleverly chosen parameter is small. In this particular case, this algorithm works fast if the answer (the quotient) is short.

So, if we just use this standard division, all $$$(n - m) m$$$ addends amortize in total $$$\mathcal O{\left(n^2\right)}$$$ time. Moreover, the constant is hardly bigger than the one in the binary gcd algorithm because these two algorithms do the same thing, but the binary one does it from the right end while the ordinary one does it from the left end.