Usually Euclid's algorithm, which computes $$$\gcd(a, b)$$$, is implemented as follows:

```
while b != 0:
a %= b
swap(a, b)
return a
```

Or, in recursive fashion,

```
if b == 0:
return a
else:
return gcd(b % a, b)
```

While it works in $$$\mathcal O(n)$$$ time (where $$$n$$$ is the maximum of binary lengths of $$$a$$$ and $$$b$$$ — that is, big-Theta of the length of the input), it uses quite an expensive operation of integer division. The fastest known procedure of integer division works in $$$\mathcal O(n \log n)$$$ time, so, if we take into account the time spent on arithmetic operations, the time complexity is $$$\mathcal O{\left(n^2 \log n\right)}$$$. But even if we don't, `int64`

division is still much slower than such operations as addition, subtraction and binary shifts.

If you didn't know there is an algorithm which doesn't need division at all!

```
def remove_trailing_zeros(a):
return a >> count_trailing_zeros(a)
def gcd_of_odd_numbers(a, b):
if a == b:
return a
if a < b:
swap(a, b)
return gcd_of_odd_numbers(b, remove_trailing_zeros(a - b))
def gcd(a, b)
if a == 0:
return b
if b == 0:
return a
return gcd_of_odd_numbers(remove_trailing_zeros(a), remove_trailing_zeros(b)) << min(count_trailing_zeros(a), count_trailing_zeros(b))
```

The function `count_trailing_zeros(a)`

finds the maximum $$$k$$$ such that $$$a$$$ is divisible by $$$2^k$$$. The function `remove_trailing_zeros(a)`

divides $$$a$$$ by the maximum power of two that divides $$$a$$$. Both these functions can be easily implemented in $$$\mathcal O(n)$$$ time, if we take into account the complexity of arithmetic operations. `gcd_of_odd_numbers(a, b)`

finds `gcd`

of the two numbers $$$a$$$ and $$$b$$$, given they are both odd. Everything except the recursive call works in $$$\mathcal O(n)$$$ time. Note that the sum of binary lengths of numbers is decremented by at least one from call to call, so there will be only $$$\mathcal O(n)$$$ recursive calls. Therefore, `gcd_of_odd_numbers(a, b)`

works in $$$\mathcal O{\left(n^2\right)}$$$ time. Finally, `gcd(a, b)`

is also obvious to take $$$\mathcal O{\left(n^2\right)}$$$ time.

My question is: why does everyone use the implementation with divisions? Are there some hidden advantages? I didn't compare how much these two take with fixed-length integer types and arbitrary-precision integer types in practice. Did someone in community investigated this question? Did you know about division-less gcd implementation at all? Please let me know in the comments.