I started thinking about this during my livestream of the solution to round 731, and discussed it in a comment a bit earlier, but it seems novel enough, and might invite enough debate, that I thought it deserved to be its own blog post:
Suppose I have an array $$$a$$$ of $$$n$$$ integers, each up to a million. What is the worst-case runtime of the following code?
int gcd(int a, int b) {
    return b == 0 ? a : gcd(b, a % b);
}
// in main; the running value is named g so it doesn't shadow the gcd function
int g = a[0];
for (int i : a) g = gcd(i, g);
Conventionally, I have always thought it was $$$n \cdot \log(MAXVAL)$$$. However, each level of recursion inside a gcd call shrinks the arguments by a factor of at least about $$$1.618$$$ (the golden ratio). But really, we only do all $$$\log(MAXVAL)$$$ recursive calls if the gcd gets quite small. In that case, future gcd calls will be $$$O(1)$$$, not $$$O(\log)$$$. So do we have a clean potential function, like the number of prime factors in the gcd, which can bound the total gcd recursion? I know that this is some very hand-wavy logic, but it seems almost legit enough for me to believe it. Or is this totally not true? I'm secretly wishing it's true because it would be cool, but I also feel like I would have seen this before if it were both commonly known and true.
TLDR: is the above code really $$$O(n + \log(MAXVAL))$$$, instead of $$$O(n \cdot \log(MAXVAL))$$$?
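For anyone who wants to poke at this empirically, here is a minimal instrumented sketch (the step counter and the random test data are my own, purely illustrative, assumptions):

#include <bits/stdc++.h>
using namespace std;

long long steps = 0; // total number of recursive calls into gcd

int gcd(int a, int b) {
    ++steps;
    return b == 0 ? a : gcd(b, a % b);
}

int main() {
    // hypothetical test data: n random values, each up to a million
    int n = 1000000, MAXVAL = 1000000;
    mt19937 rng(12345);
    vector<int> a(n);
    for (int& x : a) x = rng() % MAXVAL + 1;

    int g = a[0];
    for (int i : a) g = gcd(i, g);

    // if O(n + log(MAXVAL)) is right, steps stays close to 2n,
    // nowhere near n * log(MAXVAL)
    cout << "recursive calls: " << steps << ", final gcd: " << g << "\n";
}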
I think it is $$$\mathcal O(n + \log(MAXVAL))$$$. If the $$$\gcd$$$ doesn't change, calling $$$\gcd$$$ takes $$$\mathcal O(1)$$$ time (the first recursive call already hits a remainder of $$$0$$$). Otherwise, say the $$$\gcd$$$ changes $$$k$$$ times, and on the $$$i$$$-th change we recurse $$$r_i$$$ times into $$$\gcd$$$. Then $$$\mathcal O(n + \sum r_i)$$$ is the total number of times we recurse into $$$\gcd$$$, which should also be the time complexity. Let $$$g_0 = a[0]$$$, and let $$$g_i$$$ be the $$$\gcd$$$ after the $$$i$$$-th time it changes. Note that $$$\frac{g_{i-1}}{g_i} \geq 2^{(r_i - 1)/2}$$$, up to rounding: after the first recursive call the larger argument is at most $$$g_{i-1}$$$, every two further calls it shrinks by a factor of at least $$$2$$$, and it finishes at $$$g_i$$$. Multiplying this over all $$$i$$$, we get $$$\frac{g_0}{g_k} \geq 2^{(\sum r_i - k)/2}$$$, and taking the $$$\log$$$ of this gives our desired bound.
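Writing the telescoping out explicitly (using $$$k \leq n$$$ and $$$g_k \geq 1$$$): $$$\sum_i r_i \leq k + 2\sum_i \log_2 \frac{g_{i-1}}{g_i} = k + 2\log_2 \frac{g_0}{g_k} \leq n + 2\log_2(MAXVAL)$$$.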
For me, this is both commonly known and true. I think I first saw this for segtree of gcd.
In short, yes. The runtime of a single call $$$\gcd(a, b) = x$$$ can be bounded above by a constant times $$$1+\log{\frac{\min \{a, b\}}{x}}$$$ arithmetic operations by (for example) noticing that the size of the smaller argument must be reduced by at least half every two iterations. From there, it's easy to bound the total runtime by a telescoping series to get $$$O(n + \log(MAXVAL))$$$ runtime for the loop.
Yes, but that worst case is only for a single gcd call. Since the gcd of two consecutive Fibonacci numbers is $$$1$$$, all following calls will run in constant time.
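A quick sketch to see both halves of that claim (the constants are just the largest consecutive Fibonacci pair below a million):

#include <bits/stdc++.h>
using namespace std;

int depth = 0; // recursive calls in the current gcd computation

int gcd(int a, int b) {
    ++depth;
    return b == 0 ? a : gcd(b, a % b);
}

int main() {
    // consecutive Fibonacci numbers: the worst case for a single call
    depth = 0;
    gcd(832040, 514229); // F(30), F(29)
    cout << "Fibonacci pair: " << depth << " calls\n"; // about 30

    // once the running gcd is 1, every later call bottoms out immediately
    depth = 0;
    gcd(832040, 1);
    cout << "gcd already 1: " << depth << " calls\n"; // exactly 2
}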
It is $$$O(n + \log(MAXVAL))$$$. I remember it being discussed in CF blogs a few times. If you build a gcd segment tree, the query complexity is also $$$O(\log(n) + \log(MAXVAL))$$$ for the same reason.
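For reference, a minimal sketch of such a gcd segment tree (an iterative one; the code and names are mine, not from any particular blog). A query folds $$$O(\log n)$$$ stored values into one running gcd, so the same telescoping argument bounds it by $$$O(\log(n) + \log(MAXVAL))$$$:

#include <bits/stdc++.h>
using namespace std;

int n;
vector<int> t; // iterative segment tree; leaves live at t[n..2n-1]

void build(const vector<int>& a) {
    n = a.size();
    t.assign(2 * n, 0);
    for (int i = 0; i < n; i++) t[n + i] = a[i];
    for (int i = n - 1; i >= 1; i--) t[i] = __gcd(t[2 * i], t[2 * i + 1]);
}

// gcd of a[l..r], 0-indexed inclusive
int query(int l, int r) {
    int res = 0; // gcd(0, x) = x, so 0 is the identity element
    for (l += n, r += n + 1; l < r; l >>= 1, r >>= 1) {
        if (l & 1) res = __gcd(res, t[l++]);
        if (r & 1) res = __gcd(res, t[--r]);
    }
    return res;
}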
Does this gcd argument also apply when building structures like sparse tables / segtrees? If not, what is the actual complexity for these structures?
Yet I think the sparse table is $$$O(n \times (\log n + \log a))$$$, where $$$n$$$ is the size and $$$a$$$ is the maximum value.
For a fixed position, the gcd values along the $$$\log n$$$ levels can only lose factors, never regain them, so the expensive reduction steps for that position cost $$$O(\log a)$$$ in total: once the gcd reaches its final value, there is nothing left to reduce. Summed over all $$$n$$$ positions, the reduction part is at most $$$O(n \times \log a)$$$.
And since the sparse table itself has $$$O(n \times \log n)$$$ entries, each needing an $$$O(1)$$$ merge on top of those reductions, this gives a total of $$$O(n \times (\log n + \log a))$$$, though I think it can be somewhat smaller for random cases.
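A sketch of the table being discussed (my code, under the usual conventions): for a fixed $$$i$$$, each $$$st[j+1][i]$$$ divides $$$st[j][i]$$$, which is exactly what makes the per-position telescoping above go through:

#include <bits/stdc++.h>
using namespace std;

// st[j][i] = gcd of a[i .. i + 2^j - 1]
vector<vector<int>> st;

void build(const vector<int>& a) {
    int n = a.size(), LOG = 1;
    while ((1 << LOG) < n) LOG++;
    st.assign(LOG + 1, vector<int>(n));
    st[0] = a;
    for (int j = 0; j < LOG; j++)
        for (int i = 0; i + (2 << j) <= n; i++)
            st[j + 1][i] = __gcd(st[j][i], st[j][i + (1 << j)]);
}

// gcd of a[l..r], inclusive: two overlapping power-of-two windows cover the range
int query(int l, int r) {
    int j = __lg(r - l + 1);
    return __gcd(st[j][l], st[j][r - (1 << j) + 1]);
}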