Precomputing a lookup table for binary GCD can be fairly effective when the inputs are small. I've seen a 30% time reduction with a 512x512 table (which takes almost no memory or time to build) for inputs less than $$$2\times10^9$$$. The gain scales roughly linearly with the log.
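Below is a minimal sketch of the idea, assuming a GCC/Clang compiler (for `__builtin_ctz`) and C++17 (for `std::gcd`, used only to fill the table). The names `TBL`, `small_gcd`, `init_table` and `gcd_lut` are hypothetical and are not taken from anyone's actual code. It runs an ordinary subtract-and-shift binary GCD, and once both odd operands drop below 512 it replaces the remaining iterations with one table lookup. A possible explanation for the roughly 30% saving (my own reading, not something measured here): inputs below $$$2\times10^9 < 2^{31}$$$ have about 31 bits to strip, and the table absorbs the last 9 of them ($$$\log_2 512 = 9$$$, and $$$9/31 \approx 29\%$$$). That would also fit the gain scaling with the log.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>   // std::gcd (C++17), used only to fill the table
#include <utility>   // std::swap

// Hypothetical table size, matching the 512x512 table mentioned above.
// Each entry is a gcd of two values below 512, so uint16_t is enough:
// 512 * 512 * 2 bytes = 512 KiB.
constexpr uint32_t TBL = 512;
static uint16_t small_gcd[TBL][TBL];

// Fill the table once, before any call to gcd_lut.
void init_table() {
    for (uint32_t a = 0; a < TBL; ++a)
        for (uint32_t b = 0; b < TBL; ++b)
            small_gcd[a][b] = static_cast<uint16_t>(std::gcd(a, b));
    assert(small_gcd[TBL - 1][TBL - 1] == TBL - 1);  // sanity check
}

// Binary (Stein's) GCD that finishes with a table lookup once both
// operands are below TBL. __builtin_ctz is a GCC/Clang builtin.
uint32_t gcd_lut(uint32_t a, uint32_t b) {
    if (a == 0) return b;
    if (b == 0) return a;
    // gcd(a, b) = 2^shift * gcd(odd part of a, odd part of b)
    int shift = __builtin_ctz(a | b);
    a >>= __builtin_ctz(a);  // a is now odd
    b >>= __builtin_ctz(b);  // b is now odd
    while (a != b) {
        // Both operands are small: one lookup replaces the remaining steps.
        if (a < TBL && b < TBL)
            return static_cast<uint32_t>(small_gcd[a][b]) << shift;
        if (a > b) std::swap(a, b);
        b -= a;                  // odd - odd: even and nonzero
        b >>= __builtin_ctz(b);  // make b odd again
    }
    return a << shift;
}
```

Call `init_table()` once at startup and then use `gcd_lut(a, b)` in place of `std::gcd`. Both operands are odd whenever the table is consulted, so a 256x256 table indexed by `a >> 1` and `b >> 1` would also work and would use a quarter of the memory.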
