Записи в блоге

Hello there, welcome to my first Codeforces blog post! Today I’d like to share an educational — and frankly a fun observation — on optimizing the process of finding the square root of large integers using binary search. This may not be something you’d use in your everyday code (after all the built-in sqrtll() is hard to beat in speed), but it’s a neat mathematical trick that can literally save you seconds when processing many numbers.

Before we dive in, a huge thanks to chromate00. An answer of mine getting hacked in one of his contests sparked this discovery.

Aim

In a typical binary search algorithm for computing the integer square root of a number S, we initialize the search range with:

low = 0
high = S

This approach finds the square root in O(log(S)) iterations. But if we have to compute square roots for too many numbers (like $$$10^8$$$), it can become laggy. However, if we can choose low and high more intelligently based on the structure of S, we can reduce the search range even further, which can be a significant improvement for very large numbers.

Code

Here is the optimized version of finding the integer square root for some integer val:

Optimized Code

long long custom_sqrt(long long val)
{
    // optimized binary search
    if (val < 0)
        return -1;
    if (val < 2)
        return val;

    // msb of val
    int msb_Val = 63 - __builtin_clzll(val);

    long long low, high, ans;

    if (__builtin_popcountll(val) == 1)
    {
        // when theta is integer i.e.
        // when S is a power of 2
        // high = 1 << ceil(msb_Val/2)
        low = 1LL << (msb_Val >> 1);
        high = 1LL << ((msb_Val + 1) >> 1);
    }
    else
    {
        // when theta is not an integer
        // high = 1 << ceil(msb_Val/2 + fraction)
        // fraction part can never be equal to 1 so we add + 1
        low = 1LL << (msb_Val >> 1);
        high = 1LL << ((msb_Val >> 1) + 1);
    }
    ans = low;
    while (low <= high)
    {
        long long mid = low + high >> 1;
        if (mid <= val / mid) // avoid overflow
        {
            ans = mid, low = mid + 1;
        }
        else
        {
            high = mid - 1;
        }
    }
    return ans;
}

Concepts Needed (the Geeky Part)

Exponential Representation:
Any number x can be written as:
x = $$$e^{\ln(x)}$$$ (Or, equivalently, x = $$$2^{\log_2(x)}$$$)
Floor and Ceiling:
We have:
$$$\text{floor}(x)$$$ $$$\leq$$$ $$$x$$$ $$$\leq$$$ $$$\text{ceil}(x)$$$
Bit Shifting and Powers of 2:
For valid integers, $$$2^x$$$ is equivalent to 1 << x.
Integer and Fractional Parts:
Every real number on the number line x can be decomposed as:
x = [x] + {x} where [x] is the integer part and {x} is the positive fractional part.
Scaling the Fraction: Dividing the fractional part by a constant k (greater than or equal to 1) is equivalent to {x}/k = {x/k}

The Proof

Let’s denote: - $$$\text{S}$$$ as our input number, - $$$\text{A}$$$ as its square root, so $$$\text{A}$$$ = $$$\sqrt{S}$$$

Since any number can be expressed using logarithms, we have:

$$$\text{A}$$$ = $$$\sqrt{S}$$$ = $$$2^{\log_2(\sqrt{S})}$$$

Because we are interested in the integer part of $$$\text{A}$$$ (let’s call it $$$\text{P}$$$), by applying floor/ceiling properties we get:

$$$2^{\text{floor}(\log_2(\sqrt{S}))}$$$ $$$\leq$$$ $$$\text{P}$$$ $$$\leq$$$ $$$2^{\text{ceil}(\log_2(\sqrt{S}))}$$$

Using the equivalence with bit shifting, we can rewrite these bounds as:

$$$\text{(1}$$$ << $$$\text{floor}(\log_2(\sqrt{S}))$$$) $$$\leq$$$ $$$\text{P}$$$ $$$\leq$$$ $$$\text{(1}$$$ << $$$\text{ceil}(\log_2(\sqrt{S})))$$$

Now, here’s the key insight:
When $$$\text{S}$$$ is represented in binary, the floor of $$$\log_2(S)$$$ corresponds to the index of the most significant set bit $$$\text{(MSB)}$$$. Now, $$$\text{MSB}(x)$$$ can be found as follows:

$$$\text{MSB}(S)$$$ = $$$63 -$$$ __builtin_clzll(S)

then we can estimate:

Lower bound:
$$$\text{P}$$$ $$$\geq$$$ $$$\text{1}$$$ << $$$floor(\left(\text{MSB}(S) \, \text{divided by } 2\right))$$$

(Using bit-shift: $$$\text{1}$$$ << $$$(\text{MSB}(S)$$$ >> $$$\text{1})$$$

Upper bound:
$$$\text{P}$$$ $$$\leq$$$ $$$\text{1}$$$ << $$$floor(\left(\text{MSB}(S) \, \text{divided by } 2\right)) + 1$$$

(Using bit-shift: $$$\text{1}$$$ << $$$((\text{MSB}(S)$$$ >> $$$\text{1}) + 1)$$$

These bounds are computed in O(1) time and reduce the range for binary search from $$$[0, S]$$$ to a much smaller interval around the actual square root.

Results

I benchmarked the square root calculation for all integers less than $$$2 \times 10^8$$$:

Standard binary search (with range 0 to S): 29 seconds
Optimized binary search (with our computed bounds): 12 seconds
Built-in function (sqrtll()): 5 seconds

While the built-in function remains the fastest (thanks to hardware acceleration), the optimized binary search is significantly faster than the standard approach. Here is the time duration:

Hell, you run this on your machine and see the difference:

Code Snippet

#include <bits/stdc++.h>
#include <chrono>
using namespace std;
#define ll long long int
#define vll vector<ll>
#define kxxprintln(x) cout << x << endl;

ll custom_sqrt(ll val)
{
    // optimized binary search
    if (val < 0)
        return -1;
    if (val < 2)
        return val;

    // msb of val
    int msb_Val = 63 - __builtin_clzll(val);

    ll low, high, ans;

    if (__builtin_popcountll(val) == 1)
    {
        // when theta is integer
        // when S is a power of 2
        // high = 1 << ceil(msb_Val/2)
        low = 1LL << (msb_Val >> 1);
        high = 1LL << ((msb_Val + 1) >> 1);
    }
    else
    {
        // when theta is not an integer
        // high = 1 << ceil(msb_Val/2 + fraction)
        // fraction part can never be equal to 1 so we add + 1
        low = 1LL << (msb_Val >> 1);
        high = 1LL << ((msb_Val >> 1) + 1);
    }
    ans = low;
    while (low <= high)
    {
        ll mid = low + high >> 1;
        if (mid <= val / mid) // avoid overflow
        {
            ans = mid, low = mid + 1;
        }
        else
        {
            high = mid - 1;
        }
    }
    return ans;
}
ll sqt(ll val)
{
    // standard binary search
    if (val < 2)
        return val;
    ll low = 0, high = val, ans = 0;
    while (low <= high)
    {
        ll mid = low + high >> 1;
        if (mid <= val / mid)
        {
            ans = mid;
            low = mid + 1;
        }
        else
        {
            high = mid - 1;
        }
    }
    return ans;
}
int main()
{
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int till = 2e8 + 5e5;

    vll arr(till, 0);

    kxxprintln("time taken to calculate square root of all integers less than " << till << " now starting:");

    auto start1 = chrono::high_resolution_clock::now();

    for (int i = 0; i < till; i++)
    {
        // builtin
        arr[i] = sqrtl(i);
    }

    auto end1 = chrono::high_resolution_clock::now();

    auto duration1 = chrono::duration_cast<chrono::milliseconds>(end1 - start1);

    kxxprintln("time taken: " << duration1.count() << " ms using inbuilt function");

    auto start2 = chrono::high_resolution_clock::now();

    for (int i = 0; i < till; i++)
    {
        // optimized
        ll x = custom_sqrt(i);
        // if precision error, print index of non-precision
        if (x - arr[i])
        {
            kxxprintln(i);
            return 0;
        }
    }

    auto end2 = chrono::high_resolution_clock::now();

    auto duration2 = chrono::duration_cast<chrono::milliseconds>(end2 - start2);

    kxxprintln("time taken: " << duration2.count() << " ms using optimized binary search");

    auto start3 = chrono::high_resolution_clock::now();

    for (int i = 0; i < till; i++)
    {
        // standard
        ll x = sqt(i);
        // if precision error, print index of non-precision
        if (x - arr[i])
        {
            kxxprintln(i);
            return 0;
        }
    }

    auto end3 = chrono::high_resolution_clock::now();

    auto duration3 = chrono::duration_cast<chrono::milliseconds>(end3 - start3);

    kxxprintln("time taken: " << duration3.count() << " ms using binary search");

    return EXIT_SUCCESS;
}

P.S.

I realize that the proof might be a bit dense at first glance — so I’ve attached a detailed and easy-to-understand handwritten version of proof that for sure WILL help clarify the details further. If you find any flaw in this observation, please do let me know. Up to $$$2 \times 10^8$$$, there is no noticeable precision error—even when testing edge cases like Int64_max.

PROOF

Final Thoughts

While this optimized approach isn’t likely to replace the built-in functions in production code, it’s a fun and educational exercise that showcases how a deep understanding of binary representations and bitwise operations can lead to practical performance improvements. If you’re interested in algorithm optimization or just love geeky math, I hope you found this exploration as exciting as I did.

Happy coding and keep optimizing!

Feel free to leave your comments, suggestions, or critiques below. I’d love to hear your thoughts and any further optimizations you might suggest!

Полный текст и комментарии »