How to speed up the sieve of Eratosthenes by 1.5 times with one line

Here is a typical implementation of the sieve of Eratosthenes:

usual

const int N = 1e8;

bool used[N];

void solve() {
    ll sum = 0;
    for (int i = 2; i < N; i++) {
        if (!used[i]) {
            sum += i;
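            // Start at i * i: smaller multiples were already marked by smaller primes.
            // The product is computed in long long and clamped to INT32_MAX so that it
            // fits in int j; the clamped value is never below N, so the loop is skipped.
            for (int j = min((ll)INT32_MAX, i * 1ll * i); j < N; j += i)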
                used[j] = true;
        }
    }
    cout << sum << "\n";
}

Then replace the second line (bool used[N];) with bitset<N> used;

bitset

const int N = 1e8;

bitset<N> used;

void solve() {
    ll sum = 0;
    for (int i = 2; i < N; i++) {
        if (!used[i]) {
            sum += i;
            for (int j = min((ll)INT32_MAX, i * 1ll * i); j < N; j += i)
                used[j] = true;
        }
    }
    cout << sum << "\n";
}

For comparison, let's also take the linear sieve of Eratosthenes.

linear

const int N = 1e8;

int le[N], primes[N / 10];

void solve() {
    ll sum = 0;
    int m = 0;
    for (int i = 2; i < N; i++) {
        if (!le[i]) {
            sum += i;
            primes[m++] = i;
            le[i] = i;
        }
        for (int j = 0; j < m && primes[j] <= le[i] && primes[j] * i < N; j++)
            le[primes[j] * i] = primes[j];
    }
    cout << sum << "\n";
}

Running times with the GNU G++20 11.2.0 (64 bit, winlibs) compiler, with #pragma GCC optimize("O3") in all three solutions (ML means the memory limit was exceeded):

\begin{array}{|c|c|c|c|}
\hline
N & \text{usual} & \text{linear} & \text{bitset} \cr
\hline
2e7 & 202\text{ ms} & 140\text{ ms} & 78\text{ ms} \cr
\hline
5e7 & 467\text{ ms} & 374\text{ ms} & 249\text{ ms} \cr
\hline
1e8 & 1138\text{ ms} & 732\text{ ms} & 686\text{ ms} \cr
\hline
2e8 & 2308\text{ ms} & \text{ML} & 1669\text{ ms} \cr
\hline
\end{array}

I guess it's because of the cache. A bitset stores one bit per element, while a bool array uses one byte per element, so the data takes up 8 times less memory and is cached better. The more cache the machine has, the more noticeable the speedup. For example, on my machine I get a 3-fold speedup for $$$N = 1e8$$$ compared to the usual sieve.
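To make the "8 times less memory" claim concrete, here is a minimal sketch that compares the two footprints at compile time. The helper names bool_array_bytes and bitset_bytes are made up for this illustration, and the exact numbers assume a typical implementation where sizeof(bool) is 1.

```cpp
#include <bitset>
#include <cstddef>

// Same size as in the post; the helper names below are hypothetical.
constexpr std::size_t N = 100'000'000;

// Bytes occupied by bool used[N]: one bool per number.
constexpr std::size_t bool_array_bytes() { return sizeof(bool[N]); }

// Bytes occupied by bitset<N> used: one bit per number,
// rounded up to whole machine words by the library.
constexpr std::size_t bitset_bytes() { return sizeof(std::bitset<N>); }
```

With GCC on a 64-bit target this should come out at roughly 100 MB for the array versus about 12.5 MB for the bitset, so a much larger share of the sieve fits in cache.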

Unfortunately, this is rarely useful in practice, because we almost always want the sieve to give us some additional information, not just a prime/composite flag.

Comments (7)

debugging_since_epoch

17 months ago

Great blog! I just have a question about the following line:

for (int j = min((ll)INT32_MAX, i * 1ll * i); j < N; j += i)

To be more specific, this part:

int j = min((ll)INT32_MAX, i * 1ll * i)

For some reason, what you did there seems to make the code 300 ms faster for N = 1e8, and I really couldn't understand why. Can you elaborate on that part?

elamharnish

17 months ago

When you mark all multiples of i larger than i, you do not need to consider j * i for j less than i, because those numbers were already marked earlier (while processing a smaller prime that divides j). But one must avoid integer overflow, which is why i needs to be cast to long long before multiplying. If I'm not mistaken, this gives O(n log(log(n))) complexity for the sieve. Sorry if I misunderstood your question.
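The idea described above can be sketched as a small standalone function. This is an illustrative variation, not the code from the post: the name count_primes_below is hypothetical, and the inner loop uses a 64-bit j instead of clamping the product to INT32_MAX.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: counts the primes below n with the classic sieve.
int count_primes_below(int n) {
    std::vector<char> used(n > 0 ? n : 0, 0);
    int count = 0;
    for (int i = 2; i < n; i++) {
        if (used[i]) continue;  // i is composite, nothing to do
        count++;
        // Multiples k * i with 2 <= k < i were already marked while
        // processing a prime factor of k, so marking can start at i * i.
        // The product is computed in 64 bits: for i > 46340 it no longer
        // fits in a 32-bit int.
        for (std::int64_t j = static_cast<std::int64_t>(i) * i; j < n; j += i)
            used[j] = 1;
    }
    return count;
}
```

For example, count_primes_below(100) returns 25. Without the widening cast, i * i would overflow for large i, which is undefined behavior for signed integers.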

Wow, thanks, it seems like a great trick!

nor

If you just want primes, this is pretty fast too:


template <int N = 1'000'000, bool compute_primes = true>
struct fast_sieve {
    std::bitset<N / 3 + 1> sieve;
    vector<int> primes;
    inline bool is_prime(int n) {
        return n == 2 || n == 3 || ((n & 1) && (n % 3) && (sieve[n / 3]));
    }
    void fill_sieve() {
        sieve.set();
        for (int p = 5, d = 4, i = 1, sqn = int(std::sqrt(N)); p <= sqn;
             p += d = 6 - d, i++) {
            if (!sieve[i]) continue;
            for (int q = p * p / 3, r = d * p / 3 + (d * p % 3 == 2), s = 2 * p,
                     qe = (int)sieve.size();
                 q < qe; q += r = s - r)
                sieve[q] = 0;
        }
    }
    vector<int> get_primes() {
        vector<int> ret{2, 3};
        for (int p = 5, d = 4, i = 1; p <= N; p += d = 6 - d, i++)
            if (sieve[i]) ret.push_back(p);
        while (!ret.empty() && ret.back() > N) ret.pop_back();
        return ret;
    }
    fast_sieve() {
        fill_sieve();
        if (compute_primes) primes = get_primes();
    }
};

Update: Just benchmarked, seems to be 3x faster than the version in the blog.

1802042

So cool! Could you suggest a blog post on this technique?

It is just noting that all primes other than $$$2, 3$$$ are $$$\pm 1 \pmod 6$$$, and iterating over only those candidates.
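As a small illustration of that observation, the sketch below lists the candidates that are $$$\pm 1 \pmod 6$$$ and shows how they map to consecutive positions when indexed by n / 3, which is the indexing the sieve above relies on. The names wheel6_candidates and wheel6_index are hypothetical and chosen only for this example.

```cpp
#include <vector>

// Hypothetical helper: lists the numbers in [5, limit] that are divisible
// by neither 2 nor 3, i.e. the only candidates the sieve has to store.
std::vector<int> wheel6_candidates(int limit) {
    std::vector<int> result;
    // The step alternates 2, 4, 2, 4, ...: 5, 7, 11, 13, 17, 19, ...
    for (int p = 5, step = 2; p <= limit; p += step, step = 6 - step)
        result.push_back(p);
    return result;
}

// Position of such a candidate in the compressed bitset:
// 5 -> 1, 7 -> 2, 11 -> 3, 13 -> 4, ...
int wheel6_index(int n) { return n / 3; }
```

Only one number in three is stored, which is why the bitset in the sieve above has N / 3 + 1 bits instead of N.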

I like to do this in a sieve:
1. vector<char> instead of vector<bool> / bitset

dimss's blog