How to speed up the sieve of Eratosthenes by 1.5 times with one line

Consider this standard implementation of the sieve of Eratosthenes:

usual

const int N = 1e8;

bool used[N];

void solve() {
    ll sum = 0;
    for (int i = 2; i < N; i++) {
        if (!used[i]) {
            sum += i;
            for (int j = min((ll)INT32_MAX, i * 1ll * i); j < N; j += i)
                used[j] = true;
        }
    }
    cout << sum << "\n";
}

Now replace the second line, bool used[N];, with bitset<N> used;

bitset

const int N = 1e8;

bitset<N> used;

void solve() {
    ll sum = 0;
    for (int i = 2; i < N; i++) {
        if (!used[i]) {
            sum += i;
            for (int j = min((ll)INT32_MAX, i * 1ll * i); j < N; j += i)
                used[j] = true;
        }
    }
    cout << sum << "\n";
}
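The snippets in this post leave out the includes and the ll typedef, so here is a minimal self-contained sketch of the bitset variant that can be compiled and checked on small inputs. The function name prime_sum_bitset, the template parameter, and the heap allocation are my own assumptions for illustration, not part of the original solution.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <memory>

// Sum of all primes below N, marking composites in a std::bitset<N>.
// Hypothetical helper name; the logic mirrors the "bitset" snippet above.
template <std::size_t N>
std::int64_t prime_sum_bitset() {
    // Allocate on the heap so that a large N does not overflow the stack.
    auto used = std::make_unique<std::bitset<N>>();
    std::int64_t sum = 0;
    for (std::uint64_t i = 2; i < N; i++) {
        if (!(*used)[i]) {
            sum += static_cast<std::int64_t>(i);
            // Start at i * i: smaller multiples were marked by smaller primes.
            // 64-bit arithmetic avoids the overflow guard used in the post.
            for (std::uint64_t j = i * i; j < N; j += i)
                (*used)[j] = true;
        }
    }
    return sum;
}
```

Calling prime_sum_bitset<100>() sums the primes below 100; for the sizes in the post one would instantiate it with 100000000 instead.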

For comparison, let's also take the linear sieve of Eratosthenes.

linear

const int N = 1e8;

int le[N], primes[N / 10];

void solve() {
    ll sum = 0;
    int m = 0;
    for (int i = 2; i < N; i++) {
        if (!le[i]) {
            sum += i;
            primes[m++] = i;
            le[i] = i;
        }
        for (int j = 0; j < m && primes[j] <= le[i] && primes[j] * i < N; j++)
            le[primes[j] * i] = primes[j];
    }
    cout << sum << "\n";
}

Running times with the GNU G++20 11.2.0 (64 bit, winlibs) compiler, with #pragma GCC optimize("O3") in all three solutions (ML means the memory limit was exceeded):

\begin{array}{|c|c|c|c|}
\hline N & usual & linear & bitset \cr
\hline 2e7 & 202ms & 140ms & 78ms \cr
\hline 5e7 & 467ms & 374ms & 249ms \cr
\hline 1e8 & 1138ms & 732ms & 686ms \cr
\hline 2e8 & 2308ms & ML & 1669ms \cr
\hline
\end{array}

I guess it's because of the cache. A bitset packs each flag into one bit instead of one byte, so the data takes up 8 times less memory and is cached better. The more cache the machine has, the more noticeable the speedup. For example, on my machine I get a 3-fold speedup at $$$N = 1e8$$$ compared to the usual sieve.
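The factor of 8 can be checked directly with sizeof. This is only a sketch: the constant names are hypothetical, and it assumes sizeof(bool) is 1, which holds on common compilers but is not guaranteed by the standard.

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kN = 100000000;  // N = 1e8, as in the post

// Bytes used by the flag array in the "usual" version (one bool per number).
constexpr std::size_t kBoolBytes = kN * sizeof(bool);

// Bytes used by the bitset version (one bit per number, rounded up to whole words).
constexpr std::size_t kBitsetBytes = sizeof(std::bitset<kN>);
```

With 1-byte bool this gives about 100 MB for the array against about 12.5 MB for the bitset, which is far more likely to stay in the lower cache levels during the inner loop.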

Unfortunately, this is rarely useful in practice, because we almost always want the sieve to give us more information than a single is-composite flag per number.
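As one illustration of such extra information, the le array from the linear sieve stores the least prime divisor of every number, which a one-bit flag cannot hold. The sketch below shows how that table can be used to factorize a number. The names least_prime_table and factorize are hypothetical, and std::vector replaces the global arrays purely to keep the example self-contained.

```cpp
#include <vector>

// Linear sieve: least[x] is the smallest prime divisor of x for 2 <= x < n.
std::vector<int> least_prime_table(int n) {
    std::vector<int> least(n, 0), primes;
    for (int i = 2; i < n; i++) {
        if (least[i] == 0) {  // i has no smaller prime divisor, so it is prime
            least[i] = i;
            primes.push_back(i);
        }
        for (int p : primes) {
            // Stop once p exceeds the least divisor of i or p * i leaves the table.
            if (p > least[i] || 1LL * p * i >= n) break;
            least[p * i] = p;
        }
    }
    return least;
}

// Prime factors of x in non-decreasing order; requires 2 <= x < least.size().
std::vector<int> factorize(int x, const std::vector<int>& least) {
    std::vector<int> factors;
    while (x > 1) {
        factors.push_back(least[x]);
        x /= least[x];
    }
    return factors;
}
```

Each factorization takes one table lookup per prime factor, at the cost of 4 bytes per number instead of one bit.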

	Rev.	Lang.	By	When	Δ	Comment
	en2		dimss	2023-07-11 08:20:12	3
	en1		dimss	2023-06-30 10:01:47	2238	Initial revision (published)
