Why you may want to use two-layer dp not only to decrease the memory usage

#	User	Rating
1	Benq	3792
2	VivaciousAubergine	3647
3	jiangly	3631
4	Kevin114514	3574
5	maroonrk	3521
6	strapple	3515
7	Radewoosh	3461
8	tourist	3428
9	turmax	3378
10	Um_nik	3376

#	User	Contrib.
1	Qingyu	162
2	adamant	148
3	Um_nik	146
4	Dominater069	143
5	errorgorn	141
6	cry	138
7	Proof_by_QED	136
8	YuukiS	135
9	chromate00	134
10	soullless	133

I was upsolving AtCoder Beginner Contest 195 problem F, aka "Coprime Present" where you need to count subsets (of a special set of size < 72) with pairwise coprime elements.

My solution is a bitmask dp over prime factors that were used and a prefix of the set. My first submission.

Fairly straightforward, nothing smart, $$$O(N \cdot 2^n)$$$ time and memory. However, I was disappointed by the memory usage of 600+ Mb. Therefore, I implemented a two-layer version of the same dp. Here is my second submission.

The difference is minimal, I just take layer_number & 1, and don't forget to clear the previous layer before proceeding to the next one

Obviously, it uses way less memory, only 30 Mb. What surprised me is that the running time has also decreased, from 472 ms to just 147 ms, for an improvement by a factor of 3!

I have two hypotheses on why this happens:

two layers fit into the better cache level, facilitating memory lookups;
memory allocation itself takes some time (there must be a reason why people write custom allocators, right?).

Can someone more experienced please tell me what actually happens there?

Rev.	By	When	Δ	Comment
en3	nika-skybytska	2021-06-29 11:58:54	0	(published)
en2	nika-skybytska	2021-06-29 11:53:42	8	Tiny change: 'ts (of a set of si' -> 'ts (of a special set of si' (saved to drafts)
en1	nika-skybytska	2021-06-29 11:50:44	1383	Initial revision (published)

Rev.

Lang.

When

Comment

en3