I know that counting bits using precomputation is the fastest way to go (over __builtin_popcount()).
For 32 bit integers we can do cnt[x>>16]+cnt[x&65535] by precomputing counts upto (1<<16).
How can we do it for 64 bit integers by precomputing upto (1<<22).








I don't understand, why can't we just do
cnt[(x>>48)&65535]+cnt[(x>>32)&65535]+cnt[(x>>16)&65535]+cnt[x&65535]? or is this too slow? If we precompute upto (1<<22) then it would becnt[(x>>44)&4194303]+cnt[(x>>22)&4194303]+cnt[x&4194303]. Though, I don't know if this is the fastest way.Thank you, just wanted to confirm this.