Count pairs (A[i] & A[j]) = 0 but Ai <= 1e9


By Sammmmmmm, history, 20 months ago, In English

Given an array A of N integers, count the pairs (i, j) such that (A[i] & A[j]) = 0.

N <= 1e5;

Ai <= 1e9


Comments (27)



tosivanmak

20 months ago, # |

Where is this problem from? Or did you create it yourself?

Loserinlife

20 months ago, # ^ |

The problem is here https://tleoj.edu.vn/contest/026 (3rd problem)


Halym2007

20 months ago, # |

-17

This problem can be solved with a trie.

Loserinlife

20 months ago, # ^ |

Can you explain it in more detail? Thanks


Halym2007

20 months ago, # ^ |

-8

Do you know what a trie is?

SuperJ6

20 months ago, # ^ |

I know what a trie is, but not how you can solve this problem with one...

chenyifan37

20 months ago, # ^ |

I don't think a trie can solve this problem... Tries are usually used for XOR problems, not AND problems qwq

SuperJ6

20 months ago, # |


+46

I like this problem; it has a clever trick I haven't seen in exactly this form. Assume the numbers are < 2^30. The key is noticing that if (a[i] & a[j]) == 0, then at least one of the two numbers in a valid pair must have bitcount <= 15. Also, 30C15 ≈ 1.55e8.
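The bitcount bound follows from a short counting argument, sketched here under the same assumption that every value fits in 30 bits. The symbol $\operatorname{pc}$ is shorthand introduced only for this sketch and stands for the bitcount:

$$
a \mathbin{\&} b = 0 \;\Longrightarrow\; \operatorname{pc}(a) + \operatorname{pc}(b) = \operatorname{pc}(a \mid b) \le 30 \;\Longrightarrow\; \min\bigl(\operatorname{pc}(a), \operatorname{pc}(b)\bigr) \le 15,
\qquad \binom{30}{15} = 155117520 \approx 1.55 \cdot 10^{8}.
$$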

Let's split the numbers into two groups: those with bitcount <= 15 go in set S and those with bitcount > 15 go in set T. Then, from the observation above, no pair of elements within T can work, so we only need to count pairs within S and pairs between S and T.

For pairs within S, for each element of S we count how many other elements of S intersect its bitmask and subtract those. To find the number of masks that intersect a given mask we can use SOS dp.

For pairs between S and T, for each element of T we count how many elements of S use only bits that are unset in the T element. We can find the number of elements within a mask using SOS dp as well.

Now you may say SOS dp is too slow because it is 30 * 2^30. However, notice that in both cases described above we only iterate over masks with at most 15 bits. This means SOS dp will actually be more like 30 * 30C15 :eyes:. With a bit of constant optimization and luck, something along these lines will hopefully pass.

Actually, I'm not confident this is the right approach; it seems it will be difficult to deal with memory. If there is a much better solution, someone please let me know :pray:.


SuperJ6

20 months ago, # ^ |


-10

Oh, to deal with memory: I guess why do SOS dp at all? For every element you should be able to iterate over the corresponding bitmasks naively. It will then be n * 2^15 time and 2^15 memory.

That seems a bit less cool though :(. Does anyone know a problem that forces a technique more like the SOS dp one I described?

Sammmmmmm

20 months ago, # ^ |

Tysm. That's such a cool solution!


mgch

11 months ago, # ^ |

You can solve it in O(N^2/word) with such constraints, but how to apply SOS dp here, no clue...

callback

20 months ago, # |


-10

I just deleted my comment with the code; I misunderstood the constraints earlier.

callback

20 months ago, # ^ |


-18

You can also use this function:

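// Assumes contain[v] is the number of array elements equal to v
// and that contain.size() is a power of two.
// Returns the number of ordered pairs (i, j), including i == j,
// with (A[i] & A[j]) == 0.
long long solve(vector<int> &contain)
{
	if(contain.size()==1)
		return 1LL*contain[0]*contain[0];

	// Upper half: values whose highest bit is set.
	vector<int> halved(contain.size()/2);
	for(size_t i=0;i<contain.size()/2;i++)
		halved[i]=contain[i+contain.size()/2];
	// Two values that both have the highest bit set can never pair up,
	// so those pairs are subtracted.
	long long ans = -solve(halved);

	// Drop the highest bit from every value and count pairs on the rest.
	for(size_t i=0;i<contain.size()/2;i++)
		halved[i]+=contain[i];
	ans+=solve(halved);

	return ans;
}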


SuperJ6

20 months ago, # ^ |

Read bounds...


callback

20 months ago, # ^ |

Extremely sorry for that, I thought it was the same as n.

Sorry guys

returnA

20 months ago, # |

This is from an ongoing contest.

Sammmmmmm

20 months ago, # ^ |

No, it's not. Somebody already sent the link.


kanomahoro

20 months ago, # |

By the way, how do you solve the a[i] & a[j] = k version of the problem? Maybe with sqrt decomposition, but I don't know how.

SuperJ6

20 months ago, # ^ |


~~The best I know is approach like this problem, but I think it won't fit in memory for these bounds.~~

I misunderstood, for some reason I thought k bits.


LeoPro

20 months ago, # ^ |

+14

Which problem? Like the one in the blog, but with the bitwise AND equal to k instead of 0? They are the same; that is, k = 0 is the hardest case. Otherwise you only need to consider those a[i] such that (a[i] & k) == k. Replace them with a[i] & ~k and the problem boils down to the zero bitwise AND case.
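The reduction can be sketched as follows. The function names are hypothetical, and the zero-AND counter is a plain O(n^2) reference used only to keep the example short; any faster counter for the k = 0 case could be dropped in instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference O(n^2) counter for pairs i < j with (a[i] & a[j]) == 0.
long long count_zero_and_pairs(const std::vector<uint32_t>& a) {
    long long pairs = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            pairs += (a[i] & a[j]) == 0;
    return pairs;
}

// Counts pairs i < j with (a[i] & a[j]) == k by reducing to the k == 0 case.
long long count_and_equals_k_pairs(const std::vector<uint32_t>& a, uint32_t k) {
    std::vector<uint32_t> reduced;
    for (uint32_t x : a) {
        // Only values that contain every bit of k can take part in a pair.
        if ((x & k) == k) reduced.push_back(x & ~k);  // strip the bits of k
    }
    // After stripping, the remaining bits of a valid pair must not overlap.
    return count_zero_and_pairs(reduced);
}
```

For example, with the values 5, 7, 13, 4 and k = 5, the value 4 is dropped, the rest become 0, 2, 8, and all three remaining pairs have AND equal to zero, matching 5 & 7 = 5 & 13 = 7 & 13 = 5.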


Sugar_fan

20 months ago, # |

+32

Brute force in custom invocation (C++20) on Codeforces takes only 546 ms.

code

#pragma GCC optimize("Ofast,unroll-loops")
#include <bits/stdc++.h>
using namespace std;
constexpr int m = 32;  // block size
// Counts pairs (x, y) with x in block a, y in block b and (x & y) == 0.
int f(int *a, int *b) {
  int res = 0;
  for (int i = 0; i < m; i += 1) {
    for (int j = 0; j < m; j += 1) {
      res += (a[i] & b[j]) == 0;
    }
  }
  return res;
}
int main() {
  cin.tie(nullptr)->sync_with_stdio(false);
  int n = 100000;  // a multiple of m, so every block is full
  vector<int> a(n);
  mt19937_64 mt;
  uniform_int_distribution uid(0, 1000000000);
  for (int &ai : a) {
    ai = uid(mt);
  }
  // Plain O(n^2) reference check, left commented out:
  // int check = 0;
  // for (int i = 0; i < n; i += 1) {
  //   for (int j = i + 1; j < n; j += 1) {
  //     check += (a[i] & a[j]) == 0;
  //   }
  // }
  // cout << check << "\n";
  long long ans = 0;  // the pair count can exceed the int range in general
  // Pairs taken from two different blocks.
  for (int i = 0; i < n; i += m) {
    for (int j = i + m; j < n; j += m) {
      ans += f(a.data() + i, a.data() + j);
    }
  }
  // Pairs inside the same block.
  for (int i = 0; i < n; i += m) {
    for (int j = 0; j < m; j += 1) {
      for (int k = j + 1; k < m; k += 1) {
        ans += (a[i + j] & a[i + k]) == 0;
      }
    }
  }
  cout << ans << "\n";
}


Sugar_fan

20 months ago, # ^ |

I found something weird.

code1

#pragma GCC optimize("Ofast,unroll-loops")
#include <bits/stdc++.h>
using namespace std;
constexpr int m = 1;
int main() {
  cin.tie(nullptr)->sync_with_stdio(false);
  int n = 100000;
  vector<int> a(n);
  mt19937_64 mt(chrono::steady_clock::now()
                    .time_since_epoch()
                    .count());
  uniform_int_distribution uid(0, 1000000000);
  for (int &ai : a) {
    ai = uid(mt);
  }
  int ans = 0;
  for (int i = 0; i < n; i += m) {
    for (int j = i + m; j < n; j += m) {
      ans += (a[i] & a[j]) == 0;
    }
  }
  // for (int i = 0; i < n; i += m) {
  //   for (int j = 0; j < m; j += 1) {
  //     for (int k = j + 1; k < m; k += 1) {
  //       ans += (a[i + j] & a[i + k]) == 0;
  //     }
  //   }
  // }
  cout << ans << "\n";
}

code2

#pragma GCC optimize("Ofast,unroll-loops")
#include <bits/stdc++.h>
using namespace std;
constexpr int m = 1;
int main() {
  cin.tie(nullptr)->sync_with_stdio(false);
  int n = 100000;
  vector<int> a(n);
  mt19937_64 mt(chrono::steady_clock::now()
                    .time_since_epoch()
                    .count());
  uniform_int_distribution uid(0, 1000000000);
  for (int &ai : a) {
    ai = uid(mt);
  }
  int ans = 0;
  for (int i = 0; i < n; i += m) {
    for (int j = i + m; j < n; j += m) {
      ans += (a[i] & a[j]) == 0;
    }
  }
  for (int i = 0; i < n; i += m) {
    for (int j = 0; j < m; j += 1) {
      for (int k = j + 1; k < m; k += 1) {
        ans += (a[i + j] & a[i + k]) == 0;
      }
    }
  }
  cout << ans << "\n";
}

code2 takes 546 ms but code1 takes 3416 ms on Codeforces, while both take ~750 ms on my notebook.

VIRUSGAMING

20 months ago, # ^ |

Why doesn't code2 time out? It has a loop up to n with another loop up to n nested inside, so it would be 1e10 operations in total (O(n*n)).

nonrice

11 months ago, # ^ |

Note that $n\leq 10^5$, which is a kind bound. $10^{10}$ operations will TLE, but counts like $10^9$ or $2\cdot 10^9$ can be accepted too. The check is a very simple operation, so compiler optimizations like vectorization and loop unrolling provided enough of a boost to push the running time under the limit.

Sacharlemagne

20 months ago, # ^ |

I'm having trouble understanding what's going on in the code.
I'd be grateful if you could explain the approach.
Thanks!

nonrice

11 months ago, # ^ |

In his code1 and code2 he is just doing the brute-force for i, for j approach, but it turns out that commenting the block in or out happened to trigger or suppress an optimization, leading to the run time discrepancy.

In the original code he is doing something smart. He splits the array into blocks of 32. Then, for each pair of blocks, he checks the pairs formed by choosing one element from each. As a result, across each batch of 32x32 comparisons his array accesses stay very close together, which speeds up the program due to cache locality. But it turns out this optimization wasn’t needed, since the for i, for j brute-force approach passed as well.