Hi everyone!↵
↵
Here I want to share a method to count dominating pairs in high dimensions, while the number of points ($n$) is small but the dimensionality ($k$) is large (like $n = 32768, k = 8$).↵
↵
The problem of counting dominating pairs in K-Dimension is as follows:↵
↵
> For two $k$-tuples $a = (a_0, a_1, \dots, a_{k - 1})$ and $b = (b_0, b_1, \dots, b_{k - 1})$, define $dom(a, b) = \prod\limits_{i = 0}^{k - 1}[a_i \geq b_i]$, which outputs $1$ if each $a_i$ is greater than or equal to $b_i$ ($a$ dominates $b$), otherwise $0$.↵
>↵
> Given $n$ $k$-tuples $t_0, t_1, \dots, t_{n - 1}$, for each $i \in [0, n)$, compute $g(i) = \sum\limits_{j = 0}^{n - 1} dom(t_i, t_j)$.↵
↵
When $k = 2$, the problem simplifies to counting inversions, solvable directly using a Fenwick Tree.↵
↵
For $k = 3$, one can consider a Divide and Conquer (D&C) approach, achieving a complexity of $O(n \log^2 n)$, which is excellent. Alternatively, a K-D Tree approach achieves $O(n \sqrt{n})$ complexity, also a good option.↵
↵
For $k = 4$, D&C embedded with D&C or just K-D Tree solutions are less optimal, with reduced complexity benefits.↵
↵
At $k = 5$, the Divide and Conquer approach is notably less efficient than brute force, and the K-D Tree offers no significant advantage over brute force.↵
↵
An optimized brute force approach involves using bitset to enhance efficiency.↵
↵
## Brute Force Approach↵
↵
An obvious brute force approach is to enumerate $i, j$ and check whether $p_i$ dominates $p_j$ in each dimension in $O(k)$ time. The overall complexity is $O(n^2 k)$.↵
↵
A key property is that $dom[(a_{0..i}), (b_{0..i})] = 1$ implies $dom[(a_{0..i + 1}), (b_{0..i + 1})] = 1$. Therefore, the enumeration order can be swapped. First, enumerating dimensions (tuple indices) $d$, maintaining the dominant relation under dimension $d$, and computing the relation up to the next dimension based on the current order:↵
↵
Let $b_{d, i, j} =fdom(p_{i, 0..d - 1}, p_{j, 0..d - 1})$. Then:↵
↵
$$↵
b_{d + 1, i, j} \gets b_{d, i, j} \cdot [p_{i, d} \geq p_{j, d}]↵
$$↵
↵
In practice, dimension $d$ can be omitted because each $(i, j)$ pair is independent, simplifying to:↵
↵
$$↵
b_{i, j} \gets b_{i, j} \cdot [p_{i, d} \geq p_{j, d}]↵
$$↵
↵
The total complexity is $O(n^2 k)$.↵
↵
## Bitset Optimization↵
↵
It's observed that this problem can be optimized using bitset. Utilize a 2D array `bitset<N> b[N]`. Considering the transition condition $[p_{i, d} \geq p_{j, d}]$, sort points by $p_{i, d}$ to obtain $q$, where $q_i$ is located at $p$ index $l_i$.↵
↵
Enumerating from $0$ to $n - 1$, deduce a relation array $r$, where $r_j$ indicates $p_{l_j, d} \leq q_{i, d}$. Then update $b_{l_i}$ using $b_{l_i} \gets \left(b_{l_i} \text{ bitand } r\right)$.↵
↵
Time complexity is $O(\frac{n^2 k}{w})$, space complexity is $O(\frac{n^2}{z})$, where $w = 64, z = 8$, with excellent constant factors.↵
↵
### Space Optimization↵
↵
If space constraints are strict, use grouping.↵
↵
Divide $S$ (where $S \geq w$) points $p_{l..l + S - 1}$ into groups.↵
Each time, focus only on the values of $b_{l} \dots b_{l + S - 1}$.↵
Analyzing the complexity:↵
↵
+ Sorting takes $O(nk\log n)$ time;↵
+ Calculating $r$ for each group is $O(nk)$,↵
thus the total complexity is $O(\frac{n^2k}{S})$. Thus, without space optimization,↵
calculating $r$ has a complexity of $O(nk)$;↵
+ Calculating the array $b$ for each group takes $O(\frac{Snk}{w})$,↵
resulting in a total complexity of $O(\frac{n^2k}{w})$.↵
↵
Since $S \geq w$, the time complexity is $O(\frac{n^2k}{w})$,↵
and the space complexity is $O(\frac{Sn}{z})$.↵
↵
<spoiler summary="Implementation (C++)">↵
Note that this implementation is not for CF1826E but for the pure counting problem above.↵
↵
```cpp↵
#include <iostream>↵
#include <cstdio>↵
#include <vector>↵
#include <cstring>↵
#include <algorithm>↵
#include <bitset>↵
#include <cmath>↵
using namespace std;↵
↵
using i32 = int;↵
using i64 = long long;↵
using u32 = unsigned int;↵
using u64 = unsigned long long;↵
↵
template<typename T> using vec = vector<T>;↵
↵
signed main() ↵
{↵
u32 k, n;↵
cin >> k >> n;↵
// p[k][i].first = p[i][k], p[k][i].second = l[i]↵
vec<vec<pair<u32, u32>>> p(k, vec<pair<u32, u32>>(n));↵
for (u32 i = 0; i < n; i++) {↵
for (u32 d = 0; d < k; d++) {↵
cin >> p[d][i].first;↵
p[d][i].second = i;↵
}↵
}↵
for (u32 d = 0; d < k; d++) sort(p[d].begin(), p[d].end());↵
vec<u32> ans(n);↵
// S mentioned above.↵
// Note that larger bs, less time cost, but more space cost.↵
const u32 bs = ceil(sqrt(n)) * 16;↵
for (u32 i = 0; i * bs < n; i++) {↵
u32 l = i * bs, r = min(n, i * bs + bs);↵
// '80000' means n is no more than 80000.↵
vec<bitset<80000>> b(r - l);↵
for (auto &c: b) c.set();↵
for (u32 d = 0; d < k; d++) {↵
bitset<80000> s; ///< relationship array mentioned above↵
for (u32 j = 0, it = 0; j < n; j++) {↵
auto [v, q] = p[d][j];↵
while (it < n && p[d][it].first <= v) s.set(p[d][it++].second);↵
if (l <= q && q < r) b[q - l] &= s;↵
}↵
}↵
for (u32 j = l; j < r; j++) ans[j] = b[j - l].count();↵
}↵
for (u32 i = 0; i < n; i++) cout << ans[i] << '\n';↵
return 0;↵
}↵
```↵
</spoiler>↵
↵
Related problem: [CF1826E](https://mirror.codeforces.com/contest/1826/problem/E).↵
↵
There is also a zh-cn version of this post. If you have any interest, try this [link](https://www.luogu.com/article/zcm5x8zj).
↵
Here I want to share a method to count dominating pairs in high dimensions, while the number of points ($n$) is small but the dimensionality ($k$) is large (like $n = 32768, k = 8$).↵
↵
The problem of counting dominating pairs in K-Dimension is as follows:↵
↵
> For two $k$-tuples $a = (a_0, a_1, \dots, a_{k - 1})$ and $b = (b_0, b_1, \dots, b_{k - 1})$, define $dom(a, b) = \prod\limits_{i = 0}^{k - 1}[a_i \geq b_i]$, which outputs $1$ if each $a_i$ is greater than or equal to $b_i$ ($a$ dominates $b$), otherwise $0$.↵
>↵
> Given $n$ $k$-tuples $t_0, t_1, \dots, t_{n - 1}$, for each $i \in [0, n)$, compute $g(i) = \sum\limits_{j = 0}^{n - 1} dom(t_i, t_j)$.↵
↵
When $k = 2$, the problem simplifies to counting inversions, solvable directly using a Fenwick Tree.↵
↵
For $k = 3$, one can consider a Divide and Conquer (D&C) approach, achieving a complexity of $O(n \log^2 n)$, which is excellent. Alternatively, a K-D Tree approach achieves $O(n \sqrt{n})$ complexity, also a good option.↵
↵
For $k = 4$, D&C embedded with D&C or just K-D Tree solutions are less optimal, with reduced complexity benefits.↵
↵
At $k = 5$, the Divide and Conquer approach is notably less efficient than brute force, and the K-D Tree offers no significant advantage over brute force.↵
↵
An optimized brute force approach involves using bitset to enhance efficiency.↵
↵
## Brute Force Approach↵
↵
An obvious brute force approach is to enumerate $i, j$ and check whether $p_i$ dominates $p_j$ in each dimension in $O(k)$ time. The overall complexity is $O(n^2 k)$.↵
↵
A key property is that $dom[(a_{0..i}), (b_{0..i})] = 1$ implies $dom[(a_{0..i + 1}), (b_{0..i + 1})] = 1$. Therefore, the enumeration order can be swapped. First, enumerating dimensions (tuple indices) $d$, maintaining the dominant relation under dimension $d$, and computing the relation up to the next dimension based on the current order:↵
↵
Let $b_{d, i, j} =
↵
$$↵
b_{d + 1, i, j} \gets b_{d, i, j} \cdot [p_{i, d} \geq p_{j, d}]↵
$$↵
↵
In practice, dimension $d$ can be omitted because each $(i, j)$ pair is independent, simplifying to:↵
↵
$$↵
b_{i, j} \gets b_{i, j} \cdot [p_{i, d} \geq p_{j, d}]↵
$$↵
↵
The total complexity is $O(n^2 k)$.↵
↵
## Bitset Optimization↵
↵
It's observed that this problem can be optimized using bitset. Utilize a 2D array `bitset<N> b[N]`. Considering the transition condition $[p_{i, d} \geq p_{j, d}]$, sort points by $p_{i, d}$ to obtain $q$, where $q_i$ is located at $p$ index $l_i$.↵
↵
Enumerating from $0$ to $n - 1$, deduce a relation array $r$, where $r_j$ indicates $p_{l_j, d} \leq q_{i, d}$. Then update $b_{l_i}$ using $b_{l_i} \gets \left(b_{l_i} \text{ bitand } r\right)$.↵
↵
Time complexity is $O(\frac{n^2 k}{w})$, space complexity is $O(\frac{n^2}{z})$, where $w = 64, z = 8$, with excellent constant factors.↵
↵
### Space Optimization↵
↵
If space constraints are strict, use grouping.↵
↵
Divide $S$ (where $S \geq w$) points $p_{l..l + S - 1}$ into groups.↵
Each time, focus only on the values of $b_{l} \dots b_{l + S - 1}$.↵
Analyzing the complexity:↵
↵
+ Sorting takes $O(nk\log n)$ time;↵
+ Calculating $r$ for each group is $O(nk)$,↵
thus the total complexity is $O(\frac{n^2k}{S})$. Thus, without space optimization,↵
calculating $r$ has a complexity of $O(nk)$;↵
+ Calculating the array $b$ for each group takes $O(\frac{Snk}{w})$,↵
resulting in a total complexity of $O(\frac{n^2k}{w})$.↵
↵
Since $S \geq w$, the time complexity is $O(\frac{n^2k}{w})$,↵
and the space complexity is $O(\frac{Sn}{z})$.↵
↵
<spoiler summary="Implementation (C++)">↵
Note that this implementation is not for CF1826E but for the pure counting problem above.↵
↵
```cpp↵
#include <iostream>↵
#include <cstdio>↵
#include <vector>↵
#include <cstring>↵
#include <algorithm>↵
#include <bitset>↵
#include <cmath>↵
using namespace std;↵
↵
using i32 = int;↵
using i64 = long long;↵
using u32 = unsigned int;↵
using u64 = unsigned long long;↵
↵
template<typename T> using vec = vector<T>;↵
↵
signed main() ↵
{↵
u32 k, n;↵
cin >> k >> n;↵
// p[k][i].first = p[i][k], p[k][i].second = l[i]↵
vec<vec<pair<u32, u32>>> p(k, vec<pair<u32, u32>>(n));↵
for (u32 i = 0; i < n; i++) {↵
for (u32 d = 0; d < k; d++) {↵
cin >> p[d][i].first;↵
p[d][i].second = i;↵
}↵
}↵
for (u32 d = 0; d < k; d++) sort(p[d].begin(), p[d].end());↵
vec<u32> ans(n);↵
// S mentioned above.↵
// Note that larger bs, less time cost, but more space cost.↵
const u32 bs = ceil(sqrt(n)) * 16;↵
for (u32 i = 0; i * bs < n; i++) {↵
u32 l = i * bs, r = min(n, i * bs + bs);↵
// '80000' means n is no more than 80000.↵
vec<bitset<80000>> b(r - l);↵
for (auto &c: b) c.set();↵
for (u32 d = 0; d < k; d++) {↵
bitset<80000> s; ///< relationship array mentioned above↵
for (u32 j = 0, it = 0; j < n; j++) {↵
auto [v, q] = p[d][j];↵
while (it < n && p[d][it].first <= v) s.set(p[d][it++].second);↵
if (l <= q && q < r) b[q - l] &= s;↵
}↵
}↵
for (u32 j = l; j < r; j++) ans[j] = b[j - l].count();↵
}↵
for (u32 i = 0; i < n; i++) cout << ans[i] << '\n';↵
return 0;↵
}↵
```↵
</spoiler>↵
↵
Related problem: [CF1826E](https://mirror.codeforces.com/contest/1826/problem/E).↵
↵
There is also a zh-cn version of this post. If you have any interest, try this [link](https://www.luogu.com/article/zcm5x8zj).