On Multidimensional Range Queries

Правка en6, от Laakeri, 2021-01-08 10:11:53

The following question is frequently asked in Codeforces (46390, 45157, 11324): Is there a 2D segment tree that supports range addition and range minimum? In this blog post I give evidence that such a data structure does not exist, or if it did exist it would not generalize to higher dimensions. In particular I show that if for all $$$d$$$ a $$$d$$$-dimensional data structure that performs such queries in $$$O(polylog(N))$$$ time did exist, then the exponential time hypothesis would fail. Such a data structure exists for range addition and range sum, so this is a non-trivial claim separating the hardness of these problems.

Update 2021:

A paper " Algorithms and Hardness for Multidimensional Range Updates and Queries" in ITCS 2021 by Joshua Lau and Angus Ritossa (https://arxiv.org/abs/2101.02003, https://www.youtube.com/watch?v=joB6IaqNqQc) (I guess the authors' cf handles are [user:https://mirror.codeforces.com/profile/Junkbot] and [user:https://mirror.codeforces.com/profile/AngusRitossa]) gives an overview of the complexity of multidimensional range queries with different query and update operations. For range addition and range minimum in 2D they show a lower bound of $$$\Omega(Q^{3/2-o(1)})$$$ under several different conjectures, and there is a known upper bound of $$$O(Q^{3/2})$$$.

Update:

I researched some literature on related topics and found that while lower bounds for this exact data structure have not been given before, there has been a lot of research on related topics. Some observations:

  1. Klee's measure problem is very related to Range-Add-Min structure.
  2. The reduction from Clique to Klee's measure problem in [1] can be adapted to show that $$$d$$$-dimensional Range-Add-Min-structure with $$$O(polylog(N))$$$ queries could solve $$$d+1$$$-Clique in $$$O(n^2)$$$ time.
  3. Therefore $$$d$$$-dimensional Range-Add-Min-structure for all $$$d$$$ would mean FPT=W[1]. Note that ETH implies FPT!=W[1].
  4. Also, 2D Range-Add-Min structure could solve 3-clique in $$$O(n^2)$$$ time, which would be a breaktrough result.
  5. Hardness of very related problems has been considered earlier this year in [2]. I'm not sure if their results directly imply hardness of 2D Range-Add-Min structure.
  6. There is also a connection to MAX-2-CSP [3], which can be extended to connection to MAX-d-CSP for $$$d$$$-dimensional Range-Add-Min-structure.

[1] T. M. Chan. A (slightly) faster algorithm for klee's measure problem. SCG 2008 https://dl.acm.org/citation.cfm?id=1377693

[2] L. Duraj, K. Kleiner, A. Polak, V. V. Williams. Equivalences between triangle and range query problems. arXiv 2019 https://arxiv.org/abs/1908.11819

[3] R. Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. TCS 2005 https://www.sciencedirect.com/science/article/pii/S0304397505005438

A pdf version of this blog is in http://laakeri.kapsi.fi/a/rmq.pdf

Formalization

Consider a $$$d$$$-dimensional integer array $$$A$$$ whose elements are indexed with $$$A[(i_1, \ldots, i_d)]$$$ for all $$$0 \le i_j < N$$$. Furthermore assume that $$$A$$$ is initialized to zero. The operation $$$add(v, l_1, r_1, \ldots, l_d, r_d)$$$ adds the value $$$v$$$ to all elements $$$A[(i_1, \ldots, i_d)]$$$ such that $$$l_j \le i_j < r_j$$$. The operation $$$min(l_1, r_1, \ldots, l_d, r_d)$$$ returns the minimum value over all elements $$$A[(i_1, \ldots, i_d)]$$$ such that $$$l_j \le i_j < r_j$$$. A data structure that supports these two operations is called Range-Add-Min-structure.

In the Boolean satisfiability problem (SAT) the task is to decide if there is an assigment of $$$n$$$ binary variables to true/false so that it satisfies a given Boolean formula. The formula consists of $$$m$$$ clauses, and an assigment of variables satisfies the formula if it satisfies every clause. A clause is a logical or that consists of variables and negations of variables. For example if $$$x_1, x_2, \ldots, x_n$$$ are variables, the clause $$$(x_1 \vee \neg x_3 \vee x_6)$$$ is satisfied if $$$x_1$$$ is true or $$$x_3$$$ is false or $$$x_6$$$ is true.

The problem $$$k$$$-SAT is Boolean satisfiability problem where each clause has at most $$$k$$$ variables or negations of variables. $$$2$$$-SAT can be solved in polynomial time, but $$$3$$$-SAT is NP-hard. Furthermore a widely believed exponential time hypothesis conjectures that there exists $$$\varepsilon > 0$$$ such that $$$3$$$-SAT cannot be solved in time $$$O((1+\varepsilon)^n poly(n, m))$$$, where $$$poly(n, m)$$$ is polynomial function on $$$n$$$ and $$$m$$$.

Theorem 1: If there exists a function $$$f(d)$$$ such that $$$d$$$-dimensional Range-Add-Min-structure can process $$$Q$$$ operations in $$$O((\log_2(N) + \log_2(Q))^{f(d)} Q)$$$ time for all $$$d$$$, then exponential time hypothesis fails.

Proof

We prove Theorem 1 by assuming that such $$$d$$$-dimensional data structure exists, and constructing a $$$O(2^{3n/d} poly(n,m))$$$ time algorithm for 3-SAT, where $$$n$$$ is the number of variables and $$$m$$$ is the number of clauses.

Claim 1: For all $$$k$$$ and $$$d$$$, $$$k$$$-SAT can be solved with $$$O(2^{kn/d} m)$$$ operations with a $$$d$$$-dimensional Range-Add-Min-structure with $$$N=2^{n/d}$$$.

Proof: Suppose without loss of generality that $$$d$$$ divides $$$n$$$. Let $$$N = 2^{n/d}$$$. We use the $$$d$$$-dimensional array $$$A$$$ with indices $$$(i_1, \ldots, i_d)$$$, $$$0 \le i_j < N$$$. $$$A$$$ has $$$2^n$$$ elements, because $$$N^d = (2^{n/d})^d = 2^n$$$.

Let $$$x_0, \ldots, x_{n-1}$$$ be a bit string of length $$$n$$$. The bijection $$$\phi$$$ with

$$$\phi(x_0, \ldots, x_{n-1}) = (1 x_0 + 2 x_1 + 4 x_2 + \ldots + x_{n/d-1} 2^{n/d - 1}, \ldots, 1 x_{n-n/d} + \ldots)$$$

maps bit strings of length $$$n$$$ to the indices of $$$A$$$. For example if $$$n=6$$$ and $$$d=2$$$, the bijection is $$$\phi(x_0, x_1, x_2, x_3, x_4, x_5) = (1 x_0 + 2 x_1 + 4 x_2, 1 x_3 + 2 x_4 + 4 x_5)$$$.

An assignment of variables in SAT is a bit string of length $$$n$$$. We use the array $$$A$$$ to represent if the assigment $$$x_0, x_1, \ldots, x_{n-1}$$$ is a satisfiable assignment of the SAT formula, using the bijection $$$\phi(x_0, \ldots, x_{n-1})$$$ to map between variable assignments and indices of $$$A$$$. We iteratively add clauses to the formula, maintaining the invariant that $$$A[\phi(x_0, \ldots, x_{n-1})] = 0$$$ if and only if the assigment $$$x_0, \ldots, x_{n-1}$$$ satisfies all clauses added so far.

Claim 2: Given a clause that contains $$$k$$$ variables or negations of variables, we can add $$$1$$$ to all elements $$$A[\phi(x_0, \ldots, x_{n-1})]$$$ such that $$$x_0, \ldots, x_{n-1}$$$ does not satisfy the clause using at most $$$2^{kn/d}$$$ $$$add$$$-operations.

Proof: Let $$$v_1, \ldots, v_k$$$ be the variables that occur in the clause (possibly with negations). Only the values $$$x_{v_1}, \ldots, x_{v_k}$$$ affect whether the clause is satisfied or not. These values affect at most $$$k$$$ dimensions in the mapping $$$\phi(x_0, \ldots, x_{n-1})$$$. Lets call these dimensions the non-trivial dimensions, and other dimensions the trivial dimensions. If we choose the values of non-trivial dimensions, then by the inverse mapping $$$\phi^{-1}$$$ we choose the values of $$$x_{v_1}, \ldots, x_{v_k}$$$, and know if the clause is satisfied or not. We brute force over all possible values of non-trivial dimensions. For each value, if the clause is not be satisfied by $$$\phi^{-1}$$$, we use operation $$$add(1, l_1, r_1, \ldots, l_d, r_d)$$$, where $$$l_j = r_j - 1$$$ is the brute forced value of dimension $$$j$$$ if $$$j$$$ is a non-trivial dimension and otherwise $$$l_j = 0$$$ and $$$r_j = N$$$. Each dimension has $$$N$$$ values, so brute forcing the non-trivial dimensions takes at most $$$N^k = (2^{n/d})^k = 2^{kn/d}$$$ $$$add$$$-operations. $$$\square$$$

By Claim 2 each clause can be processed with $$$2^{kn/d}$$$ $$$add$$$-operations. Therefore when adding all $$$m$$$ clauses we use at most $$$m 2^{kn/d}$$$ $$$add$$$-operations. After all clauses have been added, we use a single $$$min$$$-operation where $$$l_j = 0$$$ and $$$r_j = N$$$ for all dimensions $$$j$$$ to query if there are any elements in $$$A$$$ that have value $$$0$$$, and thus correspond to satisfiable assignments. $$$\square$$$

Claim 3: If the data structure of Theorem 1 exists, $$$k$$$-SAT can be solved in $$$O((1+\varepsilon)^n poly(n, m))$$$ for any $$$\varepsilon > 0$$$.

Proof: Given $$$k$$$ and $$$\varepsilon$$$, lets choose $$$d$$$ so that $$$2^{k/d} < 1 + \varepsilon$$$. By using the algorithm of Claim 1 with the data structure of Theorem 1 we obtain a runtime of

$$$O((\log_2(2^{n/d}) + \log_2(2^{kn/d} m))^{f(d)} 2^{kn/d} m)$$$
$$$= O((n/d + kn \log_2(m)/d)^{f(d)} 2^{kn/d} m)$$$
$$$= O((1 + \varepsilon)^n poly(n, m)).\square$$$

Claim 3 completes the proof, because the exponential time hypothesis states that there exists $$$\varepsilon > 0$$$ such that no $$$O((1 + \varepsilon)^n poly(n, m))$$$ algorithm exists for 3-SAT.

Conclusion

We proved that efficient $$$d$$$-dimensional Range-Add-Min-structure does not exist if the exponential time hypothesis holds. Our proof allows the existance of such 2D data structure. However we know that such 2D data structure must be developed with techniques that do not generalize to higher dimensions. An 8-dimensional Range-Add-Min-structure would imply a state-of-the-art algorithm for 3-SAT trough our reduction. Some open questions are:

  1. Is there a conditional hardness proof for 2D case?

  2. Likewise to $$$(min, +)$$$-semiring, our reduction also works for $$$(+, \cdot)$$$-semiring. Does it work for all commutative semirings?

  3. A $$$d$$$-dimensional data structure with $$$O(polylog(N))$$$ operations exists if the operations are addition and sum. Can we classify all pairs of operations so that the data structure either exists, or does not exist assuming ETH?

  4. Is it plausible that techniques for designing multidimensional data structures work for $$$d=2$$$ but not for some higher $$$d$$$?

Теги #range query, #data structure

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en7 Английский Laakeri 2021-01-08 10:17:25 134 (published)
en6 Английский Laakeri 2021-01-08 10:11:53 743 (saved to drafts)
en5 Английский Laakeri 2019-11-17 22:35:31 1663 Added update about literature on this topic.
en4 Английский Laakeri 2019-10-31 01:41:49 16
en3 Английский Laakeri 2019-10-31 01:31:32 2
en2 Английский Laakeri 2019-10-31 00:58:52 192
en1 Английский Laakeri 2019-10-31 00:56:03 7516 Initial revision (published)