Ever since I read this blog, I have been curious to see how space-filling curves other than the Hilbert curve can be used to reduce the run time. In this blog, we will see how Peano curves can help bring down the run time of solutions based on Mo's algorithm.
Prerequisites: Mo's algorithm, Mo's algorithm using Hilbert Curve order
Relation to TSP
In Mo's algorithm, we try to come up with a comparator that sorts the queries so that the total movement of the L and R pointers is minimized. In other words, if we have $$$Q$$$ queries, each of the form $$$(l_i, r_i)$$$, then we wish to find an arrangement of the queries that minimizes the following summation:
$$$S = \displaystyle\sum_{i=1}^{Q-1} \left( |l_i - l_{i+1}| + |r_i - r_{i+1}| \right)$$$.
Each query $$$(l,r)$$$ can be viewed as a point on a 2D plane. We want to visit each of these points so that the travelled distance (with Manhattan distance as the metric) is minimized. This is the variant of the Traveling Salesman Problem (TSP) in which the salesman does not need to return to the starting city / point.
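To make the cost concrete, here is a minimal sketch that evaluates $$$S$$$ for queries already arranged in some order. The name `pathCost` and the use of `std::pair<int, int>` for a query are illustrative choices, not something from the original post.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

// Total pointer movement S for queries processed in the given order.
// Each query is (l, r); the cost between consecutive queries is their
// Manhattan distance, and there is no return to the first query.
std::int64_t pathCost(const std::vector<std::pair<int, int>>& queries) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i + 1 < queries.size(); ++i) {
        long long dl = static_cast<long long>(queries[i].first) - queries[i + 1].first;
        long long dr = static_cast<long long>(queries[i].second) - queries[i + 1].second;
        total += std::llabs(dl) + std::llabs(dr);
    }
    return total;
}
```

For the queries $$$(1,3), (2,5), (0,1)$$$ in that order the cost is 9, while the order $$$(0,1), (1,3), (2,5)$$$ costs 6. A helper like this can be used to compare different comparators on the same set of queries.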
This problem is NP-Hard, so no polynomial-time algorithm is known for finding the optimal order. However, we can trade accuracy for time: we can find a good enough solution in polynomial time, and fast enough for our purposes. This is what heuristic solutions based on space-filling curves help us achieve. Since the summation minimization problem is the same as TSP, we can apply the same heuristic approaches to Mo's algorithm. Let's try to find a new comparator based on all this information.
New Comparator
A comparator that uses Hilbert curve order has already been explained in this blog quite nicely. Here I will discuss a comparator that uses Peano curve order.
Let's build a Peano curve on a $$$3^k \times 3^k$$$ matrix and visit all the cells of the matrix according to this curve. Denote by ord(i, j, k) the number of cells visited before the cell (i, j) along the Peano curve on the $$$3^k \times 3^k$$$ matrix. Treating each query $$$(l, r)$$$ as the cell (i, j) = (l, r), we sort the queries in non-descending order of their value of ord(i, j, k).
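As a rough illustration of what ord(i, j, k) computes, here is a minimal sketch that follows Peano's classical digit-by-digit construction. It is not necessarily the Butz formulation, and the function name `peanoOrder`, the 0-indexed coordinates and the orientation of the curve (which axis is swept first) are assumptions made for this example.

```cpp
#include <cstdint>

// Number of cells visited before cell (x, y) along a Peano curve on a
// 3^k x 3^k grid. Assumes 0 <= x, y < 3^k (0-indexed coordinates).
std::int64_t peanoOrder(int x, int y, int k) {
    // Weight of the most significant ternary digit: 3^(k-1).
    std::int64_t pow3 = 1;
    for (int i = 1; i < k; ++i) pow3 *= 3;

    std::int64_t index = 0;
    int flipX = 0;  // parity of the sum of output digits produced from y so far
    int flipY = 0;  // parity of the sum of output digits produced from x so far
    for (; pow3 > 0; pow3 /= 3) {
        int dx = static_cast<int>((x / pow3) % 3);
        int dy = static_cast<int>((y / pow3) % 3);
        if (flipX) dx = 2 - dx;  // complement the digit when the parity is odd
        flipY ^= dx & 1;
        if (flipY) dy = 2 - dy;
        flipX ^= dy & 1;
        index = index * 9 + dx * 3 + dy;  // two base-3 digits per level
    }
    return index;
}
```

On a $$$3 \times 3$$$ grid this visits (0,0), (0,1), (0,2), (1,2), (1,1), (1,0), (2,0), (2,1), (2,2), so consecutive cells are always adjacent. A comparator for Mo's algorithm could precompute this value once per query and sort by it, rather than recomputing it inside every comparison.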
Butz gives an algorithm for computing the Peano space-filling curve in terms of the base-3 representation of the coordinates. It generalizes well to higher dimensions. I will describe the algorithm briefly.
- Write down the coordinates in base 3 (ternary) representation. Here, we choose k such that $$$3^k \geq N$$$, so each coordinate takes up k ternary digits. Each row is one coordinate written in ternary form. Denote the resulting matrix by $$$a$$$.