Hi,
Some of you may know the implementation of a multidimensional Fenwick Tree proposed by mouse_wireless in the blog post Nifty implementation of multi-dimensional Binary Indexed Trees using templates. If not, definitely check it out; it's really elegant. I want to show a similar implementation that also allows Range Updates in an arbitrary number of dimensions, based on the approach for Range Updates with a Fenwick Tree in one dimension. You should be familiar with mouse_wireless's implementation and with Range Update Range Query on a one-dimensional Fenwick Tree before reading on. The idea behind the approach I will show was mentioned in this paper; however, I didn't find a simple implementation of it on the Internet.
So, the n-dimensional Range Update Range Query Fenwick Tree is defined recursively as follows. In the case n = 0, it is just a number. Otherwise, we create the two Fenwick Trees necessary for Range Update Range Query, like in the one-dimensional case, but instead of numbers they hold (n - 1)-dimensional Range Update Range Query Fenwick Trees as leaves. Range Updates work exactly the same way in n dimensions as they do in one dimension; the only difference is that we don't increment a number but rather perform a Range Update on an (n - 1)-dimensional Fenwick Tree (except, of course, when n = 0, where we just increment the number stored in the zero-dimensional tree). The same goes for Range Sums: while iterating over the nodes that cover the queried range, we add to the final result the value of a Range Query on the (n - 1)-dimensional Fenwick Tree instead of a plain number.
As a reminder, the two Fenwick Trees in a Range Update Range Query Fenwick Tree serve the following purpose. We only consider incrementing suffixes, because ranges can then be incremented with the same trick that lets us query arbitrary ranges using only prefix sums. In the first tree, you increment the value at the start of the suffix by x (where x is the amount to be added to the suffix). In the second tree, you increment that same position by x * i, where i is the zero-based start of the suffix. Then, when querying a prefix ending at index i, the prefix sum from the first tree is multiplied by (i + 1) and added, and the prefix sum from the second tree is subtracted. To see that this includes the correct amount from an updated suffix starting at s: if i >= s, the query contributes x * (i + 1) - x * s = x * (i - s + 1), that is, x for each of the i - s + 1 covered positions; if i < s, both trees contribute 0.
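To make the trick concrete, here is a minimal one-dimensional sketch of the two trees (my own illustration, not from the original sources; the multidimensional code below follows exactly this pattern):

#include <cstdint>
#include <vector>

// One-dimensional Range Update Range Query Fenwick Tree, zero-based.
struct RURQ1D
{
    int64_t n;
    std::vector<int64_t> t0, t1; // t0 accumulates x, t1 accumulates x * i

    explicit RURQ1D(int64_t n) : n(n), t0(n + 1, 0), t1(n + 1, 0) {}

    // Add x to all positions >= i.
    void suffixIncrement(int64_t x, int64_t i)
    {
        for (int64_t k = i + 1; k <= n; k += k & -k)
        {
            t0[k] += x;
            t1[k] += x * i;
        }
    }

    // Add x to all positions in [i, j].
    void increment(int64_t x, int64_t i, int64_t j)
    {
        suffixIncrement(x, i);
        suffixIncrement(-x, j + 1);
    }

    // Sum of positions [0, i].
    int64_t prefixSum(int64_t i)
    {
        int64_t s = 0;
        for (int64_t k = i + 1; k > 0; k -= k & -k)
            s += (i + 1) * t0[k] - t1[k];
        return s;
    }

    // Sum of positions [i, j].
    int64_t rangeSum(int64_t i, int64_t j)
    {
        return prefixSum(j) - (i ? prefixSum(i - 1) : 0);
    }
};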
Translating this into C++, the code looks as follows. I index everything zero-based.
#include <cstdint>

template <typename T, int64_t... Ns>
struct FenwickTreeRURQ
{
    // Base case (zero dimensions): just a single number.
    T t = 0;
    void increment(T x) { t += x; }
    T rangeSum() { return t; }
};

template <typename T, int64_t N, int64_t... Ns>
struct FenwickTreeRURQ<T, N, Ns...>
{
    // The two Fenwick Trees over the first dimension; their nodes are
    // (n - 1)-dimensional Range Update Range Query Fenwick Trees.
    FenwickTreeRURQ<T, Ns...> t[2][N];

    // Add x to all cells whose first coordinate is >= i and whose
    // remaining coordinates lie in the ranges given by args.
    template <typename... Args>
    void suffixIncrement(T x, int64_t i, Args... args)
    {
        int64_t k = i + 1;
        while (k <= N)
        {
            t[0][k - 1].increment(x, args...);
            t[1][k - 1].increment(x * (T)i, args...);
            k += k & -k;
        }
    }

    // Add x to all cells whose first coordinate lies in [i, j] and whose
    // remaining coordinates lie in the ranges given by args.
    template <typename... Args>
    void increment(T x, int64_t i, int64_t j, Args... args)
    {
        suffixIncrement(x, i, args...);
        suffixIncrement(-x, j + 1, args...);
    }

    // Sum over all cells whose first coordinate lies in [0, i].
    template <typename... Args>
    T prefixSum(int64_t i, Args... args)
    {
        T x = 0;
        int64_t k = i + 1;
        while (k)
        {
            x += (T)(i + 1) * t[0][k - 1].rangeSum(args...);
            x -= t[1][k - 1].rangeSum(args...);
            k -= k & -k;
        }
        return x;
    }

    // Sum over all cells whose first coordinate lies in [i, j].
    template <typename... Args>
    T rangeSum(int64_t i, int64_t j, Args... args)
    {
        return prefixSum(j, args...) - (i ? prefixSum(i - 1, args...) : 0);
    }
};
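To give a usage example (my own, not from the original post): for a two-dimensional tree, increment takes the value followed by an (i, j) range per dimension, and rangeSum takes an (i, j) range per dimension.

#include <iostream>

int main()
{
    // A 5 x 7 two-dimensional tree over int64_t (sizes are compile-time constants).
    FenwickTreeRURQ<int64_t, 5, 7> ft;

    // Add 3 to all cells (x, y) with 1 <= x <= 3 and 2 <= y <= 6.
    ft.increment(3, 1, 3, 2, 6);

    // Sum over the rectangle 0 <= x <= 2, 0 <= y <= 4: the intersection
    // with the updated region has 2 * 3 = 6 cells, so this prints 18.
    std::cout << ft.rangeSum(0, 2, 0, 4) << '\n';
}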
As shown in the paper mentioned above, the time complexity for Range Updates and Range Queries is O(4^d * log n_1 * log n_2 * ... * log n_d), where d is the number of dimensions and n_i (1 <= i <= d) the size of the i-th dimension. (Intuitively, in each dimension a Range Update performs two suffix updates, each of which touches O(log n_i) nodes in each of the two trees, contributing a factor of 4 log n_i.) The space needed is O(2^d * n_1 * n_2 * ... * n_d). The rather large amount of space needed when d is high is a drawback of this approach, but when the number of dimensions is low (say, up to 4 or 5), it isn't a problem. In terms of constant factors, the implementation is very efficient in both space and time. The space needed is exactly 2^d * n_1 * n_2 * ... * n_d times sizeof(T), the size of the underlying data type. (The Ns are compile-time constants and are thus likely to be baked directly into the generated code by the compiler, so they contribute to the code size rather than to the memory needed.) Also, don't worry about a large overhead due to function calls: once the compiler is done with template specialization, it will probably inline most of the functions, since they are rather short, especially when the number of dimensions is low.
I verified the correctness of this implementation by also performing the Range Updates and Queries on a multidimensional array by brute force and then comparing the results of the Range Queries.
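For completeness, here is a sketch of what such a brute-force check could look like in two dimensions (the sizes, iteration count, and value ranges are arbitrary choices of mine, not the exact test I ran):

#include <cassert>
#include <cstdlib>

void stressTest()
{
    const int64_t N = 5, M = 7;
    FenwickTreeRURQ<int64_t, N, M> ft;
    int64_t brute[N][M] = {};
    for (int iter = 0; iter < 1000; ++iter)
    {
        // Apply a random rectangle update [i1, j1] x [i2, j2] with value x
        // to both the tree and the brute-force array.
        int64_t i1 = rand() % N, j1 = i1 + rand() % (N - i1);
        int64_t i2 = rand() % M, j2 = i2 + rand() % (M - i2);
        int64_t x = rand() % 100 - 50;
        ft.increment(x, i1, j1, i2, j2);
        for (int64_t a = i1; a <= j1; ++a)
            for (int64_t b = i2; b <= j2; ++b)
                brute[a][b] += x;

        // Compare a random range query against the brute-force array.
        int64_t q1 = rand() % N, q2 = q1 + rand() % (N - q1);
        int64_t q3 = rand() % M, q4 = q3 + rand() % (M - q3);
        int64_t sum = 0;
        for (int64_t a = q1; a <= q2; ++a)
            for (int64_t b = q3; b <= q4; ++b)
                sum += brute[a][b];
        assert(ft.rangeSum(q1, q2, q3, q4) == sum);
    }
}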