Just today, I came up with this trick while solving a problem that (originally) required a merge-sort tree. While I knew it could be solved with a merge-sort tree, I thought implementing one would be relatively tedious. This is not to say that I dislike the data structure; I think its concepts are very clever. However, I like algorithms whose implementations are elegant, which is why I prefer the fenwick tree over the segment tree on many problems. This was another case of me wondering how a fenwick tree could replace the usual implementation, and just today the idea came to me, turning into a method that can almost replace merge-sort trees. (Note that some people may have found this earlier than I did, and I think most of them found it independently, like I did.)

So, as a disclaimer before explaining the method, I should tell you that this method is not really better (time-complexity-wise) than the merge-sort tree. However, in practice, many merge-sort tree queries require $$$O(\log^2 N)$$$ time per query anyway. So, unless $$$N \gg Q$$$ (meaning that $$$N$$$ is much larger than $$$Q$$$, a situation which does not happen very often in the CP scene), this method is feasible enough.

This methodology consists of three steps: "add", "build", and "query". The "add" and "query" steps are small modifications of the usual fenwick tree operations, and the only added step is "build", which is not complex at all. (You could even one-line it!) I will explain them in order.

**First Step: Add**

This step consists of appending the element to every index that needs to store it. For example, if we add the element $$$2$$$ at index $$$1$$$, we append it to indices $$$[1,2,4,8,\cdots]$$$, up to $$$N$$$. This can be done by replacing the point-update operation in the "add" step of the fenwick tree with a `push_back` operation (assuming that `bit` is an array of vectors). Code is as follows.

```
// bit[i] collects every element whose position is covered by index i
void add(int i, ll x)
{
    while (i < MAXN)
    {
        bit[i].push_back(x);
        i += i & -i;
    }
}
```

The time complexity of this step is $$$O(\log N)$$$ per addition, as there are at most $$$\log N$$$ indices to which the element must be appended, so $$$O(N \log N)$$$ in total.

**Second Step: Build**

After the "add" step, each index of the fenwick tree contains the elements of the indices it manages. However, the vectors in the fenwick tree are not sorted yet, and in this state we cannot get accurate answers to the queries. Therefore, we need to sort every vector in the fenwick tree. The code for this step is pretty straightforward. Code is as follows.

```
void build()
{
    for (int i = 0; i < MAXN; i++)
        sort(begin(bit[i]), end(bit[i]));
}
```

Now for the time complexity. The time complexity of this step is not so trivial, but we can prove an upper bound for it. First, note that sorting a vector of length $$$k$$$ takes $$$O(k \log k) \subseteq O(k \log N)$$$ time, since no bucket holds more than $$$N$$$ elements.

Now given that we have at most $$$N \log N$$$ values in the entire fenwick tree (each of the $$$N$$$ elements is stored in at most $$$\log N$$$ buckets), the time complexity will be at most:

$$$\sum_i |bit_i| \log |bit_i| \le \log N \cdot \sum_i |bit_i| = O(N \log N \cdot \log N).$$$

Therefore this step's time complexity has an upper bound of $$$O(N \log^2 N)$$$. This bound is not very tight, and while I am convinced that a tighter upper bound can be proven, I was unable to do it myself. (Maybe I could prove $$$O(N \log N \log \log N)$$$?) Please help me find a tighter upper bound if you can.

UPD: If you sort everything before you actually insert the elements, you can omit this step entirely and build the fenwick tree in a more optimal $$$O(N \log N)$$$. Credits go to darkkcyan for finding this technique; you can read this blog post to learn more.

**Third Step: Query**

Now that all vectors of the fenwick tree are sorted, we can answer the queries. This step is based on the usual implementation of fenwick trees and therefore finds the answer for a prefix $$$[0,x]$$$. Consequently, we can answer many types of merge-sort tree queries on an interval via $$$[l,r]=[0,r] - [0,l-1]$$$. Note that some query types cannot be decomposed like this, so this method does not completely replace the merge-sort tree.

Now for the example. Say, the query we want to answer is as follows.

$$$l \, r \, x$$$: Find the number of elements greater than $$$x$$$ in the interval $$$[l,r]$$$.

We can answer this query on a prefix by adding up the answers of the fenwick intervals making up that prefix. There are $$$O(\log N)$$$ such intervals, and we binary search in each of them, so the time complexity is $$$O(\log^2 N)$$$ per query. We can then answer any interval by subtracting the answer for $$$[0,l-1]$$$ from that of $$$[0,r]$$$. Code is as follows.

```
// number of elements greater than x in the prefix ending at index i
ll query(int i, ll x)
{
    ll ans = 0;
    while (i)
    {
        // elements past upper_bound(x) are strictly greater than x
        ans += end(bit[i]) - upper_bound(begin(bit[i]), end(bit[i]), x);
        i -= i & -i;
    }
    return ans;
}
```

With this discovery, I have found that fenwick trees are much more powerful than I used to think. I think this usage of fenwick trees can replace merge-sort trees in many of their applications, so I decided to give it a name: the Sort-Fenwick. The name comes from the merge-sort tree, but due to the fact that there is no "merge" going on at all, I omitted "merge" from the name. I really like this methodology, as it is much simpler than implementing a merge-sort tree. Suggestions and questions are always appreciated, and thanks for reading!

Loved it.

You probably want to sort everything first before adding to the Fenwick. That will always be $$$O(n \log n)$$$.

Wow, that is indeed very clever. I'll update the "build" part to reflect your suggestion ASAP.

Well if you got all of the queries, you can do offline processing, and the whole process will be $$$O((q + n) \log n)$$$ or something.

How? We can also add the queries to the Fenwick tree just like the data (of course, we also need to sort first). To answer a query, we go through each bucket (the `bit` array) one by one and iterate over the elements inside the bucket. Doing so, we know exactly where the query lands relative to all of the numbers added to the Fenwick tree, without binary searching. If you don't want to add the queries to the same place as the data, you can make another Fenwick tree (still sorted first), then use two pointers to find the position. It is the same idea either way.

Well there is more on this topic. I wrote a small useless blog on this, but on segment tree instead (the iterative one, so the code is still very short).