Sort before insert — A small, yet powerful extension to Merge sort tree

Hello Codeforces!↵
↵
This is my first ever blog on Codeforces, and today I want to <s>gain contributions</s> share a small invention I came up with while upsolving. I don't know whether a similar idea already exists, but as far as I can tell, the editorial of the problem I solved does not use it. I thought it would be nice to share it with you guys and hear your opinions on it.
↵
<h1>Prerequisite</h1>↵
You should first understand what a segment tree and a merge-sort tree are.
↵
All the code in this blog is written in the <a href="https://mirror.codeforces.com/blog/entry/18051">efficient style</a>, so you should be at least somewhat familiar with it before reading the code.
↵
<h1>Idea explanation</h1>↵
<h2>How do we build Merge-sort tree again?</h2>↵
A merge-sort tree is a segment tree in which each node contains a sorted set/vector of every element in its range. We build it bottom-up. To build a parent node, we merge the 2 sets/vectors of its children the same way the merge-sort algorithm does, in $O(\text{number of elements})$, hence the name merge-sort tree.
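
For reference, the classical bottom-up build can be sketched roughly as follows. This is only a minimal sketch in the efficient-style layout (leaves at indices `n .. 2n-1`); the function name `build_classic` and the choice to return the tree as a vector of vectors are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Classical merge-sort tree in the iterative layout:
// leaves live at n .. 2n-1, node i has children 2i and 2i+1.
std::vector<std::vector<int>> build_classic(const std::vector<int>& a) {
  int n = (int)a.size();
  std::vector<std::vector<int>> tree(2 * n);
  for (int i = 0; i < n; ++i) tree[n + i] = {a[i]};  // one element per leaf
  for (int i = n - 1; i >= 1; --i) {
    // Merge the two sorted children, like one step of merge sort.
    std::merge(tree[2 * i].begin(), tree[2 * i].end(),
               tree[2 * i + 1].begin(), tree[2 * i + 1].end(),
               std::back_inserter(tree[i]));
  }
  return tree;
}
```

Every node is filled exactly once, from its two children, so a node can only be built after both children are complete.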
↵
<b>But there is another</b> way.↵
↵
Let `A` be the array we are building the merge-sort tree on, and let `it[i]` be the vector that contains the sorted elements of the range covered by the `i-th` node. Let's create another array `B` that stores the following pairs: `B[i].first = A[i], B[i].second = i`. We sort the array `B` in increasing order of `first`. Then we iterate over the pairs `value, position` of `B` in that sorted order and push `value` to the back of every `it[i]` where `i` is a node of the merge-sort tree that contains `position`.
↵
We can see that I sort the array before inserting, so I decided to call this trick `sort before insert`. Do you guys have a better name?↵
↵
<b>Some advantages of "sort before insert" over the classical way</b>↵
<ul>↵
  <li>We can handle the case where multiple elements are located at the same position.</li>
  <li>With <a href="https://mirror.codeforces.com/blog/entry/18051">efficient style</a> we can build this tree iteratively.</li>↵
</ul>↵
↵
Here is the small snippet for building the tree.↵
↵
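~~~~~C++
#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

const int N = 1e5;
vector<int> it[2 * N];  // it[i]: sorted values in the range covered by node i

void build(const vector<int>& a) {
  int n = (int)a.size();     // must not exceed N
  vector<pair<int, int>> b;  // (value, position)
  for (int i = 0; i < n; ++i) {
    b.emplace_back(a[i], i);
  }
  sort(b.begin(), b.end());
  for (auto [val, p]: b) {
    // Walk from the leaf up to the root. Values arrive in increasing order,
    // so every vector stays sorted.
    for (p += n; p > 0; p >>= 1)
      it[p].push_back(val);
  }
}
~~~~~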
↵
<small>So we can call merge-sort tree — quick-sort tree from now on. :))</small>↵
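
To illustrate the first advantage (several elements at the same position), here is a small sketch together with the usual binary-search query. The names `build_items` and `count_less`, and passing the number of positions `n` explicitly, are assumptions made for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// items are (position, value) pairs; several items may share a position.
std::vector<std::vector<int>> build_items(int n, std::vector<std::pair<int, int>> items) {
  std::sort(items.begin(), items.end(),
            [](const auto& x, const auto& y) { return x.second < y.second; });
  std::vector<std::vector<int>> it(2 * n);
  for (auto [pos, val] : items)
    for (int p = pos + n; p > 0; p >>= 1) it[p].push_back(val);
  return it;
}

// Number of items with position in [l, r) and value < k.
int count_less(const std::vector<std::vector<int>>& it, int n, int l, int r, int k) {
  int res = 0;
  auto less_in = [&](int node) {
    return int(std::lower_bound(it[node].begin(), it[node].end(), k) - it[node].begin());
  };
  for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
    if (l & 1) res += less_in(l++);
    if (r & 1) res += less_in(--r);
  }
  return res;
}
```

In this sketch a leaf simply ends up holding more than one value, which the classical merge-based build would need an extra step for.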
↵
<h2>Can we do with ranges?</h2>↵
As we can see, each node in the merge-sort tree contains only elements of its range, and in the `sort before insert` version, we used <b>point-update</b> to build the tree. Can we do the same with <b>range-like update</b>? Yes, yes we can!↵
↵
But what does each node contain exactly? We already know that every interval can be broken into $O(\log n)$ sub-intervals, each of which corresponds to a node of the segment tree. So if we have some intervals with associated values, we can process the intervals in increasing order of value and add each value to every node corresponding to a sub-interval of its interval, so that the vectors of the nodes stay sorted.
↵
Here is the implementation.↵
↵
~~~~~C++
#include <algorithm>
#include <vector>
using namespace std;

const int N = 1e5;
vector<int> it[2 * N];
struct Range {
  int l, r, value;  // [l, r)
};
// n is the number of positions (leaves), not the number of ranges.
void build(int n, vector<Range> a) {
  sort(a.begin(), a.end(), [](const Range& u, const Range& v) { return u.value < v.value; });
  for (auto [l, r, val]: a) {
    // Standard iterative decomposition of [l, r) into O(log n) nodes.
    for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
      if (l & 1) it[l++].push_back(val);
      if (r & 1) it[--r].push_back(val);
    }
  }
}
~~~~~
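
One way to read such a tree: the values of all ranges that contain a position `p` are spread over the nodes on the path from leaf `p` to the root. A minimal sketch of a query that uses this, assuming the hypothetical helpers `build_ranges` and `count_covering_less` (neither is from the linked implementations):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Range {
  int l, r, value;  // half-open [l, r)
};

// Builds the range version of the tree over n positions and returns it.
std::vector<std::vector<int>> build_ranges(int n, std::vector<Range> a) {
  std::sort(a.begin(), a.end(),
            [](const Range& u, const Range& v) { return u.value < v.value; });
  std::vector<std::vector<int>> it(2 * n);
  for (auto [l, r, val] : a)
    for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
      if (l & 1) it[l++].push_back(val);
      if (r & 1) it[--r].push_back(val);
    }
  return it;
}

// Counts the ranges that contain position p and have value < k.
int count_covering_less(const std::vector<std::vector<int>>& it, int n, int p, int k) {
  int res = 0;
  for (p += n; p > 0; p >>= 1)  // every ancestor of leaf p
    res += int(std::lower_bound(it[p].begin(), it[p].end(), k) - it[p].begin());
  return res;
}
```

Each range is stored in disjoint nodes, so a position inside the range meets exactly one of them on its way to the root and nothing is counted twice.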
↵
<h1>Basic application</h1>↵
↵
<h2>SPOJ <a href="https://www.spoj.com/problems/KQUERY/">KQUERY</a></h2>↵
We are given an array $a$ of length $n$. We need to process $q$ queries. For each query, we are given 3 numbers $l, r, k$, and we need to count the indices $i$ such that $l \le i \le r$ and $a[i] > k$.
↵
This is a classical problem that can be solved with a Fenwick tree and coordinate compression in $O((n + q) \log n)$. If a merge-sort tree is used instead, the complexity is $O(n \log n + q \log^2 n)$, because each query does a binary search in every node of its sub-intervals. Using `sort before insert`, we can replace the binary search with the two-pointer technique as follows:
<ul>↵
  <li>Build 2 trees with `sort before insert`, one for the array's elements and one for the queries.</li>
  <li>We go from top to bottom in both trees simultaneously, considering the nodes that correspond to the same interval in both trees. In each node, we find the local answer for all queries that contain this sub-interval with the two-pointer technique. Since the queries that contain this sub-interval are sorted by their value and the array elements in this sub-interval are also sorted, we can maintain 2 pointers: one points to the current query, the other points to the boundary between the array elements that are bigger than the current query's value and those that are smaller than or equal to it.</li>
</ul>↵
↵
The complexity of this approach is $O((n + q) \log n + q \log q)$. First, we must sort both the array elements and the queries. Second, each array element is visited once per node that contains it, i.e. $O(\log n)$ times, which gives $O(n \log n)$ in total. Finally, each query is visited once per sub-interval of its query range, i.e. $O(\log n)$ times.
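
The two-tree idea can be sketched roughly like this. It is a simplified illustration, not the linked implementation: the names `Query` and `solve_kquery` are made up, indices are 0-based with half-open ranges `[l, r)`, and the query is assumed to count elements strictly greater than `k`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Query {
  int l, r, k;  // count i in [l, r) with a[i] > k
};

std::vector<int> solve_kquery(const std::vector<int>& a, const std::vector<Query>& qs) {
  int n = (int)a.size(), q = (int)qs.size();
  std::vector<std::vector<int>> elems(2 * n), asks(2 * n);

  // Tree 1: insert array values in increasing order (point insert).
  std::vector<int> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&](int x, int y) { return a[x] < a[y]; });
  for (int i : order)
    for (int p = i + n; p > 0; p >>= 1) elems[p].push_back(a[i]);

  // Tree 2: insert query ids in increasing order of k (range insert).
  std::vector<int> qorder(q);
  std::iota(qorder.begin(), qorder.end(), 0);
  std::sort(qorder.begin(), qorder.end(), [&](int x, int y) { return qs[x].k < qs[y].k; });
  for (int id : qorder)
    for (int l = qs[id].l + n, r = qs[id].r + n; l < r; l >>= 1, r >>= 1) {
      if (l & 1) asks[l++].push_back(id);
      if (r & 1) asks[--r].push_back(id);
    }

  // Two pointers per node: as k grows, the first element > k only moves right.
  std::vector<int> ans(q, 0);
  for (int node = 1; node < 2 * n; ++node) {
    std::size_t j = 0;
    for (int id : asks[node]) {
      while (j < elems[node].size() && elems[node][j] <= qs[id].k) ++j;
      ans[id] += (int)(elems[node].size() - j);
    }
  }
  return ans;
}
```

The pointer `j` never moves backwards inside a node, which is where the binary search of the classical approach disappears.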
↵
The implementation is <a href="https://ideone.com/R9GWOS">here</a>↵
↵
<h2>SPOJ <a href="https://www.spoj.com/problems/MKTHNUM/">MKTHNUM</a></h2>↵
We are given an array $a$ of length $n$. We need to process $q$ queries. For each query, we are given 3 numbers $l, r, k$. We need to find the $k$-th smallest element among $a_l, a_{l + 1}, a_{l + 2}, ..., a_r$.↵
↵
There is already a solution using the merge-sort tree in $O((n + q) \log^2 n)$. But I want to discuss a slightly more naive solution with the merge-sort tree. Let's binary search for the answer. If $F(x, l, r)$ is the number of elements in the range $[l, r]$ that are smaller than or equal to $x$, then the answer for the query $l, r, k$ is the smallest number $v$ such that $F(v, l, r) \ge k$. This is essentially the same counting problem as SPOJ KQUERY. If we use the above approach for a single query ($q = 1$), the complexity is $O(n \log n \log (10^9))$. But answering $q > 1$ queries one by one multiplies the complexity by $q$, which is very bad.
↵
But in the KQUERY problem, we can actually check $q$ numbers simultaneously. So the obvious optimization here is to use <a href="https://mirror.codeforces.com/blog/entry/45578"> parallel binary search</a>. The implementation is <a href="https://ideone.com/tgiUz8">here</a>.
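
To show the shape of the parallel binary search, here is a rough sketch. Note that it swaps in a plain Fenwick tree over positions as the counting step instead of the two-tree KQUERY approach, purely to keep the example short. The names `KthQuery` and `kth_parallel` are hypothetical, indices are 0-based and inclusive, and every query is assumed to be valid ($n \ge 1$, $1 \le k \le r - l + 1$).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct KthQuery {
  int l, r, k;  // k-th smallest (1-based) among a[l..r], 0-based inclusive
};

std::vector<int> kth_parallel(const std::vector<int>& a, const std::vector<KthQuery>& qs) {
  int n = (int)a.size(), q = (int)qs.size();
  std::vector<int> order(n);  // positions sorted by value
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&](int x, int y) { return a[x] < a[y]; });

  // The answer of query i is a[order[j]] for some j in [lo[i], hi[i]].
  std::vector<int> lo(q, 0), hi(q, n - 1);
  std::vector<int> fen(n + 1);
  auto add = [&](int i) { for (++i; i <= n; i += i & -i) ++fen[i]; };
  auto prefix = [&](int i) {  // number of inserted positions in [0, i)
    int s = 0;
    for (; i > 0; i -= i & -i) s += fen[i];
    return s;
  };

  while (true) {
    bool active = false;
    std::vector<std::vector<int>> bucket(n);  // queries grouped by midpoint
    for (int i = 0; i < q; ++i)
      if (lo[i] < hi[i]) {
        bucket[(lo[i] + hi[i]) / 2].push_back(i);
        active = true;
      }
    if (!active) break;
    std::fill(fen.begin(), fen.end(), 0);
    for (int mid = 0; mid < n; ++mid) {
      add(order[mid]);  // the mid + 1 smallest values are now inserted
      for (int i : bucket[mid]) {
        int cnt = prefix(qs[i].r + 1) - prefix(qs[i].l);
        if (cnt >= qs[i].k) hi[i] = mid; else lo[i] = mid + 1;
      }
    }
  }
  std::vector<int> ans(q);
  for (int i = 0; i < q; ++i) ans[i] = a[order[lo[i]]];
  return ans;
}
```

Each round halves every query's search interval with a single sweep over the sorted elements, so all $q$ binary searches share the same $O(\log n)$ rounds.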
↵
<h2>2D range sum query for a big, sparse matrix (with updates).</h2>
I can't actually find this problem anywhere (with my constraints), so here is the statement with constraints:↵
You are given a matrix of size $n \times m$, initially filled with 0. You need to process $q$ queries of 2 types:↵
<ul>↵
  <li>`1 r c v`: change the value of cell $(r, c)$ to $v$.</li>
  <li>`2 r1 c1 r2 c2`: get the sum of the elements of the submatrix bounded by the 2 cells $(r1, c1)$ and $(r2, c2)$.</li>
</ul>↵
↵
If we have $n \times m \le 10^5$, we can solve this problem with a 2D Fenwick tree in $O(\log n \log m)$ per query. But here I want $n \le 10^5, m \le 10^5, q \le 10^5$. (Also note that a persistent segment tree can solve this problem, but only without queries of the first type.)
↵
As you can already guess, we add the updates and the queries into the nodes of the merge-sort tree and then do something like the two-pointer technique. But what is the sort order? Conveniently, the sort order here is the processing order, which is already given by the input. So first, we build 2 trees, one for updates and one for queries. Each update is added to the first tree at the position given by its `r` coordinate. Each query is added to the second tree with the range `[r1, r2]`. Now, inside each node, the row condition is already satisfied and we only need to find the answer restricted by the columns. For this, I use a Fenwick tree. Because we add the updates and the queries in their order of appearance in the input, in each node we can just process them in the order they were added.
↵
A small implementation detail is that we need a fresh Fenwick tree quickly for each node. For this, we can record all changes made in a node and roll them back afterwards.
↵
To summarize, we get an $O(q \log n \log m)$ solution. Each update/query is added to $O(\log n)$ different nodes, and in each node we process it with the Fenwick tree in $O(\log m)$.
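
A compact sketch of this offline scheme is given below. It is not the linked implementation. The names `Op` and `sparse_range_sums` are hypothetical, coordinates are assumed 0-based and inclusive, and "set cell to $v$" is turned into an additive delta by remembering the current cell value in a `std::map`, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Op {
  int type;            // 1 = set cell (r1, c1) to v, 2 = sum over [r1..r2] x [c1..c2]
  int r1, c1, r2, c2;  // r2, c2 are unused for type 1
  long long v;         // unused for type 2
};

// Returns the answers of the type-2 operations in input order.
std::vector<long long> sparse_range_sums(int n, int m, const std::vector<Op>& ops) {
  struct Event { int time, c; long long delta; };
  std::vector<std::vector<Event>> upd(2 * n);    // per node, in input order
  std::vector<std::vector<int>> ask(2 * n);      // per node, indices into ops
  std::map<std::pair<int, int>, long long> cur;  // current cell values
  std::vector<long long> ans(ops.size(), 0);

  for (int t = 0; t < (int)ops.size(); ++t) {
    const Op& o = ops[t];
    if (o.type == 1) {
      long long& old = cur[{o.r1, o.c1}];
      for (int p = o.r1 + n; p > 0; p >>= 1) upd[p].push_back({t, o.c1, o.v - old});
      old = o.v;
    } else {
      for (int l = o.r1 + n, r = o.r2 + 1 + n; l < r; l >>= 1, r >>= 1) {
        if (l & 1) ask[l++].push_back(t);
        if (r & 1) ask[--r].push_back(t);
      }
    }
  }

  std::vector<long long> fen(m + 1, 0);  // Fenwick tree over columns
  auto add = [&](int c, long long d) { for (++c; c <= m; c += c & -c) fen[c] += d; };
  auto prefix = [&](int c) {  // sum over columns [0, c)
    long long s = 0;
    for (; c > 0; c -= c & -c) s += fen[c];
    return s;
  };

  for (int node = 1; node < 2 * n; ++node) {
    std::size_t j = 0;
    for (int t : ask[node]) {
      // Apply every update of this node that happened before the query.
      for (; j < upd[node].size() && upd[node][j].time < t; ++j)
        add(upd[node][j].c, upd[node][j].delta);
      ans[t] += prefix(ops[t].c2 + 1) - prefix(ops[t].c1);
    }
    // Roll back only what was applied, so the tree is clean for the next node.
    for (std::size_t i = 0; i < j; ++i) add(upd[node][i].c, -upd[node][i].delta);
  }

  std::vector<long long> out;
  for (std::size_t t = 0; t < ops.size(); ++t)
    if (ops[t].type == 2) out.push_back(ans[t]);
  return out;
}
```

The rollback touches only the updates that were actually applied, so the total work stays proportional to the number of stored events times $O(\log m)$.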
↵
Because I cannot find the problem, I also wrote a small test generator with a naive solution and checked my solution against it. The implementation is <a href="https://ideone.com/xgQ0Kk">here</a>.
↵
<h1>Some real problem applications</h1>
<h2>H. Hold the Line. 2019-2020 ICPC Asia Hong Kong Regional Contest</h2>↵
<a href="https://mirror.codeforces.com/gym/102452/problem/H">Link to the problem in the Gym</a>. You can find the editorial <a href="https://mirror.codeforces.com/blog/entry/72034"> here</a>.
Basically, we are given a number $n$ and an array $a$ of length $n$, initially filled with $-1$. Then we need to process $m$ queries of 2 types:
<ul>↵
<li>Given 2 numbers $i$ and $h$. Set $a[i]$ to $h$. Each position will be set at most once.</li>↵
<li>Given 3 numbers $l, r, H$. We need to find the position $i \in [l, r]$ such that $a[i] \ne -1$ and $|a[i] - H|$ is minimized. If there is no such $i$, output $-1$.</li>
</ul>↵
↵
In the editorial, this problem is solved in $O(n \log n + m \log^2 n)$. But I don't like the square there, so instead I will demonstrate how I solved it in $O((n + m) \log n)$ (or, more precisely, $O((n + m) \log n \cdot \alpha(n + m))$, since I also used DSU).
↵
Let's first see how to solve this problem without the positions, i.e. we have an initially empty set of numbers, and each operation either adds an element to the set or queries the lower/upper bound of a value in the set. This problem can literally be solved with the STL set in $O(n\log n)$. But if we somehow know the sorted order of all added/queried values in advance, we can do it in $O(n\cdot \alpha (n))$. Let's solve the reversed problem: we initially have a set of numbers, and each operation either removes an element or queries the lower/upper bound. Because we have the sorted order, let's create a linked list containing both the elements of the set and the queried values. Removing an element from the linked list takes constant time. To answer a query, we just look at the next and the previous element of the queried value in the linked list. A small problem is that the next and the previous element must be elements of the set, not other queried values. We can use DSU to join neighboring queried values into one group, so that the neighbors of a group are never queried values.
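
The reversed (deletion-only) subproblem can be sketched as below. This is a simplified variant rather than the linked-list-plus-DSU structure described above: it keeps two pointer-jumping arrays with path halving over the indices of the already sorted values, and it assumes the rank of each queried value in that sorted order is known. The name `AliveSet` is hypothetical, and the sketch does not claim the $\alpha$ bound.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Tracks which indices 0..n-1 of an already sorted array are still alive.
struct AliveSet {
  std::vector<int> up;    // up[i] == i while index i is alive; up[n] is a sentinel
  std::vector<int> down;  // same idea shifted by one: down[i + 1] describes index i

  explicit AliveSet(int n) : up(n + 1), down(n + 1) {
    std::iota(up.begin(), up.end(), 0);
    std::iota(down.begin(), down.end(), 0);
  }

  static int find(std::vector<int>& p, int x) {
    while (p[x] != x) {
      p[x] = p[p[x]];  // path halving
      x = p[x];
    }
    return x;
  }

  // Removes index i; erasing the same index twice is harmless.
  void erase(int i) {
    up[i] = i + 1;
    down[i + 1] = i;
  }

  // Smallest alive index >= i, or n if there is none.
  int next_alive(int i) { return find(up, i); }

  // Largest alive index <= i, or -1 if there is none.
  int prev_alive(int i) { return find(down, i + 1) - 1; }
};
```

For a queried value with rank `pos` among the sorted values, `next_alive(pos)` and `prev_alive(pos - 1)` would give the upper and lower neighbors that are still in the set.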
↵
Now back to the original problem. For each node, we need to maintain the operations in it in their input order, as well as in sorted order by value; in other words, each node needs 2 vectors. We first add the updates/queries to the first vector of the nodes in the order given by the input; after that, we sort them by value and add them to the second vector of the nodes. Doing so reduces the problem to adding values to and querying a set in each node.
↵
If you guys want to see my <s>ugly</s> implementation, <a href="https://ideone.com/oNh601">here</a> it is (test generator included).
↵
<small>Fun fact: I invented this trick when solving this problem.</small>↵
↵
<h2>G. Greatest Square — Grand Prix of NorthBeach</h2>↵
You can find the link to the upsolving and problem statement <a href="https://mirror.codeforces.com/blog/entry/84466">here</a>.↵
↵
We are given a polygon whose edges are parallel to either $Ox$ or $Oy$, and every pair of consecutive edges is perpendicular. We need to process $q$ queries. In each query, we are given a point $P$ lying strictly inside the polygon, and we need to find the side length of the largest square that has $P$ as its lower-left corner and lies inside the polygon.
↵
The base solution is based on <a href="https://mirror.codeforces.com/blog/entry/85115?#comment-727505">this comment</a>. Please read it before continuing. The problem is, I actually don't know how to do the second part with one basic segment tree, so I improvised and finally decided to use this trick.
↵
Let's reformulate what we need to find. From each query point, we draw a line parallel to the line $x = y$ and another line parallel to $Ox$. Let's call these lines $a$ and $b$ respectively. We need to find the point with the smallest $x$ that lies below the line $a$, but not below the line $b$.
↵
So first we sort all polygon points, as well as the query points, in increasing order of $(x - y)$. In other words, we sort them in sweep-line order for a diagonal sweep. For each polygon point $(x, y)$, we add it to the tree with the range $[0, y]$. For each query point $(x, y)$, we add it to the tree at the position $y$. Doing so, when we process a node, we no longer need to care about the condition "not below the line $b$". And because we add the points in sweep-line order, we can use the two-pointer technique to ensure the "below the line $a$" condition. The local answer in each node for each query is the minimum $x$ of some prefix of the polygon points.
↵
My <s>ugly</s> <a href="https://ideone.com/80D2L5">implementation</a> again.↵
↵
<h1>Summary</h1>↵
I know that this trick is not as powerful as it sounded in the title (the title is just for clickbait though :)). But it has some nice advantages:↵
<ul>↵
<li>Less implementation than other data structures.</li>
<li>Remove 2 (or maybe more) restrictions at the same time in one node.</li>↵
<li>Everything is iterative.</li>↵
</ul>↵
↵
But the main drawback is (<b>or is it???</b>), this is heavily offline processing. ↵
↵
I really hope that this idea will be seen more often, but sadly, so far the problems above are the only "meaningful" applications I have found. And I hope that you guys find this idea useful!

Author: darkkcyan. First published 2020-12-12, last revised 2021-07-06 (revision en25).