Statement
We are given an array $$$a$$$ of length $$$n$$$. For some reason, we need answer $$$q$$$ queries. For the $$$i$$$-th, we need to return the contiguous subarray $$$a[l_i, r_i]$$$ of size $$$k_i=r_i-l_i+1$$$ in sorted order.
Solutions
Naive solution: $$$O(1)$$$ preparation, $$$O(klog(k))$$$ per query
We can just copy the subarray and sort it using various sorting algorithms. Using comparison sorting algorithms (quick sort, merge sort, std::sort, ...) will get us $$$O(klog(k))$$$. Using distribution sorts (radix sort, counting sort, ...) will get us $$$O(k\times constant)$$$ but these are generally not much better than comparison sorts due to high constants.
Merge sort tree: $$$O(nlog(n))$$$ preparation, $$$O(log(n)+klog(k))$$$ or $$$O(log(n)+klog(log(k)))$$$ or $$$O(log(n)+k)$$$ per query
To make the math easier to work with, we will assume that the segment tree is a Perfect Binary Tree.
Recall that when querying a range of size $$$k$$$, we need at most $$$2log_2(k)=O(log(k))$$$ nodes to completely represent the range $$$[l, r]$$$$$$(r=l+k-1)$$$. Further more, we will have at most 2 nodes that have size $$$2^i$$$ for each $$$0\leq i \lt log(k)$$$.
ProofThe above facts are rarely mentioned by tutorials on segment tree but can be easily proved using the following construction:
Start at the leaves level of the tree with a set $$$S$$$ of $$$k$$$ leaf nodes that represent the queried range.
For all two nodes have the same parent, we remove the two nodes from and add the parent node to $$$S$$$.
Add the unpaired node(s) (if any) to a list of nodes $$$L$$$ that at the end will represent the queried and remove it from $$$S$$$.
Repeat 2. and 3. until we have nothing left in $$$S$$$.

When we do step 2., we have some continuous nodes at the same height on the tree, so continuous pairs of nodes have the same parent and we are left with at most $$$2$$$ unpaired nodes, the left-most and right-most.
This means that after 2. and 3., we have at most half the nodes in $$$S$$$ compared to before we do 2. so we only need to do this $$$log_2(k)$$$ times before we are left with nothing in $$$S$$$.
Because there are at most $$$2$$$ unpaired nodes per level, we need at most $$$2log_2(k)=O(log(k))$$$ nodes to represent the whole range of size $$$k$$$. Because the nodes start at size 1 and double in size each time, we have at most 2 nodes of size $$$2^i$$$ for each $$$0\leq i \lt log(k)$$$.
Using a merge sort tree, we can obtain $$$O(log(k))$$$ sorted subarrays that together represent the range. Doing this take $$$O(log(n)+log(k))=O(log(n))$$$. We need to somehow merge these sorted subarrays into one sorted subarray.
If we just merge the subarrays one by one randomly, we'll need $$$O(klog(k))$$$ because we need to merge $$$O(log(k))$$$ times, each time we need $$$O(k)$$$ to merge. This is even worse than the naive solution because we need to spend $$$O(log(n))$$$ to get the subarrays.
We can try to merge all the arrays at the same time. To do this, we need to find the correct value among $$$O(log(k))$$$ values from each array. Using a priority queue (std::priority_queue, std::set, heap, ...), we can get this value in $$$O(log(log(k)))$$$ each time and get $$$O(klog(log(k)))$$$ to merge all of them. Total complexity per query is then $$$O(log(n)+klog(log(k)))$$$.
To do better, realize that when we merge the subarrays one by one, we can choose the order in which we merge them and potentially get a better run time. Because we have at most 2 nodes of each size $$$2^i$$$ for $$$0\leq i \lt log(k)$$$, if we always merge two smallest subarrays the total complexity is $$$O(k+log(n))$$$.
Proof and details - To make the math easy, we will just assume the worst case, 2 nodes for each size $$$2^i (0\leq i \lt log(k))$$$. We also add in 2 dummy nodes of size 1 so we start with 4 nodes of size 1 and 2 nodes of each size $$$2^i (0 \lt i \lt log(k))$$$.
- For each $$$0 \leq i \lt log(k)$$$, we start with 4 nodes of size $$$2^i$$$, merging them into 2 nodes of size $$$2^{i+1} $$$ take $$$4\times 2^i$$$ operations. We will end up with 2 nodes of size $$$2^{log(k)}$$$, merging them into the final sorted subarray take $$$2^{log(k)+1}$$$ operations.
- The total number of operations is:
$$$\sum\limits_{i=0}^{log(k)-1}(4\times 2^i)+2^{log(k)+1}=4\times(2^{log(k)}-1)+2^{log(k)+1}=6\times2^{log(k)}-4=O(6\times k)=O(k)$$$ - We will also need $$$O(log(k)log(log(k)))$$$ to handle merging smallest nodes but this is neglectable compared to the $$$O(k)$$$.
[Featured solution] Alternative Merge sort tree: $$$O(nlog(n))$$$ preparation, $$$O(log(n)+k)$$$ per query.
This solution deal with merging subarrays by reducing the number of subarrays we need to merge instead of trying to merge them in a good way.
If we also store the original indices of the sorted elements in each node, for a query $$$[l, r]$$$ we can answer it by iterating over the sorted elements and only keep the elements with indices inside $$$[l, r]$$$. This is, however, $$$O(n)$$$.
If the size $$$k=r-l+1$$$ is big enough compared to $$$n$$$ (i.e. $$$k/n \geq constant$$$) then the operation complexity is also $$$O(k)$$$.
We can then try to reduce the number of nodes we need to merge by iterating instead of going to children nodes when the queried range is big enough compared to the node size.
The segment tree model will then be something like:
If the queried range contains no element in this node, return nothing.
If the queried range contains at least $$$constant$$$ of the total amount of elements in this node, iterate the current note and return the correct sorted subarray.
Otherwise, go to the children node and do find the sorted subarrays and merge them.
If we choose the $$$constant=\frac{1}{2}$$$, there is at most 2 nodes we need to iterate. Iterating and merging the results from these nodes will take $$$O(k)$$$ and make the complexity for each query $$$O(log(n)+k)$$$.
ProofWe start at the root of the tree, while the queried range is strictly inside one of the children nodes, we can go down that node. At the first node that the queried range lay in both children, there are two cases:
If the size of the queried range is at least $$$\frac{1}{2}$$$ the size of this node, we simply iterate the node and get the answer.
Otherwise, we want to go down the left and right nodes. Notice that starting from now, the queried range will also contain one of the endpoints of the node. This means that whenever the queried range are in both children nodes, we need to queries more than half of the elements of the node sp we can iterate instead of going down further. This image can probably help you visualise the process (the red segment is the queried range and the black segments are the nodes):

Because there's at most one time we go down both children nodes, there's at most 2 nodes that we need to merge. Because each time we only iterate when we need to queries more than half the elements, the total number of iterated elements is at most $$$2k$$$.
To optimize even further, we can merge the answer while iterating the required nodes.
Applications
Due to its linear to query size complexity, these algorithms probably have very little usage (if any) in Competitive programming. They probably can only be used on problems with a big restriction on the sum of queries size ($$$O(n\sqrt{n})$$$ for example).
One problem in which we can try to use the above algorithms is 963D - Frequency of String but we have to answer the queries online.
Proposed solution - We'll need to find all the indices that $$$m_i$$$ appears inside $$$s$$$ in sorted order and use two pointers.
- Use a suffix array to find a range of suffixes that $$$m_i$$$ can appear at the beginning of.
- Use the algorithms to recover the required indices in sorted order.
The complexity is mostly depending on finding the range of suffixes. We can do at least $$$O((n+|s|)log^2(|s|)+10^5\sqrt{10^5})$$$ if we implement suffix array using hashing and $$$O(|s|+nlog(|s|)+10^5\sqrt{10^5})$$$ if we use LCP array and suffix automaton (or anything that allow us to find any occurrence in $$$O(|m_i|)$$$).
Author's notes
963D - Frequency of String was the inspiration for me to search for an algorithm to answer this type of query.
I couldn't find any tutorial/algorithm online (maybe I didn't search hard enough) so I tried making my own and it worked. I thought I should share them with you despise their apparent unusefulness. I have likely missed some potential applications (problems that can be solved), so please try looking for them and please share with us if you do find any.
If you find any mistake, please let me know. If you do know any efficient algorithm to answer this type of query (or related ones) please share with us.
Special thanks to:
- ruanxiaoyu for answering my questions about Suffix Automaton.
- You for reading this blog.