Implicit cartesian tree in GNU C++ STL.

#	User	Rating
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

#	User	Contrib.
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

Hello, guys! Recently I read the book "Effective STL" written by S. Meyers and found a mention about rope data structure, which is implemented in some STL's versions. Briefly speaking this data structure can fastly insert arbitrary blocks of array to any position and erase them. So it's very similar to implicit cartesian tree (you can find some details in the wiki article mentioned above). It's used sometimes to handle a very long strings.

As it turned out, the rope is actually implemented in some versions of STL, for example, in SGI STL, and I should say that it's the most complete documentation of this class I've ever seen. And now let's find the rope in GNU C++. I used this task for testing. Here you should quickly move the block [l,r] to the beginning of the array 10⁵ times and besides the size of array is not greater than 10⁵:

#include <iostream>
#include <cstdio>
#include <ext/rope> //header with rope
using namespace std;
using namespace __gnu_cxx; //namespace with rope and some additional stuff
int main()
{
    ios_base::sync_with_stdio(false);
    rope <int> v; //use as usual STL container
    int n, m;
    cin >> n >> m;
    for(int i = 1; i <= n; ++i)
        v.push_back(i); //initialization
    int l, r;
    for(int i = 0; i < m; ++i)
    {
        cin >> l >> r;
        --l, --r;
        rope <int> cur = v.substr(l, r - l + 1);
        v.erase(l, r - l + 1);
        v.insert(v.mutable_begin(), cur);
    }
    for(rope <int>::iterator it = v.mutable_begin(); it != v.mutable_end(); ++it)
        cout << *it << " ";
    return 0;
}

It works perfectly, but 2 times slower than the handwritten implicit cartesian tree, but uses less memory. As far as I can see, GNU C++ has the same implementation of rope as SGI STL. Visual C++ doesn't support the rope class.

There are several points, that you should know about rope in C++. From SGI STL's documentation it's known that the rope doesn't cope well with modifications of single elements (that's why begin() and end() return const_iterator. If you want to use classic iterators, you should call mutable_begin() and mutable_end().). But my tests show that it's pretty fast (about $\text{[math]}$ ). Simultaneously operator += performes for O(1) (Of course, if we don't consider the time needed to construct the object on the right side).

You can see some features with [ ] operator in the following code: http://pastebin.com/U8rG1tfu. Since developers want to maintain the rope in permanent state, the operator [ ] returns const reference, but there is a special method to overcome it. This solution works with the same speed. Futhermore, I forgot to mention that all iterators are RandomAccess.

If you test this container, please, tell about your experience in comments. In my opinion, we got a pretty fast array with complexity $\text{[math]}$ for all operations :)

Comments (14)

Show archived | Write comment?

adelnobel

11 years ago, # |

← Rev. 2 →

+97

There is also an order-statistics tree in the SGI library included in STL.

Sample code: http://ideone.com/QuiYER

→ Reply

droptable

11 years ago, # ^ |

← Rev. 3 →

+43

Perlik

Cool! I have also found the patricia trie and splay tree there as well as the binomial heap and some other data structures, but it seems to me that they are incomplete.

Zlobober

+87

Well, Last night I was investigating contents of pb_ds (Politics based data strucutres) library, included by default in g++. I've found that implementations of binary search trees there are almost as we need: they support traveling with node iterators (such as moving to the left/right child, dereferencing and everything we need), splitting, mergeing and even contatining additional information in each node with its recalculation after every change of tree structure (such as size of subtree, minimum or maximum)!

Here are two examples of what we can do with this library.

First one. Basic examples of usage. Look over it, it's very interesting! It also compiles here on Codeforces.

Second one with solution of RMQ problem using this tree. When I was writing it I was thinking like "That's really great! That can replace treap as default binary search tree that people often write on contests! That's very cool!". But...

The main problem is that splitting is implemented in linear time because of very stupid bottleneck: after split they set the size of second subtree as std::distance(other.begin(), other.end(), that works in linear time. =(

There is also implementation of 4 types of heap. They support iterators to fixed elements of the heap, that makes us possible to write, for example, Dijkstra with heap algorithm. But here is a benchmark, that shows that it is still 20-30% slower then usual Dijkstra+set implementation.

Patricia tree was a surprise for me, essentially after I looked in wikipedia and found a compressed trie (!) picture. But in fact it is another associative container for storing keys in increasing order:

Patricia trees have excellent lookup performance, but they do so through maintaining, for each node, a miniature "hash-table". Their space efficiency is low, and their modification performance is bad.

(from official documentation of that library)

There is also forward-linked list and bunch of hashmaps (I didn't tested yet).

That trees are really what we could use in our contests, If it hadn't linear-time split. I've written an e-mail to the developers of that library with some questions about why they didn't implement it more efficient. Waiting for the answer from them.

What's their point of doing this in Linear time when it can be done in constant time? This is really weird! Nice effort by the way I really appreciate it :)

I also found a skip-list implementation, it allows finding, searching, deleting all in logarithmic time.

Excuse me, but in which file did you find the linear split method ? As from what I can see in this page

http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/tree_based_containers.html

Scroll to the end of the page right to (Additional Methods)

You will find this: "These methods are efficient — red-black trees are split and joined in poly-logarithmic complexity; ordered-vector trees are split and joined at linear complexity. The alternatives have super-linear complexity."

So only ordered-vector trees are splitted in linear time, you should be alright if you use red-black tree as the underlying data structure by specifying the rb_tree_tag when creating the object?

I'll look into the implementation of the split method inside..

No, split will be anyway linear-time. After tree-depend implementation of split function there is a call of general function for all binary trees split_finish(other), where follows next line:

...
other.m_size = std::distance(other.begin(), other.end());
...

(/ext/pb_ds/detail/bin_search_tree_/split_join_fn_imps.hpp, #134).

Of course std::distance for iterators of this tree works in linear time (it shifts first iterator until it is equal to the second iterator). You can run my solution for RMQ on big test to ensure that I'm right. It's surprising, but in this place official documentation is wrong.

Aha I see...

niklasb

Did the authors reply by now? I think it was a design issue because to update the tree size quickly one needs the subtree size augmentation that is only provided by the tree_order_statistics_node_update mixin.

We can work around the problem by overloading std::distance, but it's not pretty: https://github.com/niklasb/contest-algos/blob/master/stl_splay_tree.cpp Overall it takes a lot of code to reimplement order statistics, so it's probably not really worth the effort, since a manually coded treap seems to be much faster and doesn't require a lot more code.