dsdp's blog

By dsdp, history, 10 months ago, In English

In C++, Currently, I have seen some code that implements segment trees/balanced trees using Pointers.

So, what are the respective advantages of using Pointers and not using them (constants, debugging difficulty, etc.)? I don't quite understand.

Or is there no difference? It's just a matter of personal style.

Some of the responses I have received so far are conflicting, so I came to the CodeForces community for help.

I would like to express our sincere gratitude to all in advance!

  • Vote: I like it
  • +28
  • Vote: I do not like it

»
10 months ago, hide # |
Rev. 2  
Vote: I like it +80 Vote: I do not like it

What scenario are you talking about exactly and what are the implementations that don't use pointers? There are two questions I think you could be asking:

  1. What to do for normal, dense segment trees (static layout or pointers)? or
  2. How to implement pointers for DS that actually need pointer-like objects (Node* vs. int offset into an array)?

The most common scenario is that you want to code a standard segment tree that fits in memory normally and it's not doing anything too special. One way to implement this is to use a static layout, where you can infer the positions of the parent/children of a node purely based on the node's index. For example, you can layout a segtree so that the root is at node 1 and node i has parent i/2 and children i*2 and i*2+1. Another approach is that for each node, you could store two pointers, one to each of its children (and maybe one to its parent if you need it). In this case, you should always pick the static layout option, since it's just better. It uses less memory, since you don't need to store pointers at each node. It's faster, since you can implement iterative segtree, it uses less memory and thus has better caching, and you can compute parent/children with a single arithmetic operation. It gives you greater flexibility, e.g. you can implement iterative segment tree, since you can exploit more properties of the structure, like leaves being at TSZ+i and nodes adjacent by index representing adjacent nodes in the tree. It's also probably shorter to implement. I really don't see why you would want to use pointers in this case.

The other scenario is when you have a data structure where you fundamentally need to use pointers (or pointer-like objects). This is the case in a persistent segment tree, an implicit (sparse) segment tree, or a BBST. In this case, the layout of the tree is more complicated, and wouldn't fit in memory if you tried to encode the structure statically like for a normal segment tree. Thus, you need to keep a reference to the two children (and maybe the parent) of the node at each node. However, you can still choose to use a pointer directly or store offsets into an array instead. Using pointers directly would be something like struct Node {Node *lc, *rc; int val;}, where the left child is *lc, the right child is *rc, and val is some data you store at each node that's not relevant to the structure. The other option is to store all of your nodes in a contiguous array Node mem[MEM_SZ]; (where MEM_SZ is the max number of nodes you will even want to allocate) and store lc and rc as integers that index into this array: struct Node {int lc, rc; int val;}, left child is mem[lc], right child is mem[rc]. To allocate a node, you take the next free element in mem and increment a counter by one (int mi = 0; void alloc(){return mem + ++mi;}). Deciding between these two options is actually a lot more interesting than deciding between static layout vs. pointers for the dense segtree case.

If you are on a system where sizeof(int) == sizeof(Node*) (e.g. a system where pointers are 32 bit), then you should probably use pointers. Directly using pointers is more convenient in my opinion (node->lc is shorter than mem + node->lc or mem[node].lc) and it's also slightly faster, since you are directly dereferencing the pointer instead of having to add it to the base array then dereference. Note that this difference in performance might not be too significant, since x86 architectures support complex addressing modes. This makes it so that in some cases, dereferencing a pointer and indexing an array could take the same amount of time (but other times, dereferencing directly might be a bit faster). For example, a line of C code int x = arr[i]; might effectively get turned into an assembly instruction mov x, DWORD [arr+i*4], which computes the pointer pointing to the i-th element of arr and loads the value stored at that memory location into a register in a single instruction!

However, if pointers are larger, then it is less clear whether it is a better idea to use them as opposed to integer offsets into an array. For example, on most systems, pointers are 64 bits, but using 32-bit integers for indexing into your segment tree/BBST array is sufficient. In this case, using an index has the advantage that is uses less memory, which also makes it faster, since your segment tree might fit in a smaller cache (L1 vs. L2 vs. L3 vs. DRAM). Pointers still have the advantage that dereferencing them is slightly simpler and thus faster than array indexing, and it is generally not obvious whether this difference in speed will be larger than the improvement you get from using less memory. Since I find pointers to be more convenient, I usually implement persistent/implicit segtrees and BBSTs with pointers first, and only switch to indices if I get TLE/MLE.

Also, even if you decide to use pointers, you should still pre-allocate all of your nodes in a large global array and you should basically NEVER use new/delete in competitive programming. Using new to allocate a single node is just slow. You usually know (an upper-bound on) the number of nodes you will use, so even if you really want to use dynamic memory allocation (instead of allocating a global array), allocate all of your nodes at once (e.g. using a vector so you don't have to worry about manually deallocating) so that you only do a single allocation, instead of doing one for every node.

TLDR: use static layout whenever possible; if you need pointers, use normal pointers first, then switch to offset into an array if you need to use less memory; and always allocate all nodes of your DS at once instead of calling new for each node.

  • »
    »
    10 months ago, hide # ^ |
     
    Vote: I like it -25 Vote: I do not like it

    wt

  • »
    »
    10 months ago, hide # ^ |
     
    Vote: I like it +28 Vote: I do not like it

    You can overload the new/delete operators to use a bump allocator. Maybe this is just as fast as allocating all the nodes you need?

    static char buf[50 << 20]; // 50MB
    void* operator new(size_t s) { static size_t i = sizeof buf; return (void*)&buf[i -= s]; }
    void operator delete(void*) {}
    
    • »
      »
      »
      10 months ago, hide # ^ |
       
      Vote: I like it 0 Vote: I do not like it

      Sure, that also works. This is effectively the same as what I suggested (Node mem[MEM_SZ]; int mi; Node* alloc(){return mem + ++mi;}) just a different (more general) interface.

»
10 months ago, hide # |
 
Vote: I like it +8 Vote: I do not like it

Something really good to know when using pointers is that std::deque is very similar to a vector, except that it doesn't invalidate pointers upon push backs (since deque internally stores the data in blocks of a fixed size, which are not moved by push back). I would highly advice to use std::deque.push_back instead of manually calling new and creating memory leaks.

  • »
    »
    10 months ago, hide # ^ |
    Rev. 2  
    Vote: I like it 0 Vote: I do not like it

    Using deques can easily lead to MLE — this actually happened once in (Chinese)NOI, where many participants lost points because they used $$$10^6$$$ deques.

    • »
      »
      »
      10 months ago, hide # ^ |
       
      Vote: I like it 0 Vote: I do not like it

      That is a drawback of deque internally storing its data in blocks of fixed size. A deque with a single element takes up a significant amount of memory since it has to allocate one entire block. So the fixed sized block can be both an advantage (pointers are not invalidated on push back, fewer allocations needed) and a disadvantage (single element deque is memory hungry).

»
10 months ago, hide # |
 
Vote: I like it +5 Vote: I do not like it

Thanks.That's all my need.