Andycode3759's blog

By Andycode3759, history, 2 years ago, In English

So here is the thing: recently I was doing some sustainable data structure problems while I found that the behavior of vector seems to vary from C++ standard versions. I did a little experiment:

#include <vector>
#include <cstdio>
using namespace std;
const int MAXN = 1000006;

int n;
int arr[MAXN];

struct SegTreeNode
{
    SegTreeNode(int _l = 0, int _r = 0, int _v = 0)
    {
        val = _v;
        ls = rs = -1;
        sl = _l, sr = _r;
    }
    int val;
    int ls, rs;
    int sl, sr;
};

vector<SegTreeNode> sgt;
int root;

inline int newNode(int sl, int sr, int val)
{
    sgt.push_back(SegTreeNode(sl, sr, val));
    return sgt.size() - 1;
}
int build(int l, int r)
{
    int t = newNode(l, r, 0);
    if (l == r)
    {
        sgt[t].val = arr[l];
        return t;
    }
    int mid = (l + r) >> 1;
    sgt[t].ls = build(l, mid);
    sgt[t].rs = build(mid + 1, r);
    sgt[t].val = sgt[sgt[t].ls].val + sgt[sgt[t].rs].val;
    return t;
}
int query(int idx, int l, int r)
{
    if (l <= sgt[idx].sl && sgt[idx].sr <= r)
        return sgt[idx].val;
    int mid = (sgt[idx].sl + sgt[idx].sr) >> 1;
    int res = 0;
    if (l <= mid)
        res += query(sgt[idx].ls, l, r);
    if (r > mid)
        res += query(sgt[idx].rs, l, r);
    return res;
}

int main()
{
    scanf("%d", &n);
    for (int i = 1; i <= n; i++)
        scanf("%d", arr + i);
    root = build(1, n);
    int l, r;
    scanf("%d %d", &l, &r);
    printf("%d\n", query(root, l, r));
    for (int i = 1; i <= n; i++)
        printf("%d ", query(root, i, i));
    printf("\n");
    for (SegTreeNode nd : sgt)
        printf("val=%d [%d,%d] ls=%d rs=%d\n", nd.val, nd.sl, nd.sr, nd.ls, nd.rs);
    return 0;
}

It's just a basic segment tree with dynamically allocated nodes. I compiled the same code in both C++14 and C++17, and I got different output from two programs:

Input:
5
1 2 3 4 5
1 4

C++14:
8
4 4 4 4 4
val=8 [1,5] ls=-1 rs=-1
val=4 [1,3] ls=-1 rs=5
val=2 [1,2] ls=3 rs=-1
val=1 [1,1] ls=-1 rs=-1
val=2 [2,2] ls=-1 rs=-1
val=3 [3,3] ls=-1 rs=-1
val=8 [4,5] ls=7 rs=-1
val=4 [4,4] ls=-1 rs=-1
val=5 [5,5] ls=-1 rs=-1

C++17:
10
1 2 3 4 5
val=15 [1,5] ls=1 rs=6
val=6 [1,3] ls=2 rs=5
val=3 [1,2] ls=3 rs=4
val=1 [1,1] ls=-1 rs=-1
val=2 [2,2] ls=-1 rs=-1
val=3 [3,3] ls=-1 rs=-1
val=9 [4,5] ls=7 rs=8
val=4 [4,4] ls=-1 rs=-1
val=5 [5,5] ls=-1 rs=-1

Only the result of C++17 is correct. I also modified the program to use static arrays and the correct output appears in both C++14 and C++17. So I'm confident that the vector in C++14 is to blame.

Can anyone explain why it happens? I tried to compile the source into assembly code and I did found that there are differences, but I can't understand them: dynamic-segtree-cpp14.asm, dynamic-segtree-cpp17.asm. Do they provide clearer evidence?

Tags c++
  • Vote: I like it
  • 0
  • Vote: I do not like it

»
2 years ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

Although I don't fully understand the internal code of STL, I have encountered problems similar to yours. Generally speaking, vector may apply for new memory when changing its size, which will cause the previous address to become invalid. I think code like "sgt[t].ls = build(l, mid);" may be the culprit. It first obtains an address in the vector, and then enters the build function, which causes the vector to expand and the address Invalid. An error occurs, which explains why static arrays are correct. As for why C++17 is correct, I don't know much about it. It may be that the standard stipulates the execution order of assignment statements, or something else. But I suggest it's better not to write such code in CP. Hope this helps! sr for my poor english, used google translate :P

»
2 years ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

c++17 added a lot of extra sequence points in evaluation order (https://en.cppreference.com/w/cpp/language/eval_order — in this case specifically rule 19).

In C++14 your program has undefined behaviour because there isn't any ordering requirement between evaluating where to store the result (sgt[t].ls) and evaluating the result to store there (build(l, mid)).

  • »
    »
    2 years ago, hide # ^ |
    Rev. 3  
    Vote: I like it 0 Vote: I do not like it

    I'd say that even in C++17 this is undefined behavior, because build is called after evaluation of reference to node and reference can be lost because of reallocation during buildNode

    Edit: sure static arrays would work, because there is no reallocation.

    Edit 2: stupid me, of course in C++17 left side of assignment is strictly after right side.