Xiaohuba's blog

By Xiaohuba, history, 4 months ago, In English

UPD: I've reduced the code size.

I've recently found that the following code will generate wrong output.

#include <bitset>
#include <iostream>
const int N = 105;
std::bitset<N> ok[N][N];
int n = 5;
int main() {
  ok[2][2].set(2);
  for (int i = n; i; i--)
    for (int j = i; j <= n; j++) {
      ok[i][j] = ok[i][j] | ok[i + 1][j] | ok[i][j - 1];
    }
  std::cout << ok[2][5][2] << '\n';
  return 0;
}

Compiled with -O3 -mtune=skylake -march=skylake, the code outputs 0.

However if you simulate the code you will know that the correct answer should be 1.

Note that the compiler seems to generate wrong sse instruction.

Godbolt link

Again, I believe this code is ub-free, and has nothing to do with implementation-defined stuff.

  • Vote: I like it
  • +63
  • Vote: I do not like it

»
4 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Xiaohuba (previous revision, new revision, compare).

»
4 months ago, # |
Rev. 8   Vote: I like it +13 Vote: I do not like it

Further reduced:

typedef struct {
  unsigned long words[2];
} Child;

typedef struct {
  Child child;
} Parent;

Parent my_or(Parent x, const Parent *y) {
  const Child *y_child = &y->child;
  for (int i = 0; i < 2; i++) {
    x.child.words[i] |= y_child->words[i];
  }
  return x;
}

int main() {
  Parent bs[4];
  __builtin_memset(bs, 0, sizeof(bs));

  bs[0].child.words[0] = 1;
  for (int i = 1; i <= 3; i++) {
    bs[i] = my_or(bs[i], &bs[i - 1]);
  }
  return bs[2].child.words[0];
}
»
4 months ago, # |
  Vote: I like it -13 Vote: I do not like it

I bet AI couldn't do that

»
4 months ago, # |
Rev. 2   Vote: I like it +1 Vote: I do not like it

It's good with O2. I am always scared to use O3, now I have an additional reason to keep being scared. While searching I found something related to O3 and AVX probably not being addressed for a long time (>10 years) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49001 . This may not be this issue though, as I tried to adjust alignas and that didn't help.