
sslotin's blog

By sslotin, 5 years ago, In English

There are a couple of things about sparse tables that I think a lot of people are doing slightly wrong (including the popular reference implementation by e-maxx).

First, you don't need to precalculate and store logarithms in an array. Whichever procedure the compiler picks to calculate the logarithm is probably going to be much faster than looking it up somewhere in random access memory. The optimal way is to use __builtin_clz ("count leading zeros"), which maps to a dedicated instruction available on most modern x86 CPUs. This way you need just 2 memory reads per query instead of 3, so on large arrays this should be ~1.5x faster.

Second, memory layout and the order in which you iterate over it matter when you're building the sparse table. There are a total of $$$2 \times 2 = 4$$$ ways to do it, and only one of them results in beautiful linear passes, which work 1.5-2x faster.

It's easier to implement too:

int a[maxn], mn[logn][maxn]; // mn[t][i] = min over a[i .. i + 2^t)

int rmq(int l, int r) { // [l; r)
    int t = __lg(r - l);
    return min(mn[t][l], mn[t][r - (1 << t)]);
}

// somewhere in main:
memcpy(mn[0], a, sizeof a);
for (int l = 0; l < logn - 1; l++)
    for (int i = 0; i + (2 << l) <= n; i++)
        mn[l+1][i] = min(mn[l][i], mn[l][i + (1 << l)]);

The implementation was updated (thanks jef and plainstop):

original query implementation

Also, it's interesting that the last loop does not get auto-vectorized by the compiler (probably because std::min is overly complex for it), and replacing it with a simple (x < y ? x : y) brings the total speedup to ~3x if you enable AVX2, though I personally wouldn't do this because it looks cumbersome:

int x = mn[l][i];
int y = mn[l][i + (1 << l)];
mn[l+1][i] = (x < y ? x : y);

(Testing performed on my laptop; didn't run it on CF or other online judges.)


»
5 years ago, # |

Sparse tables aren't usually something you need to constant-optimise, especially building them. Also, they contain a lot of redundant information, at least in this version. It's enough to build mn[l][i] only when $$$2^l \mid i$$$, and these arrays can also be compressed to take up just $$$O(N)$$$ space, which should also improve cache behaviour at the higher levels.

The main advantage of sparse tables is ease of use.

  • »
    »
    5 years ago, # ^ |
Rev. 2

    Yeah, agreed; the primary point was that it's both faster and simpler (or at least not harder) to code.

    I've never heard of this compression thing, but isn't it equivalent to some form of a segment tree? Don't you still need $$$O(\log n)$$$ lookups per query?

    • »
      »
      »
      5 years ago, # ^ |

Yeah, O(log) per query, and it's pretty much a segment tree without updates. It's just that you talked about optimising build time, so I'm pointing out how it can be optimised a bit further (although query performance may suffer).

  • »
    »
    5 years ago, # ^ |
Rev. 2

    How would you keep the $$$O(1)$$$ query performance with $$$O(N)$$$ memory? I think what you are describing is a segment tree of some sort, instead of a sparse table.

    Edit: Oops, it seems to have been mentioned before already.

    • »
      »
      »
      5 years ago, # ^ |

      The OP emphasised building performance, which isn't usually viewed as important.

      $$$O(1)$$$ query performance though? How would you do that without $$$O(N^2)$$$ preprocessing?

      • »
        »
        »
        »
        5 years ago, # ^ |

        It sounds like you're missing the point of sparse table. (Or I'm missing the point of your comments or the OP.)

The query code in the OP takes $$$O(1)$$$ time: it takes the min of precomputed values for two overlapping ranges. For example, for $$$l = 2, r = 7$$$ it uses $$$t = \lfloor \log_2 5 \rfloor = 2$$$ and covers $$$[2; 7)$$$ with $$$[2; 6)$$$ and $$$[3; 7)$$$; the overlap is harmless because $$$\min$$$ is idempotent. That requires all $$$O(n \log n)$$$ precomputed values. What you described is an interval tree rather than a sparse table.

        • »
          »
          »
          »
          »
          5 years ago, # ^ |

Huh. I didn't realise queries on a sparse table could be done this way; nice trick. The OP also mentioned an $$$O(\log)$$$ query, so it seems he took it for granted too. Of course, it only works for RMQ and similarly idempotent operations, and I don't think I'll ever need it in a sparse table, but it's a nice trick.

          • »
            »
            »
            »
            »
            »
            5 years ago, # ^ |

I have never used a sparse table in any other scenario. This is really weird.

            • »
              »
              »
              »
              »
              »
              »
              5 years ago, # ^ |

              You always needed $$$O(1)$$$ rather than $$$O(\log)$$$ per query when you used sparse table?

              • »
                »
                »
                »
                »
                »
                »
                »
                5 years ago, # ^ |

                What's the advantage of using a Sparse Table then?

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  5 years ago, # ^ |
Rev. 2

                  >_>

                  It's right at the end of my first post here.

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  5 years ago, # ^ |

Interesting. I always found them nastier than segment trees because of a ton of ±1s.

Could you share your implementation?

»
5 years ago, # |

GCC also offers std::__lg, which does the same thing as your lg function.

»
5 years ago, # |

Your array indices are reversed in your query function.

  • »
    »
    5 years ago, # ^ |

They are still reversed in the original query implementation?

    • »
      »
      »
      5 years ago, # ^ |

      Yes, t should be the first index.

»
5 years ago, # |

Can I ask a quite stupid question? What is the complexity of __builtin_clz? Is it $$$O(1)$$$ or $$$O(\log n)$$$?

Thanks <3.

  • »
    »
    5 years ago, # ^ |

    This might be helpful for you

  • »
    »
    5 years ago, # ^ |
Rev. 4

$$$O(1)$$$: it calls a single processor instruction, as do most (all?) of the __builtin_* functions.

According to this table, its latency is 3 cycles, the same as multiplying two integers. The instruction is called BSR on Intel Core i7 ("Nehalem").