Блог пользователя bitset

Автор bitset, история, 6 месяцев назад, По-английски

UPD1: The bug has been fixed. I'll update the blog again when the fix is on Codeforces.

UPD: Unfortunately, left / right shift is bugged and doesn't clear bits properly. It should be fine for other operations, but use it at your own risk.

Original Blog
  • Проголосовать: нравится
  • +514
  • Проголосовать: не нравится

»
6 месяцев назад, # |
  Проголосовать: нравится +45 Проголосовать: не нравится

Note: they added more features to dynamic_bitset in a newer Boost version, but they haven't been added to GCC yet. So that's why I linked an older version of Boost's docs.

»
6 месяцев назад, # |
Rev. 2   Проголосовать: нравится +81 Проголосовать: не нравится

Fun fact, you can also write dynamic_bitset<__uint128_t> which is technically $$$O(\frac{n}{128})$$$! 260859954

  • »
    »
    6 месяцев назад, # ^ |
      Проголосовать: нравится +69 Проголосовать: не нравится

    I doubt it will be faster, because __uint128_t is not a native type (the CPU can only handle integers up to 64 bits). Basically the compiler thinks __uint128_t is two uint64_ts, and operations involving __uint128_ts are translated to multiple instructions handling uint64_ts.

    Experiment
    • »
      »
      »
      6 месяцев назад, # ^ |
      Rev. 2   Проголосовать: нравится +17 Проголосовать: не нравится

      128-bit (and higher) integers are indeed native types, though not in basic x86_64 and not equally general — they're supported in particular operations. There are two ways CPUs handle higher-precision operations to achieve faster performance:

      • one or more operands (can also include the output) are represented by 2 registers like you say here, leading to 2x precision — example is 32-bit ARM's SMULL which multiplies 2 32-bit integers to get a 64-bit result in 2 registers using only one operation
      • SIMD instructions that use their own registers — this is the case relevant here, as a __uint128_t variable can use a register %xmm0 with SSE instruction set; bitwise operations are typically provided in earliest versions of SIMD instruction sets as they're very simple, and performance overhead of loading to SIMD registers isn't a big deal as memory still needs to be accessed in new cache lines

      The performance difference depends on hardware and software, which is part of why $$$O(n/128)$$$ isn't valid $$$O$$$-notation.

      • »
        »
        »
        »
        6 месяцев назад, # ^ |
        Rev. 3   Проголосовать: нравится +8 Проголосовать: не нравится

        I don't think there are SIMD instructions that can be usable to directly compute 128-bit values, at least on x86-64. For example, there's no instruction that compute "add with carry" which is necessary to compute 128-bit additions.

        As I've shown in my experiment, GCC actually generates two adding instructions for a 128-bit addition (on x86-64). I tried various compile options but the compiler did not use SIMD at all. Similarly, a 128-bit multiplication expands to 3 multiplying instructions and 2 adding instructions, and a 128-bit division even becomes a library function call. It might be possible to use SIMD on bitwise operations, but GCC doesn't use SIMD for them, either. See this assembly output in Compiler Explorer.

        • »
          »
          »
          »
          »
          6 месяцев назад, # ^ |
            Проголосовать: нравится +8 Проголосовать: не нравится

          True, I was thinking specifically for bitwise operations, where the fact that SIMD is packed multiple data doesn't matter. There's no bitset addition in principle.

          Note that even some seemingly "elementary" bitset operations must be split into multiple assembly operations. Typical example: shifts, a massive pain in the ass to implement correctly.

  • »
    »
    4 месяца назад, # ^ |
      Проголосовать: нравится +28 Проголосовать: не нравится

    Better not to, it may get WA.

»
6 месяцев назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

I wonder if using this can help pass some problems in $$$O(n^{2})$$$. Is this the case?

  • »
    »
    6 месяцев назад, # ^ |
      Проголосовать: нравится +8 Проголосовать: не нравится

    Yes. Bitsets in general help optimize $$$O(n^2)$$$ solutions

»
6 месяцев назад, # |
  Проголосовать: нравится +40 Проголосовать: не нравится

Username checks out

»
5 месяцев назад, # |
  Проголосовать: нравится +37 Проголосовать: не нравится

Can someone explain what shifting logic does it use? It's different from default bitset:

#include <bits/stdc++.h>
#include <tr2/dynamic_bitset>
using namespace std;
using namespace tr2;

int32_t main() {
    dynamic_bitset<> test(91);
    test[50] = true;
    cout << test << "\n";
    cout << (test << 78) << "\n";
    cout << "\n";
    
    bitset<91> test2;
    test2[50] = true;
    cout << test2 << "\n";
    cout << (test2 << 78) << "\n";
    return 0;
}

Output:

0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000

0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

  • »
    »
    5 месяцев назад, # ^ |
      Проголосовать: нравится +14 Проголосовать: не нравится

    It's interesting. I found this piece of code on gcc.gnu.org :

    do_left_shift

    It looks like they commented out the part of the code that was responsible for clearing the rightmost blocks... Indeed, you can duplicate bits. Try changing test[50] to test[0] in your code and you will see that the bitset is shifted, but the first 64 bit block is not cleared.

    The takeaway is not to use dynamic_bitsets, I guess.

  • »
    »
    5 месяцев назад, # ^ |
    Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

    LEFT SHIFT works in ur testcase

    #define static_assert( ... )
    #include <tr2/dynamic_bitset>
    using bset = tr2::dynamic_bitset<__uint128_t>;
    
    • »
      »
      »
      5 месяцев назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится
      int32_t main() {
          bset test(91 + 64);
          test[50 + 64] = true;
          cout << test << "\n";
          cout << (test << (78 + 64)) << "\n";
          cout << "\n";
      
          bitset<91 + 64> test2;
          test2[50 + 64] = true;
          cout << test2 << "\n";
          cout << (test2 << (78 + 64)) << "\n";
          return 0;
      }
      

      Doesn't.

»
5 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by bitset (previous revision, new revision, compare).

»
5 месяцев назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

I think it's time u write ur own dynamic bitset

»
5 месяцев назад, # |
Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

find_previous can be implemented the same as find_next I think, however, it's best to customize your own bitsets.

»
3 месяца назад, # |
Rev. 2   Проголосовать: нравится +8 Проголосовать: не нравится

Ohh shit!!

I passed this solution illegally using dynamic_bitset.