This is my hand-unroll of the well known (?) 64-ary tree, or, "bitset" tree used for predecessor problems. I tried to squeeze a pretty rudimentary implementation of 200A - Cinema in $$$O(k \sqrt{k} \log_W(n))$$$ using my usual bitset tree template, but it fell pretty short (1.7s). Rewriting made it go down to just 0.6s, which is seen in these two submissions: 376029723, 376055924.
Code
This can probably beat the DSU implementation suggested (technically not in TC, since they treat $$$O(\alpha(n))$$$ as $$$O(1)$$$). At least it uses much lower memory.








