crazySegmentTree: Segment Tree implementation with 5x faster queries

Revision en1, by purplesyringa, 2021-04-06 19:40:26

Hello, Codeforces!

A few days ago MohammadParsaElahimanesh posted a blog titled Can we find each Required node in segment tree in O(1)?. Apparently what they meant was to find each node in O(ans), according to ecnerwala's explanation. But I was too dumb to realize that and accidentally invented a parallel node resolution method.

A benchmark for you first, with 30 million RMQ on a 32-bit integer array of 17 million elements. It was run in custom test on Codeforces on Apr 6, 2021.

  • Classic implementation from cp-algorithms: 7.765 seconds, or 260 ns per query
  • Optimized classic implementation: (which I was taught) 4.452 seconds, or 150 ns per query (75% faster than classic)
  • Bottom-up implementation: 1.914 seconds, or 64 ns per query (133% faster than optimized)
  • Novel parallel implementation: 0.383 seconds, or 13 ns per query (400% faster than bottom-up, or 2000% faster than classic implementation) ## FAQ

Q: Is it really that fast? Idk, try it. Looks fast to me.

Q: Why? Maybe you want your O(nlog2n) solution to pass in a problem with O(nlogn) model solution. Maybe you want to troll problemsetters. Maybe you want to obfuscate your code so that no one would understand you used a segment tree so that no one hacks you (just kidding, you'll get FST anyway). Choose an excuse for yourself. I want contribution too.

Q: License? Tough question because we're in CP. So you may use it under MIT license for competitive programming, e.g. on Codeforces, and under GPLv3 otherwise.

Q: Any pitfalls? Yes, sadly. It requires AVX2 instructions which are supported on Codeforces, but may not be supported on other judges.

How it works

Segment tree

A range query is decomposed into red nodes. Classic segment tree implementations don't find these red nodes directly, but execute recursively on green nodes. Bottom-up segment tree implementation does enumerate these nodes directly, but it also enumerates a few other unused nodes.

The parallel implementation accesses

Tags #segment tree, #optimization, x86

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en9 English purplesyringa 2021-04-06 21:47:16 0 (published)
en8 English purplesyringa 2021-04-06 21:47:06 19 (saved to drafts)
en7 English purplesyringa 2021-04-06 20:47:42 0 (published)
en6 English purplesyringa 2021-04-06 20:47:11 14
en5 English purplesyringa 2021-04-06 20:46:15 5601
en4 English purplesyringa 2021-04-06 20:32:33 7243
en3 English purplesyringa 2021-04-06 20:19:44 753
en2 English purplesyringa 2021-04-06 19:41:23 42
en1 English purplesyringa 2021-04-06 19:40:26 2533 Initial revision (saved to drafts)