iframe_'s blog

By iframe_, 18 hours ago, In English

I was shown this JOI problem by Jasonwei08 today. The problem statement boils down to:

Given an array $$$A$$$ of $$$2^L$$$ integers, $$$1 \leq L \leq 20$$$, process $$$1 \leq Q \leq 10^6$$$ queries. For each query, consisting of a length-$$$L$$$ pattern of either 0, 1, or ?, report the sum of all $$$A_j$$$ where the binary representation of $$$j$$$ matches the pattern.

For example, given the array $$$10, 2, 3, 15, 0, 9, 12, 3$$$, the query $$$?1?$$$ would match the binary indices $$$010, 011, 110, 111$$$, which are $$$2, 3, 6, 7$$$ in decimal. The result would be $$$A_2 + A_3 + A_6 + A_7 = 33$$$.

Let's start with a brute force approach. Build a perfect binary tree of all prefixes of length up to $$$L$$$, and traverse it downwards, summing the values at the leaves:

Notice that every time we "split", and take both children, the resulting subtrees are identical. We can use this to our advantage. Instead of subdividing into two separate searches, we can instead fold the two subtrees together, adding together the values at the leaves.

This optimization reduces the number of nodes traversed from $$$O(2^L)$$$ to $$$O(L)$$$, though processing each node isn't as simple anymore.

We can implement this by maintaining an initial array containing all of the values of the leaves, and at each tree-descent step, we create a new array by either:

  • if we descend to only one child, cut the array in half and pick the appropriate side.
  • if we split to both children, cut the array and overlay the two halves over each other, adding them together.

At the end, we obtain a length-one array whose only element is equal to the sum of all matching leaves.

This requires $$$O(2^L)$$$ space and time per search, which doesn't seem to be any better complexity-wise than brute force. However, we can now leverage another datastructure: a trie.

By building a trie out of all of the patterns, we can then run a downward search through the trie, at each step maintaining the array of remaining leaves. A depth-first search is ideal for this, as it guarantees that only one node at each depth can be processed at a time, and therefore provides a bound on memory usage.

Unfortunately, for this specific JOI task, the memory limit is too tight for a normal trie to work, even with very aggressive modifications. jay_jayjay suggested I sort and divide-and-conquer over the patterns as a sort of 'virtual trie', which worked!

(The intended approach to this task was in a very different direction, using SOS. Mine isn't even a very good one, I just thought the technique was cool.)

More generally, this works not only on binary strings, but on any pattern with a sufficiently low number of possible characters. It can also handle more complicated wildcards than ?, as long as subtrees can be efficiently folded together. However, it does require the full tree of strings to be stored in memory, limiting its applicability, unless some truly cursed sparse segtree magic happens...

That's about all for this weird little technique. I hadn't heard of it before, and neither had Jasonwei08, who seems to know every technique in existence, idk. So, I hope you enjoyed this, and maybe someday the perfect problem will come up where it can be used!

  • Vote: I like it
  • +75
  • Vote: I do not like it

»
16 hours ago, # |
  Vote: I like it +22 Vote: I do not like it

Thank you so much for sharing this technique with the community!!

Also, I do not know a lot of techniques, but I wish I do and I try to learn

Yayyyyy

  • »
    »
    16 hours ago, # ^ |
      Vote: I like it +22 Vote: I do not like it

    nah I swear you know infinite techniques

    • »
      »
      »
      96 minutes ago, # ^ |
      Rev. 2   Vote: I like it +2 Vote: I do not like it

      i can confirm he does know infinite techniques, many more than me

»
15 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

orz idea, but how would you implement this? As in, how would you write a trie traversal that folds the tree?

  • »
    »
    15 hours ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Surprisingly, it wasn't very difficult. I stored $$$L+1$$$ arrays of size $$$2^L, 2^{L-1}, ...$$$, the biggest one initialized to $$$A$$$. Then, when depth-first searching the trie of queries, I cut/folded the current top-level array into the next one. So, $$$10,2,3,15,0,9,12,3$$$ would be left-cut into $$$10,2,3,15$$$, for example. Naturally at the deepest point of the dfs there would be a single value in the smallest array, and that would be the answer to the queries ending at that point in the trie.

    Here's sort of what I mean:

    dfs(trie_node, depth, size):
        if trie_node is a leaf:
            for queries ending at trie_node:
                answer(level[depth][0])
    
        else:
            if trie_node has a `0` child:
                for x in range(size/2): level[depth+1][x] = level[depth][x]
                dfs(trie_node.child[0], depth+1, size/2)
    
            if trie_node has a `1` child:
                for x in range(size/2): level[depth+1][x] = level[depth][x+size/2]
                dfs(trie_node.child[1], depth+1, size/2)
    
            if trie_node has a `?` child:
                for x in range(size/2): level[depth+1][x] = level[depth][x]
                for x in range(size/2): level[depth+1][x] += level[depth][x+size/2]
                dfs(trie_node.child[?], depth+1, size/2)
    

    It's a little confusing because the dfs in this situation is on the trie, not the binary sequences, which aren't represented in an explicit tree at all. The trie is traversed top-down, whereas the sequence tree is traversed bottom-up.

    Also, I realize that 'folding' is a little misleading. It's more of an overlaying, as none of the children change order, as they might in a topological folding. I called it 'folding' because the way I was visualizing it was as if the leaves of the tree were a strip of paper and I were manipulating it to make different sections overlap.

    • »
      »
      »
      64 minutes ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      Damn, that's smart as hell. So this entire trie traversal would end up being something like O(L*2^L) right?

      • »
        »
        »
        »
        50 minutes ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Something like that. Theoretically if the trie traversal goes through every possible pattern, it works out to something like $$$2^L + 2^{L-1} \times 3 + 2^{L-2} \times 3^2 + \dots + 2 \times 3^{L-1} + 3^L$$$, because each node in the trie can have three children. But there is a bound on $$$Q$$$, the number of patterns, so I think we get something considerably closer to $$$L \times 2^L$$$.

        Either way, it pretty comfortably passed under time limit. :P

»
14 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by iframe_ (previous revision, new revision, compare).

»
14 hours ago, # |
Rev. 2   Vote: I like it +3 Vote: I do not like it

This sounds like an abstracted version of SOS DP: https://mirror.codeforces.com/blog/entry/45223 Or maybe I am misunderstanding the difference.

  • »
    »
    14 hours ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I think I see what you mean — a pattern $$$P$$$ composed of 0s and ?s will match all binary sequences that are a subset of the binary string $$$P'$$$ formed by replacing all the ? in $$$P$$$ with 1. So, SOS can be seen as sort of similar to my approach.

    However, in SOS we compute the subset sum of every sequence. In my case there were too many patterns to construct, and so the cut-and-fold approach intentionally only builds the patterns it needs to.

    Also, cut-and-fold is theoretically able to support more complicated types of patterns. I don't believe SOS has an equivalent to patterns of 0, 1, and wildcard ?. An even more extreme example is sequences of decimal digits, where the character E stands as a wildcard for any even digit, and O for any odd digit. Maybe SOS can be adapted to support this; I'm not sure.

    Good connection, though! (And a helpful reminder that I should look more in-depth into SOS :p)

»
4 hours ago, # |
  Vote: I like it +3 Vote: I do not like it

Auto comment: topic has been updated by iframe_ (previous revision, new revision, compare).

»
3 hours ago, # |
  Vote: I like it +6 Vote: I do not like it

iframe orz

»
3 hours ago, # |
  Vote: I like it +6 Vote: I do not like it

Jasonwei08 common orz moment