himanshujaju's blog

By himanshujaju, history, 8 years ago, In English

Click here to download the pdf

Prerequisites

Binary Search — How it works and where can it be applied!

Motivation Problem

We aim to solve this problem: Meteors

The question simply states:

There are N member states and M sectors. Each sector is owned by a member state. There are Q queries, each of which denotes a meteor shower that drops some amount of meteors on every sector in a range [L, R] on that day. The ith member state wants to collect reqd[i] meteors over all its sectors. For every member state, what is the minimum number of days it would have to wait to collect at least the required amount of meteors?

Solution

The naive solution here is to do a binary search for each of the N member states. We can update in a range using a segment tree with lazy propagation for each query. The time complexity of such a solution would be O(N * logQ * Q * logM), which will easily TLE.

Let's see if there's something we are overdoing. For every member state, the binary search applies the same prefix of queries multiple times! For example, the first value of mid in the binary search is the same for all member states, but we unnecessarily apply this update every time instead of somehow caching it.

Let's do all of these N binary searches in a slightly different fashion. Suppose in every step we group member states by the current range of their binary search. In the first step, all member states lie in the range [1, Q]. In the second step, some lie in the range [1, Q / 2] while others lie in [Q / 2, Q], depending on whether the binary search predicate is satisfied. In the third step, the ranges would be [1, Q / 4], [Q / 4, Q / 2], [Q / 2, 3Q / 4] and [3Q / 4, Q]. So after logQ steps, every range is a single point, denoting the answer for that member state. Also, in each step, running the simulation of all Q queries once is sufficient, since it can cater to all the member states. This is pretty effective, as we can get our answer in Q * logQ simulations rather than N * Q * logQ simulations. Since each simulation is effectively O(logM), we can now solve this problem in O(Q * logQ * logM).

"A cool way to visualize this is to think of a binary search tree. Suppose we are doing standard binary search, and we reject the right interval — this can be thought of as moving left in the tree. Similarly, if we reject the left interval, we are moving right in the tree.

So what Parallel Binary Search does is move one step down in N binary search trees simultaneously in one "sweep", taking O(N  *  X) time, where X is dependent on the problem and the data structures used in it. Since the height of each tree is LogN, the complexity is O(N  *  X  *  logN)." — animeshf

Implementation

We would need the following data structures in our implementation:

  1. A linked list for every member state, denoting the sectors it owns.
  2. Arrays L and R, denoting the current binary search range for each member state.
  3. A structure supporting range updates and point queries, to apply the Q queries.
  4. A linked list check for each mid value of the current binary search ranges. For every mid value, store the member states that need to be checked.

The implementation is actually pretty straightforward once you get the idea. For every step in the simulation, we do the following:

  1. Clear the range tree, and update the linked lists for the new mid values.
  2. Run every query sequentially and check whether the linked list for this query is empty. If not, check the member states in that linked list and update their binary search intervals accordingly.

Pseudo Code

for all logQ steps:
    clear range tree and linked list check
    for all member states i:
        if L[i] != R[i]:
            mid = (L[i] + R[i]) / 2
            insert i in check[mid]
    for all queries q:
        apply(q)
        for all member states m in check[q]:
            if m has requirements fulfilled:
                R[m] = q
            else:
                L[m] = q + 1

In this code, the apply() function applies the current update, i.e., it executes the range update on the segment tree. Also, to check whether the requirements are fulfilled, one needs to traverse all the sectors owned by that member state and add up what they collected. In case you still have doubts, go over to the next section and see my code for this problem.
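Putting the pieces together, here is a minimal self-contained C++ sketch of the routine above (an illustration of the pseudo code only, not my linked AC solution; the 0-based query indexing, the Fenwick tree and all identifiers are assumptions, and input reading is elided):

#include <bits/stdc++.h>
using namespace std;

int N, M, Q;                      // member states, sectors, queries
vector<long long> bit;            // Fenwick tree over sectors, 1-based
vector<vector<int>> owns;         // owns[i] = sectors owned by state i
vector<long long> reqd;           // meteors required by each state
vector<int> ql, qr;               // query ranges (ql > qr means it wraps past M)
vector<long long> qa;             // meteors added per sector by each query

void bitAdd(int pos, long long val) {
    for (; pos <= M; pos += pos & -pos) bit[pos] += val;
}
long long pointQuery(int pos) {   // total meteors currently on one sector
    long long s = 0;
    for (; pos > 0; pos -= pos & -pos) s += bit[pos];
    return s;
}
void rangeAdd(int l, int r, long long val) {   // add val to sectors [l, r]
    if (l <= r) { bitAdd(l, val); bitAdd(r + 1, -val); }
    else { bitAdd(l, val); bitAdd(1, val); bitAdd(r + 1, -val); }
}

void parallelBinarySearch() {
    // Answer for state i lies in [L[i], R[i]]; R[i] == Q means "never satisfied".
    vector<int> L(N, 0), R(N, Q);
    while (true) {
        vector<vector<int>> check(Q);          // states to test after each query
        bool active = false;
        for (int i = 0; i < N; i++)
            if (L[i] != R[i]) { check[(L[i] + R[i]) / 2].push_back(i); active = true; }
        if (!active) break;                    // all binary searches converged
        bit.assign(M + 1, 0);                  // clear the range tree
        for (int q = 0; q < Q; q++) {          // replay all queries once
            rangeAdd(ql[q], qr[q], qa[q]);
            for (int i : check[q]) {           // states whose mid equals q
                long long got = 0;
                for (int s : owns[i]) {
                    got += pointQuery(s);
                    if (got >= reqd[i]) break; // early exit, also avoids overflow
                }
                if (got >= reqd[i]) R[i] = q;  // satisfied by the first q+1 queries
                else L[i] = q + 1;
            }
        }
    }
    // L[i] is now the 0-based index of the day state i is first satisfied,
    // or Q if even all the queries together are not enough.
}

The early break while summing a state's sectors matters: besides saving work, it keeps the running sum from overflowing long long, a pitfall that also comes up in the comments below.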

Problems to try

  1. Meteors — My AC Solution
  2. Make n00b land great again
  3. Travel in HackerLand
  4. SRM 675 Div1 500

Conclusion

I heard about this topic pretty recently, but was unable to find any good tutorial. I picked up this trick from some comments on the Codeforces blog about the Meteors problem. Alternate solutions to the mentioned problems are also welcome; almost every question on parallel binary search can also be solved with some persistent data structure, unless there is a memory constraint. Feel free to point out errors, or to make this tutorial better in any way!

Happy Coding!


»
8 years ago, # |

I've heard about Meteors so many times. Thanks to you, today I've finally decided to solve it.

My implementation is slightly different. Maybe it will be easier for some people, so I'll describe it.

I wrote a recursive function rec(int low, int high, vector<int> owners) and I run it once as rec(1, q, {1,2,...,n}). The vector owners contains the people for which the answer is in the interval [low, high].

Let's denote mid = (low + high) / 2. I want to know which owners have already won after the first mid queries. Then, I run rec(low, mid, owners_who_won) and rec(mid, high, those_who_didnt).

To check "which owners already won" I need standard range tree. In my code below you can find a variable act_tree denoting the number of first queries summed in the tree. Let's say my function is run with arguments low = 70 and high = 90. My global tree should now contain the sum of first mid = 80 queries (so there should be act_tree = 80). Later I want to run rec(70, 80) and rec(80, 90). I need to do the following:

  1. I subtract queries 80, 79, 78, 77, 76.
  2. The tree contains the sum of first 75 queries (act_tree = 75). I can run rec(70, 80).
  3. I add queries 76, 77, ..., 85.
  4. The tree contains the sum of first 85 queries. I can run rec(80, 90).
  5. I subtract queries 85, 84, 83, 82, 81.

The time complexity is something like O((N + M + Q) * logQ * logM). A naive implementation will result in memory complexity O(N * logQ). To make it O(N + Q) you need to pass the vector owners by reference and clear it after checking which owners already won. Check the details in my code below.

code
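For reference, here is a compact sketch of the recursion described above (not the hidden code from the spoiler; applyQuery and satisfied are stubbed helpers, and the boundary convention of the actual rec(1, q, {1,2,...,n}) call may differ slightly):

#include <bits/stdc++.h>
using namespace std;

int act_tree = 0;   // number of first queries currently summed in the tree
vector<int> ans;    // answer (query index) per owner

// Stubs: in a real solution these drive a range tree over sectors.
void applyQuery(int idx, int sign) { /* range-add sign * a[idx] over query idx's range */ }
bool satisfied(int who) { /* sum who's sectors, compare with the requirement */ return false; }

void moveTree(int target) {   // make the tree hold exactly the first `target` queries
    while (act_tree < target) applyQuery(++act_tree, +1);
    while (act_tree > target) applyQuery(act_tree--, -1);
}

void rec(int low, int high, vector<int>& owners) {  // answers lie in (low, high]
    if (owners.empty()) return;
    if (low + 1 == high) {    // interval shrunk to a single query: done
        for (int who : owners) ans[who] = high;
        return;
    }
    int mid = (low + high) / 2;
    moveTree(mid);            // tree = sum of the first mid queries
    vector<int> won, lost;    // split owners by the predicate
    for (int who : owners)
        (satisfied(who) ? won : lost).push_back(who);
    vector<int>().swap(owners);   // free the caller's vector right away, so the
                                  // extra memory stays linear (see the replies below)
    rec(low, mid, won);           // these owners won by query mid
    rec(mid, high, lost);         // these still need later queries
}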
  • »
    »
    8 years ago, # ^ |

    Cool implementation! Btw, you managed to pass the time limit using recursion and vectors? I found the time limit to be pretty strict. Numerous optimizations were required even with a BIT to get an AC.

    • »
      »
      »
      8 years ago, # ^ |

      Thanks. Cool blog ;)

      Yes. I didn't use a BIT and I got AC even without passing the vector by reference (so, with memory O(N * logQ)). I think that recursion isn't a problem here because we get O(Q * logQ) operations in a tree, each costing O(logM). Compared to that, the recursion overhead is O((N + Q) * logQ), so even though it can be slower, it shouldn't affect the overall running time. Apparently, my solution may use fewer tree operations (e.g. two times fewer) than yours, and that's why you had to use a BIT. Or maybe for some other special test my solution would be slower; it's hard to say.

  • »
    »
    7 years ago, # ^ |

    Hi, I have a question. Why do you set ans[who] = high when "who" can get his desired value at the mid point?

    Update: it's clear now.

  • »
    »
    6 years ago, # ^ |

    owners.clear(); // thanks to this line the memory is O(n), not O(n log(n))

    I think that without calling owners.shrink_to_fit() the memory is still O(n log(n)), right?

    • »
      »
      »
      6 years ago, # ^ |

      Yes, you are right. Or we could clear it by swapping with an empty vector: vector<int>().swap(owners);

  • »
    »
    4 years ago, # ^ |

    Hi,

    Can you point to some tutorial for the tree you're using? I tried searching but couldn't find any.

    Thanks

»
8 years ago, # |

I was discussing with satyaki3794 his solution to Travel in Hackerland using parallel binary search. My first reaction when he told me about this technique was "Wow, you don't have sequential code? I have never submitted parallel code online" :P
My bad. He then pointed me to the METEORS problem.

»
8 years ago, # |

Auto comment: topic has been updated by himanshujaju (previous revision, new revision, compare).

»
8 years ago, # |

SRM 675 Div.1 LimitedMemorySeries1 (500 pts) can be solved with this method.

This is my code accepted in the practice room. http://ideone.com/Dm2x4M

»
8 years ago, # |

My code is similar to yours; can you point out the error in it, please? I am getting a wrong answer. Code

»
8 years ago, # |

Great Blog!

A cool way to visualize this is to think of a binary search tree. Suppose we are doing standard binary search, and we reject the right interval — this can be thought of as moving left in the tree. Similarly, if we reject the left interval, we are moving right in the tree.

So what Parallel Binary Search does is move one step down in N binary search trees simultaneously in one "sweep", taking O(N * X) time, where X is dependent on the problem and the data structures used in it. Since the height of each tree is Log N, the complexity is O(N * X * logN)

»
8 years ago, # |

That's a cool trick :)

I tried to solve the problem "Make n00b land great again" but couldn't figure out how to efficiently update the queries. Can you please explain how to do it efficiently?

»
8 years ago, # |

Thanks himanshujaju for a nice blog post. It helped me learn a new technique :) Parallel BS is awesome. Anyway, while reading the problem description for METEORS, the first thing that came to my mind was a persistent segment tree, as it is straightforward here; I implemented it and got AC with 165M memory & 2.82 time. After that I implemented parallel BS (for the first time, of course :) ) and got AC with 22M & 4.72 time (more time but less memory ;) ). Then, after looking at the ranklist for that problem, I wondered how some people managed to solve it in much less time, like 0.34, 0.58, 0.93... Is there any better approach?

  • »
    »
    8 years ago, # ^ |

    Can you share your approach for the persistent segment tree solution?

    I did it with a persistent segment tree by performing point updates of +a[i] at l[i] and -a[i] at r[i]+1 (for l[i] <= r[i]; similar updates for l[i] > r[i]). But it got TLE without fast IO, and AC with a time of 7.72 s with fast IO.

    • »
      »
      »
      8 years ago, # ^ |

      What we need here from the segment tree is "range update, point query". It seems to require lazy propagation, but we can skip it by doing something like this: at the time of a range update, add the value to every node whose segment is fully covered while updating that range. Then, at the time of a point query, just walk down to the point in the segment tree, adding up the values of all the nodes encountered on the way down. Hope it helps :)
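      A minimal sketch of this "range update without lazy, point query" tree, with 1-based positions (all names are illustrative):

      #include <bits/stdc++.h>
      using namespace std;

      const int MAXM = 300005;
      long long seg[4 * MAXM];   // each node stores the value added to its whole segment
      int M = 300000;            // number of leaves (sectors)

      // Range update: add val to every node whose segment is fully covered by [ql, qr].
      void update(int node, int l, int r, int ql, int qr, long long val) {
          if (qr < l || r < ql) return;                          // no overlap
          if (ql <= l && r <= qr) { seg[node] += val; return; }  // fully covered: stop here
          int mid = (l + r) / 2;
          update(2 * node, l, mid, ql, qr, val);
          update(2 * node + 1, mid + 1, r, ql, qr, val);
      }

      // Point query: walk down to pos, summing the values of all nodes on the way.
      long long query(int node, int l, int r, int pos) {
          if (l == r) return seg[node];
          int mid = (l + r) / 2;
          if (pos <= mid) return seg[node] + query(2 * node, l, mid, pos);
          return seg[node] + query(2 * node + 1, mid + 1, r, pos);
      }

      int main() {
          update(1, 1, M, 10, 20, 5);          // add 5 meteors to sectors 10..20
          cout << query(1, 1, M, 15) << "\n";  // prints 5
          cout << query(1, 1, M, 21) << "\n";  // prints 0
      }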

»
8 years ago, # |

Lazy segment trees are actually not necessary for this problem. You can also do a line sweep to count how many meteors have been added to sector i, for every i (where the events are the endpoints of queries and the sectors), which knocks a log off the complexity, making it O((N + M + Q) * logQ).

code
  • »
    »
    8 years ago, # ^ |

    I couldn't understand your solution. Could you share how your code runs on the sample?

    • »
      »
      »
      8 years ago, # ^ |

      I guess it's a lot closer to Errichto's implementation than the original one: rec(minq, maxq, events) performs all queries between minq and (minq+maxq)/2 and gets the number of meteors for every state.

      The way it performs the queries is by doing a line sweep: events is a sorted list of the endpoints of all queries in the range [minq, maxq] (KTYPE events), and for each starting endpoint I add ai to the current sector's meteor count, while for each ending endpoint I subtract ai. This way, for every sector I know how many meteors it has for those queries.

      events also contains all states I'm searching in this query range (NTYPE events), and a list of sectors that those states own (MTYPE events).

      For the first step of the sample I would first call:

      • rec(minq=0, maxq=3, { states:{0,1,2,3}, sectors:{0,1,2,3,4}, queries:{0,1,2} })

      (minq+maxq)/2 is 1, so I'm running the first two queries and ignoring the rest. The last state succeeds, and the two first states fail, so the recursion splits into:

      • rec(minq=0, maxq=1, { states: {2}, sectors: {1, 4}, queries:{0,1})
      • rec(minq=2, maxq=3, { states: {0,1}, sectors: {0, 2, 3}, queries: {2} }).

      Because we will never consider queries 0 and 1 for states 0 and 1 anymore, I subtract how many meteors those states had from the required number of meteors before recursing.

      The second round of splits will produce:

      • rec(minq=0, maxq=0, { states: {2}, sectors: {1, 4}, queries:{0})
      • rec(minq=1, maxq=1, { states: {}, sectors: {}, queries:{1})
      • rec(minq=2, maxq=2, { states: {0}, sectors: {0, 3}, queries: {2} }).
      • rec(minq=3, maxq=3, { states: {1}, sectors: {2}, queries: {} }).

      And we can see that minq indicates the answer for every state in the corresponding list.
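      In case it helps, here is a small self-contained sketch of one such sweep as I understand it (Event, the kind encoding and all other names are mine, not the actual code):

      #include <bits/stdc++.h>
      using namespace std;

      struct Event {
          int pos;         // sector the event fires at
          int kind;        // 0 = query delta, 1 = owned-sector marker
          int state;       // owning state (for kind == 1)
          long long delta; // +a at l, -a at r+1 (for kind == 0)
      };

      // `events` must be sorted by pos, with query deltas before markers on equal pos.
      void sweep(const vector<Event>& events, vector<long long>& meteors) {
          long long cur = 0;                 // meteors fallen on the current sector
          for (const Event& e : events) {
              if (e.kind == 0) cur += e.delta;
              else meteors[e.state] += cur;  // the state collects cur from this sector
          }
      }

      int main() {
          // Query [2, 4] adds 3 meteors per sector; state 0 owns sectors 1 and 3.
          vector<Event> ev = {
              {1, 1, 0, 0},   // state 0 owns sector 1 (collects nothing there)
              {2, 0, 0, +3},  // query starts at sector 2
              {3, 1, 0, 0},   // state 0 owns sector 3 (collects 3)
              {5, 0, 0, -3},  // query ends after sector 4
          };
          vector<long long> meteors(1, 0);
          sweep(ev, meteors);
          cout << meteors[0] << "\n";        // prints 3
      }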

»
8 years ago, # |

I just solved the METEORS question. I was getting a wrong answer because I was not breaking out of the loop when summing the values over all the sectors of a member. I was doing all calculations in long long in C++. So why does this happen? Isn't the maximum number 3*10^14, which long long can handle? On breaking out of the loop as soon as the requirement is satisfied, it got AC. Can anyone explain why?

»
8 years ago, # |

A separate but related problem: can we run Q binary searches (e.g. Q lower_bound operations) over a sorted array of N elements faster than in O(Q * logN)? (It is assumed that all Q queries are known beforehand.)

  • »
    »
    8 years ago, # ^ |

    You can first sort the Q queries in O(Q * logQ) and then iterate over the two arrays simultaneously in O(N + Q). So, the complexity is O(Q * logQ + N). But... is this complexity better? I don't see any example where it is.
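    A small sketch of this sort-then-merge idea (the names are illustrative):

    #include <bits/stdc++.h>
    using namespace std;

    // Answers Q lower_bound queries over a sorted array in O(Q * logQ + N + Q):
    // sort the queries, then merge the two sorted sequences with a shared pointer.
    vector<int> batchedLowerBound(const vector<int>& a, const vector<int>& queries) {
        vector<pair<int, int>> qs(queries.size());
        for (size_t i = 0; i < queries.size(); i++) qs[i] = {queries[i], (int)i};
        sort(qs.begin(), qs.end());                  // O(Q * logQ)
        vector<int> ans(queries.size());
        size_t j = 0;                                // shared pointer into a
        for (auto& [val, idx] : qs) {                // O(N + Q) merge
            while (j < a.size() && a[j] < val) j++;  // skip smaller elements once
            ans[idx] = (int)j;                       // lower_bound position of val
        }
        return ans;
    }

    int main() {
        vector<int> a = {2, 4, 4, 8};
        for (int pos : batchedLowerBound(a, {5, 1, 4})) cout << pos << " "; // 3 0 1
    }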

    • »
      »
      »
      8 years ago, # ^ |

      A curious case. Assuming the following calculation is correct, the complexity is not better: https://www.wolframalpha.com/input/?i=solve+Q%2Alog%28N%29+%3C+Q%2Alog%28Q%29%2BN

    • »
      »
      »
      8 years ago, # ^ |

      It looks like there is no way to improve the case Q = N. Otherwise we could sort faster than N * logN. Or at least we could sort several arrays faster: we could sort one of them in N * logN and the others faster than N * logN (by doing parallel binary searches over the first, already sorted, array).

      • »
        »
        »
        »
        8 years ago, # ^ |

        I tried this logic to prove that it's impossible to solve your problem faster — and I didn't succeed. So, I don't think you're right here. UpperBound isn't enough to sort the other array. For example, consider the first array to be {100,200,300,400,500} and a new array {17,29,18,50,10,30}. Still, maybe some similar proof will be correct.

  • »
    »
    8 years ago, # ^ |

    Let's assume the time complexity of that problem with N = Q is T(N). Furthermore, I make some technical assumptions: T(N) ≤ T(2N + a) / 2 for all N, a ≥ 0, and T(N) = Ω(N).

    I show that we can sort an array of N distinct elements in O(T(N)) expected time under the comparison model with randomization.

    1. If the input size is under some constant, we can sort it directly.
    2. From the input array of length N, we first randomly choose ⌊N / 2⌋ elements. Then, we recursively sort the chosen elements. Let the sorted array be called B.
    3. We call the assumed algorithm with B and the ⌈N / 2⌉ unchosen elements, so we can locate each unchosen element between adjacent elements of B.

    Because we chose randomly, and we assumed that the elements are distinct, each group (the elements with the same query result) has constant expected size. We can thus sort each group in O(1) expected time.

    Let the runtime of the algorithm above be S(N). By the assumption on T(·), we can easily see that S(N) ≤ S(⌊N / 2⌋) + O(T(N)) + O(N) = O(T(N)).

    Because we cannot sort an array in o(N * logN) time under the comparison model even with randomization, we conclude that T(N) = Ω(N * logN). ■

    Open problem: can this reduction be derandomized?

    • »
      »
      »
      8 years ago, # ^ |

      I don't think it's obvious that "each group is a constant size". It's logical/obvious that the sum of s * log(s) (where s is the size of a group) is O(n) in total, but you didn't prove it. Still, your proof is nice, thanks.

      By the way, what is the expected size of the biggest group? It sounds like something very hard to analyze. Maybe some tedious calculations would be enough here.

  • »
    »
    8 years ago, # ^ |

    I finally realized what answer to this question I wanted. My initial thought was that when doing Q independent binary searches, we perform redundant work. And indeed there is a way to reduce it, though the worst-case complexity will remain O(Q * logN).

    We can visualize binary search as a binary search tree traversal. We will do a standard tree traversal, but we'll process all queries together. To achieve this, we sort our queries and start from the root of the tree, which corresponds to the median of the array. By doing binary search on queries we find which queries lie to the left and which lie to the right of the root node. We continue traversing tree recursively. As a result, we reduce the number of comparisons due to partially coinciding search paths.
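    A self-contained sketch of that traversal (all names are mine; partition_point performs the binary search that splits the sorted queries at each node):

    #include <bits/stdc++.h>
    using namespace std;

    // Carry all sorted queries down the implicit BST of a sorted array, splitting
    // them at every node instead of re-searching from the root for each query.
    // ans[original index] = lower_bound position of the query value in a.
    void walk(const vector<int>& a, int lo, int hi,   // current subtree = a[lo, hi)
              vector<pair<int,int>>::iterator qlo,
              vector<pair<int,int>>::iterator qhi,
              vector<int>& ans) {
        if (qlo == qhi) return;                       // no query reaches this subtree
        if (lo == hi) {                               // empty subtree: position found
            for (auto it = qlo; it != qhi; ++it) ans[it->second] = lo;
            return;
        }
        int mid = (lo + hi) / 2;                      // root of this subtree
        auto split = partition_point(qlo, qhi,        // queries <= a[mid] go left,
            [&](const pair<int,int>& q) { return q.first <= a[mid]; }); // the rest go right
        walk(a, lo, mid, qlo, split, ans);
        walk(a, mid + 1, hi, split, qhi, ans);
    }

    int main() {
        vector<int> a = {2, 4, 4, 8}, queries = {5, 1, 4};
        vector<pair<int,int>> qs;
        for (int i = 0; i < (int)queries.size(); i++) qs.push_back({queries[i], i});
        sort(qs.begin(), qs.end());
        vector<int> ans(queries.size());
        walk(a, 0, (int)a.size(), qs.begin(), qs.end(), ans);
        for (int pos : ans) cout << pos << " ";       // prints 3 0 1
    }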

»
7 years ago, # |

Hey, can somebody please help me debug my code for Meteors? Code

P.S. — I also wrote an input generator and compared my solution to one of the accepted solutions, but could not find any mismatch.

»
7 years ago, # |

Selective Additions — a problem from the last HourRank on HackerRank. It can be solved using parallel BS.

»
7 years ago, # |

I think we can do all these questions with a persistent segment tree with lazy propagation.

Here is my code for Meteors: https://ideone.com/PgJg1E
Complexity — O(M * logQ * logM)

One downside of this is that it takes a lot of memory.

  • »
    »
    7 years ago, # ^ |

    I don't think it is possible to do lazy propagation in a persistent segment tree. The memory complexity becomes O(QN), which isn't the purpose of a persistent segment tree.

    You may get AC on SPOJ, because there you have a 1.5GB memory limit, and as I see, your code got AC with 1.3GB memory usage.

    But in the main contest of POI, the memory limit was 64MB ;)
    Try submitting here — SZKOpul — Meteors

    • »
      »
      »
      7 years ago, # ^ |

      It is possible to do lazy propagation in a persistent segment tree. The memory complexity is O(N + Q*logM). You can see in my code that while querying/updating I'm creating at most O(logM) new nodes.

      Yes, it won't give AC if the memory limit is tight. I had already mentioned the downside.

      P.S. I was easily able to reduce the memory to 0.44GB. Just reduced the array size.
      Code — https://ideone.com/x7dwqq

      • »
        »
        »
        »
        7 years ago, # ^ |

        I don't think the memory complexity will still be O(N + Q*logM), because when you propagate a lazy value, you may end up creating a tree for that range. So the memory complexity is something greater than O(Q*logM), and in the worst case O(QN).

        • »
          »
          »
          »
          »
          7 years ago, # ^ |

          Please take a look at my code carefully. Try to run it for some cases. You'll know then :)

        • »
          »
          »
          »
          »
          7 years ago, # ^ |

          You don't create a new tree for the range, you just update the node; it's still that memory complexity. If it used O(QN) memory each time, it wouldn't make any sense, because that would imply that regular lazy propagation is also O(QN), since you access/create the same number of nodes (O(logN)).

»
7 years ago, # |

Please update the link for the "Travel in HackerLand" question.

Its new link is this.

»
6 years ago, # |

How is the brute force complexity the one given above? For each station, we do a binary search on the queries, and we keep 2 pointers: one for the binary search, and the other for the queries, pointing up to where the queries have been applied. I suppose this pointer moves over the whole queries array at most twice, so shouldn't it be Q + logQ instead of Q * logQ? And then we apply range update and range query using a BIT, so I suppose the complexity must not contain the factor of Q. Please correct me if I am wrong.

»
6 years ago, # |

Here's a problem related to the concept.

https://agc002.contest.atcoder.jp/tasks/agc002_d

»
5 years ago, # |

Here's another problem: Evil Chris

»
19 months ago, # |

These tasks are also solvable using Parallel Binary Search:

»
17 months ago, # |

I wonder if the "find k-th element in a range with updates" problem can be solved using the author's implementation. I solved it using the recursion-style implementation, but I don't think the same idea can be applied here?