Wind_Eagle's blog

By Wind_Eagle, history, 2 years ago, translation, In English

Hello, Codeforces! I have wanted to write an educational blog for a long time, but I couldn't find a topic for it. And recently I remembered that Codeforces doesn't have a good blog on one of my favorite topics: the "divide and conquer" technique. So I want to talk about it. The outline will be something like this:

1) Divide and conquer. What this technique is and what it is for. Example.

2) Dynamic connectivity offline with divide and conquer. Divide and conquer dp optimization.

3) Let me tell you about a method I invented myself: divide and conquer by queries. In this case, we divide not only the array, but also the queries into groups.

So, here we go.

First, about what this technique is. The point is about the following: let's say we have an array. Then, if we know how to calculate the answer for two parts of it, and if we know how to get the answer for the whole array from the answers of two parts, we can apply the divide and conquer technique. Let me show you the simplest example: merge sort.

In merge sort we have an array of $$$n$$$ elements, for simplicity, numbers. To sort the subarray $$$[l ... r]$$$, we sort the subarrays $$$[l ... m]$$$ and $$$[m + 1 ... r]$$$, where $$$l \le m \le r$$$, then merge the two sorted arrays into one. Let's see how to do this in the easiest way.

In fact, let us have two arrays. We know that they are sorted, for example, by non-decreasing. Then to merge the two arrays into one, we can apply the following algorithm:

1) If the first array is empty, add the second array to the array with the result and quit. 2) If the second array is empty, add the first array to the array with the result and quit. 3) If both arrays are not empty, then take the smallest of the first two elements of the arrays, add it to the answer and remove it from the original array.

Of course, we should not remove from the beginning of the array, but it is much better to store a pointer to the current element we are considering. If you write in C++, I recommend using the std::merge function instead of this handwritten algorithm. Still, sometimes it is useful to understand how it works.

Now we will act recursively: write a function sort(l, r) that sorts the segment from l to r:

sort(l, r):
    if l == r:
        return
    m = (l + r) / 2
    sort(l, m)
    sort(m + 1, r)
    merge(l, m, m + 1, r)

So now let's estimate the running time of such sorting. It seems logical to divide the array roughly in half, that is, $$$m = \lfloor \frac{l + r}{2} \rfloor $$$. Let's see how many operations our algorithm can perform. For simplicity, let's put the number of array elements $$$n = 2^k$$$. Then it is clear that combining subarrays of size $$$1$$$ will take a total of $$$O(n)$$$ operations, combining subarrays of size $$$2$$$ will take $$$O(n)$$$ operations, then $$$4$$$, $$$8$$$, and so on.

From this you can see that the algorithm will perform $$$O(n)$$$ actions when combining each of the "layers", and since there are $$$k$$$ such layers, it works for $$$O(nk) = O(n \log n)$$$ in total. This construction resembles a segment tree. In fact, it almost is. For example, if in a segment tree each vertex has a subarray corresponding to that vertex, then the amount of memory spent on it will be $$$O (n \log n)$$$, and so will the construction time (you can prove it by yourself as an exercise).

Some other algorithms, such as dynamic connectivity offline, are also based on the divide and conquer method. The task is to handle such requests offline:

1) Add an edge to the graph.

2) Remove an edge from the graph.

3) Check if two vertices lie in the same connected component.

This topic may be a bit complicated for beginners, so if it is difficult, you can skip it. Since I could not quickly find English-language materials on this method, I will describe it here, although it is quite well known.

The problem is not very difficult to solve using SQRT decomposition, but I will tell the solution in $$$O(q \log n \log q)$$$.

Suppose the graph has $$$n$$$ vertices and $$$q$$$ queries, moreover, the graph is initially empty. If this is not the case, then simply add first type queries to the beginning.

Now let's define for each edge a lifetime: a continuous interval of request numbers during which the edge exists. Each check request corresponds to some time: essentially, just its input number.

Now let us build such a segment tree: it will store the numbers of edges. Initially this segment tree is empty. Now for an edge with lifetime [l ... r] write this addition to the segment tree:

add(v, tl, tr, l, r):
    if l > r:
        return
    if tl == l && tr == r:
        tree[v].insert(edge)
        return
    tm = (tl + tr) / 2
    add(v * 2, tl, tm, l, min(r, tm))
    add(v * 2 + 1, tm + 1, tr, max(l, tm + 1), r)

That is, we wrote a segment tree that is very similar to the one that assigns to a segment, only without the lazy propagation. So what did we end up with? You can see that now the segment [l ... r] turns out to be divided into $$$O(\log n)$$$ segments from the segment tree. Let us now run the DFS in this way:

dfs(v, tl, tr):
    if tl == tr:
        answer_query(tl)
        return
    tm = (tl + tr) / 2
    edge_set.insert_edges(tree[v])
    dfs(v * 2, tl, tm)
    dfs(v * 2 + 1, tm + 1, tr)
    edge_set.erase_edges(tree[v])

Now notice that when we have come to the leaf corresponding to the segment [w ... w], the $$$edge$$$_$$$set$$$ will contain those and only those edges that were "alive" at time $$$w$$$, so in fact we will get the list of edges that were at time $$$w$$$.

So, if we know how to quickly respond using a list of edges, we would have a ready solution. Unfortunately, we can only add edges to a regular DSU, and with this solution we still need to remove them. That is why we need to use a persistent DSU. Don't be afraid, it isn't complicated. Let's write a regular DSU with, for example, rank heuristics, or size heuristics. Then every time an edge is added, it just changes the immediate ancestor of one of the vertices, and changes the rank/size of the subtree of one of the vertices. Let's add to a special array after each added edge information about which vertex and what changes were made. Then we can cancel the last added edge for $$$O(1)$$$. In principle, this is enough for us.

The asymptotic evaluation is now clear: adding one edge to the segment tree takes $$$O(\log q)$$$ operations, and the whole dfs will take $$$O(q \cdot t_1 + S \cdot t_2)$$$, where $$$t_1$$$ is time to respond to the request, $$$S$$$ is the total number of edges stored in the segment tree, and $$$t_2$$$ is time to add and remove one edge. Given that $$$t_1 = O(\log n)$$$, $$$S = O(q \log q)$$$, and $$$t_2 = O(\log n)$$$, we come to the required asymptotics.

A task on this topic: https://mirror.codeforces.com/problemset/problem/813/F

Now briefly mention DP optimization divide and conquer. It allows you to reduce the running time of some DPs under certain conditions. You can read about it in English, for example, here: https://cp-algorithms.com/dynamic_programming/divide-and-conquer-dp.html

A task on this topic: https://mirror.codeforces.com/problemset/problem/868/F

Now let's get to the fun part: divide and conquer by queries. I came up with this method on my own while solving a problem (more on that later). I don't know if I'm the first person to come up with this method, but honestly, I haven't seen it before. If it's been described somewhere before me, please write me a comment about it.

This method is somewhat similar to the segment tree that stores queries, but it seems to me somewhat easier to write and understand.

Let's solve the following problem: https://mirror.codeforces.com/problemset/problem/763/E

Let's proceed this way: first we build a divide-and-conquer on an array. We can use it to construct a DSU for each segment in question, and we can also find the answer at the prefix and suffix of each segment: indeed, this is just the number of different connectivity components on the segment. We can do this by adding one vertex at a time to the DSU.

But how do we respond to requests? This is where divide and conquer by queries comes into play. When considering segment [l ... r], let us answer only such queries [ql ... qr] that $$$ql \le m < qr$$$, that is, only such queries whose left bound lies on segment [l ... m] and whose right bound lies on segment [m + 1 ... r]. Clearly, other queries will be considered either in [l ... m] or in [m + 1 ... r].

Now it becomes easy to answer such queries: we can take two DSUs, connect them into one, and add edges on the boundaries. Since $$$k$$$ is small, there will be few such edges.

Thus, we got a pretty fast and nice solution to this problem. You can see that it doesn't use a query-based segment tree.

Finally, let's look at another problem that can be solved using this method: https://mirror.codeforces.com/gym/101590/problem/D

Unfortunately, it only has a Russian statement, so I'll tell you what you need to do:

An array of $$$n$$$ triples of numbers is given, as well as the number $$$D$$$. Also given $$$q$$$ of queries with segments $$$l_i$$$ and $$$r_i$$$. You need to choose one number from each triplet on the segment, so that the sum is as large as possible, but divisible by $$$D$$$. For example, if the triples (1, 2, 3), (4, 5, 6) and (1, 7, 8) are given, and $$$D = 8$$$, then you can choose $$$3$$$, $$$6$$$ and $$$7$$$.

$$$n \le 50,000, q \le 300,000, D \le 50$$$.

From this problem it is clear why a segment tree is not always better. In fact, let's try to write a segment tree, each vertex of which will have a DP on that segment. But then let's estimate the running time. Clearly, the DP will be one-dimensional: $$$dp[r]$$$, where $$$r$$$ from the word remainder. It is not difficult to calculate for each remainder what the maximal sum will be.

But here's the problem: when it comes to the segment tree, it turns out that the union of two DPs works for $$$O(D ^ 2)$$$, and the segment tree simply will not pass!

But here we can apply divide and conquer to the queries: we will separately calculate DPs for the left half, and separately calculate DPs for the right half. We will store DPs for each suffix of the left half and for each prefix of the right half (remember, DPs are arrays of $$$D$$$ numbers). We will answer only those queries that intersect both segments. To do this, simply take the DP of the corresponding suffix, and the DP of the corresponding prefix, and go through the remainder $$$r$$$ of the prefix (the remainder of the suffix will equal $$$D - r$$$). Thus, we got the solution for $$$O(q \cdot D + n \cdot \log n \cdot D)$$$.

I hope you enjoyed this blog. This is my first educational blog, and I have tried to write as clearly as possible. If you still have questions, feel free to leave them in the comments.

Thanks gepardo and Dmi34 for checking the blog.

  • Vote: I like it
  • +93
  • Vote: I do not like it

»
2 years ago, # |
  Vote: I like it 0 Vote: I do not like it

If I understand correctly, divide and conquer by queries is already on USACO Guide.

»
2 years ago, # |
  Vote: I like it +8 Vote: I do not like it

divide and conquer by queries is definitely a very well known thing known as disjoint sprase table. anyways i don't understand how it is different from regular divide and conquer that it requires a new name.

  • »
    »
    2 years ago, # ^ |
      Vote: I like it +28 Vote: I do not like it

    Hello! Thank you for your response.

    I didn't know that it is a well-known thing. I came up with this thing by myself, and when I asked some of my orange friends, they told mt that they have never heard about this before.

    Now I know that this is a well-known thing, thanks.