bobr_babizon's blog

By bobr_babizon, 7 months ago, translation, In English

I hear a lot about this topic but could not find any resources in Russian or English, so I decided to translate an article by ko_osaga. This translation may contain many inaccuracies, and it lacks the links available in the original article.

This article presents a new segment tree, called the Kinetic Segment Tree. An element is considered Kinetic if it moves over time, i.e., if the element represents a linear or polynomial function. Segment trees are often found in competitions, as are Kinetic elements (e.g., the convex hull trick), so studying their combination can also be useful. It is also worth noting that Kinetic properties can be identified even in problems not directly related to Kinetic elements (e.g., using CHT in DP optimization), and consider how the Kinetic Segment Tree can be applied.

This data structure has already been mentioned in articles on Codeforces, but I recently found time for a more thorough study. At the time of writing, the Kinetic Segment Tree may be an obscure data structure. If you are interested in new data structures, I recommend reading it.

Often, tasks on classical data structures are difficult to solve, and it is usually easier to learn from ready-made materials. However, if you understand the solution correctly, it will give you the opportunity to study many concepts. The goal of this article is a deep understanding of Segment Tree Beats and learning about the Kinetic Segment Tree, as well as understanding the advantages and disadvantages of data structures and what tasks they can solve.

1. Segment Tree Beats

Segment Tree Beats is a technique that allows solving segment queries that cannot be solved by regular lazy propagation. It is a type of lazy propagation. One of the problems, HDU 5306 Gorgeous Sequence, which I heard from a foreigner, turned out to be unsolvable by regular lazy propagation, which seemed interesting to me. We discussed this problem for a long time in preparation for IOI 2016, and it turned out to be difficult even with a description in Chinese.

Six years later, this problem on the Baekjoon Online Judge was solved by more than 200 people, and there are now many good materials explaining Segment Tree Beats. I started studying this in 2020. Although there are other good articles, I often feel that it is worth repeating to better understand the proofs.

For example, if you need to update only one segment, the code might look like this:

void update(int s, int e, int ps = 1, int pe = n, int p = 1, int v) {
    if (e < ps || pe < s)
        return;
    if (s <= ps && pe <= e) {
        tree[p].maxv = min(tree[p].maxv, v);
        return;
    }
    lazydown(p);
    int pm = (ps + pe) / 2;
    update(s, e, ps, pm, 2 * p, v);
    update(s, e, pm + 1, pe, 2 * p + 1, v);
    pull(p);
}

Such code allows you to manage maximum values in the segment under the condition $$$A[i]=min(A[i], v)$$$. However, if you need to find the sum of the values in the segment after the update, it becomes more complicated. One way to solve this is to avoid updating the segment if the length of the segment is 1, then you can track changes in the sum. An example of such code:

void update(int s, int e, int ps = 1, int pe = n, int p = 1, int v) {
    if (e < ps || pe < s)
        return;
    if (ps == pe) {
        if (tree[p].maxv > v) {
            tree[p].sum -= (tree[p].maxv - v);
            tree[p].maxv = v;
        }
        return;
    }
    /* ... */
}

This code supports 3 types of queries, but it is slow. The complexity of the update function is $$$O(n)$$$ because all segment queries turn into point queries. However, if the update conditions are strong, like $$$s \leq ps \land pe \leq e$$$, the algorithm will finish quickly, but with limited operation capabilities. If the update conditions are weak, like $$$ps == pe$$$, the algorithm will be slow, but with high operation freedom. Segment Tree Beats finds the optimal balance, maintaining algorithmic complexity and increasing operation freedom. This is achieved through the use of smart techniques, such as:

void update(int s, int e, int ps = 1, int pe = n, int p = 1, int v) {
    if (e < ps || pe < s)
        return;
    if (s <= ps && pe <= e) {
        if (tree[p].maxv <= v) {
            // Do not update
            return;
        }
        // tree[p].maxv > v
        if (tree[p].smax < v) {
            tree[p].sum -= (tree[p].maxv - v) * tree[p].maxCnt;
            tree[p].maxv = v;
            return;
        }
        // tree[p].smax >= v, tree[p].maxv >= v do not terminate
    }
    /* ... */
}

smax (the second largest value) is the maximum value among elements that are less than the maximum value in the segment, and maxCnt is the number of times the maximum value occurs in the segment. The proof that this algorithm always finds the correct answer is not difficult to understand. It is important to understand why the algorithm works quickly. There are two types of proofs: one is simpler and based on the number of distinct values in the segments, and the other is more complex and uses the concept of a "tag".

Simple proof. Let $$$f([s,e])$$$ be the number of distinct values in the segment $$$[s, e]$$$. The sum of all $$$f([s,e])$$$ for all segments in the segment tree is initially $$$T(n) = T(n/2) + O(n)$$$, so $$$O(n \log n)$$$. If we consider the update process:

  • For nodes that fully satisfy $$$s \leq ps \land pe \leq e$$$:
  • Nodes that satisfy the conditions $$$tree[p].maxv \leq v || tree[p].smax < v$$$ do not change $$$f([s,e])$$$ and this also applies to their subtrees.
  • Nodes that satisfy the conditions $$$tree[p].maxv \geq v $$$ && $$$ tree[p].smax \geq v$$$, reduce $$$f([s,e])$$$ by at least one, because $$$maxv \neq smax$$$, but both become equal to $$$v$$$.
  • Nodes that do not satisfy the conditions $$$e < ps || pe < s$$$ are not considered.

  • For nodes that do not satisfy any of the conditions, a new value $$$v$$$ may appear in their subtrees, so $$$f([s,e])$$$ may increase by at most one.

Since there are $$$O(\log n)$$$ nodes that do not satisfy any of the conditions for each query, the sum of all $$$f([s,e])$$$ in the initialization and update process increases by $$$O((n+q) \log n)$$$. An update query affects $$$O(\log n)$$$ nodes, as well as all nodes that satisfy the conditions $$$tree[p].smax \geq v $$$ && $$$ tree[p].maxv \geq v$$$. Each visit to such nodes decreases $$$f([s,e])$$$ by at least one, so the total number of visits to such nodes through all queries is limited to $$$O((n+q) \log n)$$$.

Complex proof. Represent the maximum value as a tag. A node receives a tag in the following cases:

  • This node is the root.
  • The maximum value of this node and its parent node are different.

The tag attached to a node will be the maximum value of its subtree. By keeping the maximum values in the nodes of the segment tree, nodes with the same maximum values merge into one connected component, and the value is stored only in the root of this component. The second largest value of any node is the maximum value in the strict subtree (the subtree without the node). Let $$$f([s,e])$$$ be the sum of the number of tags in the subtree of the node $$$[s,e]$$$. Initially, this value is $$$O(n \log n)$$$.

Now let's observe the update. During the segment update, as a rule, each updated node will be assigned one tag. Since the number of nodes is $$$O(\log n)$$$ and the depth of each is also $$$O(\log n)$$$ (the number of tags increasing is equal to the number of ancestors of the new tag), the potential increases by $$$O(\log^2 n)$$$. Now consider the case where the second largest value is greater than or equal to the current updated value (so you need to recursively go down further). In this case, the tag providing this second largest value will obviously disappear. So with each recursive descent, the potential decreases. Thus, the sum of $$$f([s,e])$$$ increases by $$$O((n+q \log n) \log n)$$$, and each query, descending naively, decreases this value by 1 or more. So the time complexity becomes $$$O((n+q \log n) \log n)$$$.

Some points to consider:

This proof can be slightly modified to prove $$$O((n+q) \log n)$$$ — for details, see the article "A simple introduction to 'Segment tree beats'". Here we explained with the addition of $O(\

log n)$, but this proof is actually an improvement of the simple proof. A simpler way to understand this is not to think about $$$f([s,e])$$$, but simply about the number of tags. To remove one tag, it takes $$$O(\log n)$$$ time. Initially, there are $$$n$$$ tags, and each query adds $$$O(\log n)$$$. So a total of $$$O(n + q \log n)$$$ is added. The process of naively removing tags, if $$$k$$$ tags have been removed, will take a maximum of $$$k \log n$$$ time. So $$$O((n+q \log n) \log n)$$$.

This proof can be used even if the queries become more complex, for example:

$$$A[i]=min(A[i],Y)$$$ $$$A[i]=A[i]+X$$$

The operation $$$A[i] = A[i] + X$$$ also obviously adds $$$O(\log n)$$$ tags because we do not use different numbers but use the concept of a tag, which allows us to adapt the proof for a greater variety of operations. By the way, $$$f([s,e])$$$ is often called a potential function, which is often found in the analysis of complex algorithms. One of the simplest examples of using a potential function seems to be Segment Tree Beats.

2. Kinetic Segment Tree (Kinetic Tournament)

This section is based on the following article. Kinetic Segment Tree (KST) is a new data structure that initially sets at the initial time $$$t=0$$$ and supports the following queries:

update(i, a, b): assigns $$$(A[i], B[i]) = (a, b)$$$. query(s, e) $$$min_{i \in [s, e]} (A[i] \times t + B[i])$$$. heaten(T): sets the time to $$$t = T$$$, if $$$t \leq T$$$.

In a broad sense, the Kinetic Segment Tree is quite flexible. For example, the controlled functions do not necessarily have to be linear; it is sufficient that their relationship changes no more than $$$O(1)$$$ times depending on $$$t$$$. The concept of heaten also does not necessarily have to be global; it can be considered in the context of individual segments only. In this article, we will consider the above definition, and generalization can be considered in exercises.

If we consider only the narrow KST mentioned above, this algorithm can replace some well-known data structures in certain cases. For example, if all queries increase monotonically, KST can be considered as a Li-Chao tree supporting segment queries. Comparison of functions:

Fully dynamic? Segment queries Monotonicity of queries Type of functions
Li-Chao Tree Insertion only Not required All functions whose relationship changes no more than $$$O(1)$$$ times depending on $$$t$$$
Kinetic Segment Tree Insertion + deletion Required All functions whose relationship changes no more than $$$O(1)$$$ times depending on $$$t$$$

Technically, the Kinetic Segment Tree is similar to Segment Tree Beats. Like other segment trees, the leaf nodes store the functions $$$A[i]x+B[i]$$$, and the non-leaf nodes store the functions minimizing $$$A[i]t+B[i]$$$ among the subtree nodes. With a fixed $$$t$$$, this is easy to compare, and segment queries and updates are not a problem. Thus, the update and query functions of the Kinetic Segment Tree work in $$$O(\log n)$$$.

Now consider the heaten function, which corrects the internal information of the data structure as $$$t$$$ increases. If some non-leaf node compares two functions of its subtree, and the moment comes when their relationship changes at $$$t$$$ greater than the current time $$$t$$$, define this time as $$$insec(v,t)$$$. Let $$$melt[v]$$$ be the minimum of all $$$insec(v,t)$$$ in the subtree $$$v$$$. In this case, if $$$melt[v]$$$ is less than or equal to the current time, we need to recalculate the priorities of all such nodes, starting from the root and moving up.

That's all you need to know about this algorithm. It is not difficult to understand that it will always find the correct answer, but let's analyze the time complexity of the algorithm.

Theorem 1. Suppose there are no updates. Then the heaten function of the Kinetic Segment Tree requires a total of $$$O(n \log^2 n)$$$ operations. Let's prove this. Consider any segment $$$[s,e]$$$. Create a Lower envelope only from the lines in this segment. The maximum number of lines forming the Lower envelope is $$$e - s + 1$$$, and the number of changes in the minimum line does not exceed $$$e - s$$$. For the node $$$v = [l, r]$$$, define $$$f(v)$$$ as the number of remaining changes in the minimum line in the segment $$$[l, r]$$$ and $$$\Phi(v)$$$ as the sum of $$$f(v)$$$ over the entire subtree $$$v$$$. If the segment length is $$$n$$$, then $$$f(v) = O(n)$$$, and $$$\Phi(v) = O(n \log n)$$$. Thus, the total size of $$$\Phi(v)$$$ at the initial moment is $$$O(n \log^2 n)$$$. Each time the heaten function visits the node $$$v$$$, one of the intersections in the subtree $$$v$$$ disappears, so $$$\Phi(v)$$$ decreases by 1. Thus, the number of visits required by the heaten function does not exceed the sum of $$$\Phi(v)$$$, which is $$$O(n \log^2 n)$$$.

Remark. This analysis uses the potential function in a fairly strict sense. Here is a simpler explanation. The size of the Lower envelope of each node is $$$O(n)$$$, so the total number of intersections we are dealing with is $$$O(n \log n)$$$. Each intersection requires visiting all nodes on the path to the root, so processing each point takes $$$O(\log n)$$$. Multiplying this, we get $$$O(n \log^2 n)$$$.

Theorem 2. If there are updates, the heaten function of the Kinetic Segment Tree requires a total of $$$O(n \log^2 n \alpha(n))$$$ operations, where $$$\alpha(n)$$$ is the inverse Ackermann function. Let's prove this. If you look at the timeline, each update turns the objects controlled by the KST from lines into segments. You can assume that each update destroys the current line, creating a new point at the right end and a new line at the left end. Thus, when we create the Lower envelope from $$$n$$$ segments, the maximum number of changes in the minimum segment that can occur is limited to $$$O(n \alpha(n))$$$, and there is actually a construction that changes $$$\Omega(n \alpha(n))$$$ times. For details, see the article "Davenport–Schinzel Sequences and Their Geometric Application". Thus, using the proof of Theorem 1, we get the specified result.

Theorem 3. If the objects controlled by the Kinetic Segment Tree are not linear functions, but two different objects can intersect at most $$$s$$$ times, the heaten function requires a total of $$$O(\lambda_{s + 2}(n) \log^2 n)$$$ operations, where $$$\lambda_s(n)$$$ is the $$$n$$$-th member of the Davenport–Schinzel sequence of order $$$s$$$. Theorem 3.1. If the update function of the Kinetic Segment Tree supports only insertion or only deletion operations, the heaten function requires a total of $$$O(\lambda_{s + 1}(n) \log^2 n)$$$ operations. Here, "empty function" means $$$f(x) = \infty$$$. Theorem 3.2. If the update function of the Kinetic Segment Tree is not called, the heaten function requires a total of $$$O(\lambda_s(n) \log^2 n)$$$ operations.

Remark. The values of the Davenport–Schinzel sequence for small $$$s$$$ are as follows (Wikipedia):

$$$\lambda_0(n) = 1$$$ $$$\lambda_1(n) = n$$$ $$$\lambda_2(n) = 2n-1$$$ $$$\lambda_3(n) = 2n\alpha(n) + O(n)$$$ $$$\lambda_4(n) = O(n2^{\alpha(n)})$$$ $$$\lambda_5(n) = O(n2^{\alpha(n)}\alpha(n))$$$

Thus, if $$$s = O(1)$$$, the Davenport–Schinzel sequence is actually equal to $$$O(n)$$$. Exact calculations are not provided, but if we assume that $$$\alpha(n) \leq 4$$$, then $$$\lambda_s(n)$$$ will be about $$$2^s n$$$. The

orems 1, 2, 3 show that to reach the worst case, each node of the segment tree must have many intersections, and the heaten function must list them all. In fact, constructing such a counterexample for KST is not obvious, since the function $$$\alpha(n)$$$ is not so large, so regardless of the presence of update queries, one can assume that the complexity is about $$$O(\log^2 n)$$$.

3. Implementation and problem solving with KST

3.1. Solving a problem using the Convex Hull Trick

BOJ 12795. The "Land Capture by Half-Plane" problem is used as an example. The task is to quickly compute the maximum $$$ax+b$$$ for a given query $$$x$$$ when inserting a segment of the form $$$ax+b$$$. The regular Conv

ex Hull Trick cannot solve this problem, and four methods are usually used:

Using the Li-Chao tree: $$$O(Q \log Q)$$$.

Managing the upper hull of segments using Set (LineContainer): $$$O(Q \log Q)$$$.

Using the Bentley-Saxe technique for incremental Convex Hull Trick: $$$O(Q \log^2 Q)$$$.

Offline query processing, after which the regular Convex Hull Trick is managed using a segment tree: $$$O(Q \log^2 Q)$$$.

This problem can be solved using the Kinetic Segment Tree offline in $$$O(Q \log^2 Q)$$$. All segments are initially entered, after which the Kinetic Segment Tree is built for the segments. Each query asks for the maximum $$$ax+b$$$ for the prefix of the segments. Queries are sorted in ascending order of $$$x$$$ and processed in this order, allowing the Kinetic Segment Tree to be used.

Code for solving BOJ 12795 using the Kinetic Segment Tree, written by me (including unwritten update code). The implementation is not that complicated. Other problems that can be solved with the Kinetic Segment Tree:

Facebook Hacker Cup 2020 Round 2 Problem D: Log Drivin' Hirin'

Yosupo Judge: Line Add Get Min

BOJ. Beginning of the pilgrimage

The first two problems from the examples are actually simpler and faster solved by regular CHT structures. This means that the only problem where it is optimal to use segment management KST is the last problem, which may seem a bit boring. Let's look at other problems that can be solved with KST.

3.2. Heaten operation on segments

Now we explain the operation that supports heaten on segment queries. Specifically, we will construct a data structure that supports the following queries for two arrays of the same length $$$[a_1, \ldots, a_n], [b_1, \ldots, b_n]$$$:

Maximum on the segment: $$$max_{i \in [s, e]} b_i$$$

Heaten on the segment: $$$b_i := b_i + t \times a_i$$$ for $$$i \in [s, e], t > 0$$$.

Using the principles of KST to support this operation is not difficult. The main principle is that if the order of elements is violated, we simply naively solve the problem. The maximum on the segment is processed exactly the same as before. In the case of the heaten operation on the segment, we explore all corresponding segments for which $$$melt[v] \leq t$$$, and change their order; if $$$melt[v] > t$$$, this segment is processed using Lazy propagation. If the naive solution due to the heaten operation does not increase too much, then the time complexity ultimately will not differ much from the usual KST. In fact, the difference is not that great, and here's why.

Theorem 4. If you use the Kinetic Segment Tree to handle the above queries, the problem can be solved in $$$O((n + q \log n) \log^2 n)$$$. Let's prove this. Assume that the heaten operation applies only to the entire array. For each leaf node, represent its region as the set of nodes where the value $$$b_i$$$ of this leaf is maximal. The region of the leaf $$$x$$$ includes $$$x$$$ and the path to its parent. When the heaten operation occurs, one of the two subtrees may change so that the leaf $$$y$$$ receives a better value than the leaf $$$x$$$. In this case, $$$x$$$ loses its region. This happens because $$$a_x < a_y, b_x < b_y$$$, so the leaf $$$x$$$ will never return this region: the value of the leaf $$$y$$$ will always be greater. Thus:

  • The initial size of the region is $$$O(n)$$$.

  • Each leaf can get a maximum of $$$O(\log n)$$$ of the region.

  • The heaten operation inevitably reduces the region of the leaf, and this process is irreversible, each such operation takes $$$O(\log n)$$$ time.

  • All leaves together can lose a maximum of $$$O(n \log n)$$$ of the region. The loss operations take $$$O(\log n)$$$ time, so the problem is solved in $$$O(n \log^2 n)$$$.

Now assume that the heaten operation applies only to the segment. For nodes fully contained in the segment, the operation still does not affect the order. For subtrees that do not overlap with the segment at all, nothing happens. So only $$$O(\log n)$$$ nodes partially overlapping with the segment can change. Even if the region of these nodes was decreasing, the current update allows them to rise to the root. Thus, $$$O(\log n)$$$ leaves get $$$O(\log n)$$$ regions, so the leaf can get a total of $$$O((n + q \log n) \log n)$$$ regions. According to this logic, the problem is solved in $$$O((n + q \log n) \log^2 n)$$$.

By the way, Theorem 4 includes Theorem 1. The relationship between the "simple proof" and the "complex proof" in Segment Tree Beats is similar: it is the same proof, but this variant is slightly more complex and proves a bit more. Also, Theorem 4 does not make big assumptions about the heaten operation, so it can also support several other operations on segments (operations with $$$t < 0$$$ cannot be proven, as changes should occur only inside and outside the subtrees, and with $$$t < 0$$$, the order changes inside the subtree).

The following problems can be solved using this method:

Codeforces 1178G. The Awesomest Vertex

Codeforces 573E. Bear and Bowling

BOJ. Unyielding Heart 3

BOJ. Sequence and Queries 41

Original: https://koosaga.com/307 [Koosaga:Tistory]

Full text and comments »

  • Vote: I like it
  • +54
  • Vote: I do not like it