Persistence Segment Tree

→ Pay attention

Before contest
CodeTON Round 9 (Div. 1 + Div. 2, Rated, Prizes!)
5 days
Register now »

*has extra registration

→ Streams

DP Bitmask - Topic Stream

By Shayan

Before stream 14:42:21

View all →

→ Top rated

#	User	Rating
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	cry	166
2	maomao90	163
2	Um_nik	163
4	atcoder_official	161
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	nor	153
9	Dominater069	153

View all →

→ Find user

→ Recent actions

Detailed →

i_love_emilia_clarke's blog

Persistence Segment Tree

By i_love_emilia_clarke, history, 8 years ago, In English

Given an array A of N integers answer Q queries of type (L, R). Answer to a query (L, R) is the sum of distinct integers in the range A[L...R]

Constraints: N, Q <= 100000

0 <= L <= R < N

abs(A[i]) <= 1 000 000 000

I read somewhere that this can be done using persistence segment tree.

So, it would be helpful if one can explain that approach.

Other approach are welcome too.

persistent segment tree

i_love_emilia_clarke
8 years ago
8

Comments (7)

Show archived | Write comment?

Bodo

8 years ago, # |

← Rev. 2 →

I don't now the persistent tree solution,but I can tell you a nice offline solution. Let's use B[]=an array in which we can add/delete elemnts/query the sum in O(logN).So B can be basically a standard Fenwick Tree or Segment Tree. It is easy to understand that the sum of the distinct integers in the range [L,R] is the sum of the integers that appear for the last time in the sequence(we simply add each element the last time it occurs in the subsegment).We sort the queries by R and solve them offline.At each i (1<=i<=N) we insert on the position i of B[] the value of A[i].Let ist(1<=lst<i) the last occurence of A[i] until i.We delete the number for B[lst](we need only the last occurence before R for each query).After that,we solve all queries with R=i.If we query the sum on (L,i) each integer in the subsegment appears exactly once,so if we take the sum B[L,i],we have the sum of the distinct integers in the subsegment [L,i] of A.We answer each query in the sorted order like this then we output them in the right order. Note:For finding lst fast you can use map/unordered_map. PS.You can also come up with an O(NsqrtN+QsrtN) solution using Mo's algorithm,but I feel the first solution is easier to write and is definetely faster.

→ Reply

SarvagyaAgarwal

8 years ago, # |

← Rev. 3 →

Imagine if you created a segment tree for every range of the array. To answer a query you just pick the segment tree corresponding to the range [L,R] and get your answer from the root of the tree. Query is O(1) but the construction would be O(N^3) which isn't feasible. What you can do is build a segment tree for every prefix of the array which can be done in O(NlogN). (This is what persistent segment tree is basically). Answer for a query [L,R] would just be answer for ans_for(R) — ans_for(L-1). Use coordinate compression initially to map the values in the range [1,N].
EDIT : ^This wouldn't work. Instead of just storing counts of values in segment tree node, store count of their rightmost occurences.

→ Reply

pranet

8 years ago, # ^ |

He wants the sum of distinct integers in the specified.range. Simply doing ans(R) — ans(L — 1) won't work here.

→ Reply

SarvagyaAgarwal

8 years ago, # ^ |

← Rev. 2 →

-9

I dont see why not , we can traverse both the trees simultaneously like we do to find the Kth smallest number in a range. EDIT : My bad!

→ Reply

i_love_emilia_clarke

8 years ago, # ^ |

When you remove the contribution of distinct integer by subtracting ans_for(L-1), you are not simultaneously adding(if there are any) integers which are in the range L to R but were not counted because of ans_for(L-1)

→ Reply

SarvagyaAgarwal

8 years ago, # ^ |

-9

say the array is A = [1,2,2,3,3,3,4,4,4,4] , first store the rightmost occurence of every index , for this array right_most_occur = [11,3,11,5,6,11,8,9,10,11] , where 11 is a way of storing that the right most occurence does not exist. Now the distinct integers in a range [L,R] are all those indexes i where right_most_occur[i] > R. Should be easy to figure out the persistent segment tree solution now.

→ Reply

pranet

8 years ago, # |

+30

A persistent segment tree is able to support queries of the form : Count the number of values <= X in the range [L, R]. Using it, we can solve the given problem online (i.e without preprocessing the queries)

Let's solve a different problem first using persistent segment tree: Number of distinct integers in a range. DQUERY

Let prev[i] denote the previous occurrence of arr[i]. Replace every element of your array by the location of it's previous index (i.e by prev[i]).

For, example

1 2 1 2 3

(1-based indexing) will now become

0 0 1 2 0

Now the answer for the number of distinct values in range [L, R] is simply the number of values in this new array <= L - 1 in the range [L, R]. Why? Because if a value occurs more than once in the range, all its occurrences except the first one will have prev[i] >= L. So, by restricting ourselves to values <= L - 1, we are only considering the first occurrence in the range.

When adding index i of the array to your persistent segment tree, you need to add 1 to index prev[i] of the persistent segment tree.

Now, lets come back to your original problem. It can be solved in a similar fashion. You will need to create a persistent segment tree which supports weighted sum for all values <= X in range [L, R]. (This is a trivial modification to the usual persistent segment tree. You may practice on this problem if you wish.)

When adding index i of the array to your persistent segment tree, you need to add a weight of arr[i] to index prev[i] of the persistent segment tree.

Time complexity : O(N * log(N) + Q * log(N) )

→ Reply