Given an array A of N integers answer Q queries of type (L, R). Answer to a query (L, R) is the sum of distinct integers in the range A[L...R]
Constraints: N, Q <= 100000
0 <= L <= R < N
abs(A[i]) <= 1 000 000 000
I read somewhere that this can be done using persistence segment tree.
So, it would be helpful if one can explain that approach.
Other approach are welcome too.
I don't now the persistent tree solution,but I can tell you a nice offline solution. Let's use B[]=an array in which we can add/delete elemnts/query the sum in O(logN).So B can be basically a standard Fenwick Tree or Segment Tree. It is easy to understand that the sum of the distinct integers in the range [L,R] is the sum of the integers that appear for the last time in the sequence(we simply add each element the last time it occurs in the subsegment).We sort the queries by R and solve them offline.At each i (1<=i<=N) we insert on the position i of B[] the value of A[i].Let ist(1<=lst<i) the last occurence of A[i] until i.We delete the number for B[lst](we need only the last occurence before R for each query).After that,we solve all queries with R=i.If we query the sum on (L,i) each integer in the subsegment appears exactly once,so if we take the sum B[L,i],we have the sum of the distinct integers in the subsegment [L,i] of A.We answer each query in the sorted order like this then we output them in the right order. Note:For finding lst fast you can use map/unordered_map. PS.You can also come up with an O(NsqrtN+QsrtN) solution using Mo's algorithm,but I feel the first solution is easier to write and is definetely faster.
Imagine if you created a segment tree for every range of the array. To answer a query you just pick the segment tree corresponding to the range [L,R] and get your answer from the root of the tree. Query is O(1) but the construction would be O(N^3) which isn't feasible. What you can do is build a segment tree for every prefix of the array which can be done in O(NlogN). (This is what persistent segment tree is basically). Answer for a query [L,R] would just be answer for ans_for(R) — ans_for(L-1). Use coordinate compression initially to map the values in the range [1,N].
EDIT : ^This wouldn't work. Instead of just storing counts of values in segment tree node, store count of their rightmost occurences.
He wants the sum of distinct integers in the specified.range. Simply doing ans(R) — ans(L — 1) won't work here.
I dont see why not , we can traverse both the trees simultaneously like we do to find the Kth smallest number in a range. EDIT : My bad!
When you remove the contribution of distinct integer by subtracting ans_for(L-1), you are not simultaneously adding(if there are any) integers which are in the range L to R but were not counted because of ans_for(L-1)
say the array is A = [1,2,2,3,3,3,4,4,4,4] , first store the rightmost occurence of every index , for this array right_most_occur = [11,3,11,5,6,11,8,9,10,11] , where 11 is a way of storing that the right most occurence does not exist. Now the distinct integers in a range [L,R] are all those indexes i where right_most_occur[i] > R. Should be easy to figure out the persistent segment tree solution now.
A persistent segment tree is able to support queries of the form : Count the number of values
<= X
in the range[L, R]
. Using it, we can solve the given problem online (i.e without preprocessing the queries)Let's solve a different problem first using persistent segment tree: Number of distinct integers in a range. DQUERY
Let
prev[i]
denote the previous occurrence ofarr[i]
. Replace every element of your array by the location of it's previous index (i.e byprev[i]
).For, example
(1-based indexing) will now become
Now the answer for the number of distinct values in range
[L, R]
is simply the number of values in this new array<= L - 1
in the range[L, R]
. Why? Because if a value occurs more than once in the range, all its occurrences except the first one will haveprev[i] >= L
. So, by restricting ourselves to values<= L - 1
, we are only considering the first occurrence in the range.When adding index
i
of the array to your persistent segment tree, you need to add1
to indexprev[i]
of the persistent segment tree.Now, lets come back to your original problem. It can be solved in a similar fashion. You will need to create a persistent segment tree which supports weighted sum for all values
<= X
in range[L, R]
. (This is a trivial modification to the usual persistent segment tree. You may practice on this problem if you wish.)When adding index
i
of the array to your persistent segment tree, you need to add a weight ofarr[i]
to indexprev[i]
of the persistent segment tree.Time complexity :
O(N * log(N) + Q * log(N) )