kien_coi_1997's blog

By kien_coi_1997, 10 years ago, In English

I'm going to talking about a coding style in segment tree which I used for a long time. Not only is it more systematically, but it also support more kind of segment tree.

1. Old style

Assume that we are making a segment tree support operators assign elements in a range (assign-range L R X) and get max element in a range (max-range L R).

I found a large number of people use the following way to build segment tree:

void lazy_update(int u, int ll, int rr) {
  if (Lazy[u]==-1) return; // assume that -1 is unused value
  if (ll!=rr) Lazy[u*2]=Lazy[u*2+1]=Lazy[u];
  Max[u]=Lazy[u]; Lazy[u]=-1;
}

void assign_range(int u, int ll, int rr, int L, int R, int X) {
  lazy_update(u, ll, rr); // call a function to lazy-update node u (which operate segment ll..rr)
  if (ll>R || L>rr) return;
  if (L<=ll && rr<=R) { 
    Lazy[u]=X; 
    lazy_update(u, ll, rr); 
    return; 
  }
  assign_range(2*u, ll, (ll+rr)/2, L, R, X);
  assign_range(2*u+1, (ll+rr)/2+1, rr, L, R, X);
  Max[u] = max(Max[2*u], Max[2*u+1]);
}

int max_range(int u, int ll, int rr, int L, int R) {
  lazy_update(u, ll, rr);
  if (ll>R || L>rr) return -oo;
  if (L<=ll && rr<=R) return Max[u];
  int Max1 = max_range(2*u, ll, (ll+rr)/2, L, R);
  int Max2 = max_range(2*u+1, (ll+rr)/2+1, rr, L, R);
  return max(Max1, Max2);
}

int main(){
  ...
  cin >> L >> R >> X;
  assign_range(1, 1, n, L, R, X);
  ...
  cin >> L >> R;
  cout << max_range(1, 1, n, L, R) << endl;
  ...
}

There are some duplicated code which need to be reduce:

  • int u, int ll, int rr : appear in both three function declarations
  • lazy_update(u, ll, rr); : appear in beginning of both two functions
  • (2*u, ll, (ll+rr)/2 and 2*u+1, (ll+rr)/2+1, rr : they appear four times.

They will be reduced efficienly in new style.

2. New style

Forcing people to use a personal style is a bad idea. I share with you this style to give you a suggestion in coding segment tree.

struct node {
  int ll, rr, id;

  node(int L, int R, int X) { 
    ll=L, rr=R, id=X; 
    lazy_update(); 
  }

  node left() 
    { return node(ll, (ll+rr)/2, id*2); }
  node right() 
    { return node((ll+rr)/2+1, rr, id*2+1); }

  void lazy_update() { 
    if (Lazy[id]==-1) return; // assume that -1 is unused value
    if (ll!=rr) Lazy[id*2]=Lazy[id*2+1]=Lazy[id];
    Max[id]=Lazy[id]; Lazy[id]=-1;
  }
  
  void assign_range(int L, int R, int X){ // don't need to call lazy_update() at the beginning
    if (ll>R || L>rr) return ;
    if (L<=ll && rr<=R) 
      { Lazy[id]=X; lazy_update(); return ; }
    left().assign_range(L, R, X); // easier to read
    right().assign_range(L, R, X);
    Max[id]=max(Max[id*2], Max[id*2+1]);
  }
  
  int max_range(int L, int R) {
    if (ll>R || L>rr) return -oo;
    if (L<=ll && rr<=R) return Max[id];
    int Max1 = left().max_range(L, R);
    int Max2 = right().max_range(L, R);
    return max(Max1, Max2);
  }
};

int main(){
  ...
  cin >> L >> R >> X;
  node(1, n, 1).assign_range(L, R, X); // easier to read
  ...
  node Root(1, n, 1); // use this if we have to write to much node(1, n, 1)
  cin >> L >> R;
  cout << Root.max_range(L, R) << endl;
  ...
}

Let's list advantages of new style:

  • Reduce duplicated code
  • Average lines' length and functions' length is reduced.
  • Easier to read

The more complex problem, the more efficient this new style.

3. Additionally feature in new style

Assume that we are facing this problem:

There is a string S which contains lowercase letters. Execute following operators on this string:

  • check i j k: Check if S[i..i+k-1] is equal to S[j..j+k-1].
  • delete k: Delete k-th element.

To solve above problem, we need to build a segment tree which support two operators : hash-range L R return hash value of S[L..R], and remove X remove X-th element. I called the following structure indexable segment tree. In each node, we maintain hash value and size of this node.

#define long long long

struct node {
  int ll, rr, id;
  node(int L, int X) 
    { ll=L, rr=L+Size[X]-1, id=X; }
  node left() 
    { return node(ll, id*2); }
  node right() 
    { return node(ll+Size[id*2], id*2+1); }

  void update() {
    Hash[id]=sum_hash(Hash[id*2], Hash[id*2+1], Size[id*2+1]);
    Size[id]=Size[id*2] + Size[id*2+1];
  }
  
  long hash_range(int L, int R) {
    if (ll>R || L>rr) return 0LL;
    if (L<=ll && rr<=R) return Hash[id];
    long Hash1 = left().hash_range(L, R);
    long Hash2 = right().hash_range(L, R);
    return sum_hash(Hash1, Hash2, Size[id*2+1]); // easy to implement sum_hash()
  }
  
  void remove(int X){
    if (ll>X || X>rr) return ;
    if (ll==rr) { Size[id]=0; Hash[id]=0; return; }
    right().remove(X); // if call left().remove(X) first and succeeded, ...
    left().remove(X); // ... we are not allowed to call right().remove(X)
    update();
  }

  void build(char a[], int L, int R) { // Note: ll, rr are not valid now but id
    if (L==R) 
      { Size[id]=1; Hash[id]=a[L]; return ; }
    left().build(a, L, (L+R)/2);
    right().build(a, (L+R)/2+1, R);
    update();
  }
};

char a[N]; // zero-based;

main(){
  cin >> a; n=strlen(a);
  node(0, 1).build(a, 0, n-1); // initialize segment tree
  ...
  cin >> i >> j >> k;
  long Hash1 = node(0, 1).hash_range(i, i+k-1);
  long Hash2 = node(0, 1).hash_range(j, j+k-1);
  cout << (Hash1 == Hash2 ? "Yes" : "No") << endl;
  ...
  cin >> k;
  node(0, 1).remove(k);
  ...
}

In the above code, managed segment of each nodes in segment tree is consecutively changed. And struct node become very flexible and efficient.

In this example, we are not allowed to use node Root(0, 1) because Root content is always change (rr field). If you want to use Root variable, you should update Root after every remove query: node(0, 1).remove(k); Root=node(0,1); (and I don't like this).

4. Conclusion

It is hard to write both long and detailed blog. Therefore, you can comment anything which you didn't understand well. I will reply (or update this blog if it is necessary).

  • Vote: I like it
  • +119
  • Vote: I do not like it

| Write comment?
»
10 years ago, # |
  Vote: I like it +13 Vote: I do not like it

Your approach is worth a try. As I see, this style will cost us three more integers, won't it? So, we should not use it in problems which have strict memory limit.

It's not relevant to your topic. However, your code seems to be short mostly because you put many statements and all the brackets in one line, which make me feel pretty uncomfortable. It's impossible to manipulate other people's coding style but I think that you should expand your code before posting to make it easier to read.

»
10 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Good blog! Is it possible to implement 2nd problem with normal SegTree?

  • »
    »
    10 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    Yes. There are two typical solution.

    1) Create another operator to find real position of indexes

    For example, if we have 5 elements, XXXXX, after operator delete 3, it become XX-XX. From this time, operator hash-range 1 3 is actually hash-range 1 4. To do this, we can add a function into segment tree or create a new BIT. It is easy to code but expensive.

    2) Still use old style

    long hash_range(int u, int ll, int rr, int L, int R) {
      ...
      hash_range(u*2, ll, ll+Size[u*2]-1, L, R);
      hash_range(u*2+1, ll+Size[u*2], Size[u*2+1], L, R);
    }
    

    Actually, it seems not too bad. However, I don't like to copy-paste them into function remove again.

    • »
      »
      »
      10 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Thanks for your answer! What does the function sum_hash or how can i implement it?

      • »
        »
        »
        »
        10 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Assume P and Q is a string.

        • P have length m and hash value is p.
        • Q have length n and hash value is q.

        Assume that we use module 1000000007 and use M as factor. (e.g. p = (P[0]*M^(m-1) + P[1]*M^(m-2) + ... + P[p-1] ) % 1000000007)

        Hash value of P+Q is (p*(M^n)+q) % 1000000007.

»
10 years ago, # |
  Vote: I like it +3 Vote: I do not like it
  • »
    »
    10 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    You used additional 2*n pointers (pl, pr) and 2*n integers (l, r). Used memory is doubled (for precise, 7/3).

    • »
      »
      »
      10 years ago, # ^ |
        Vote: I like it +6 Vote: I do not like it

      Umm... If you like to optimize the constant then you have done a great job.

      But actually unlike in OI competition, in CF or TC the running time constant is not that important.It is more important to code them fast and correctly. I used to code Segment Tree in the way like you when I was young, but then I realized that it scarfice the 'user-experience' for speedness.

      And by pointer you can also easier generalized it to support functional operation too.

»
10 years ago, # |
Rev. 2   Vote: I like it +16 Vote: I do not like it

One way to handle the int u, int ll, int rr that appear in function calls is using default arguments: int query(int L, int R, int u = 1, int ll = 0, int rr = n - 1);. A call to the function looks like: query(L, R);

»
10 years ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

Hi, I really enjoy that coding style, but the root should be "node root(1,n,1);", because the left and right elements are 1 and n and the id is 1. what do you think?