terminated's blog

By terminated, 10 years ago, In English

I recently was doing this problem on spoj.It was about finding the number of distinct sub-string in a string.Suffix array was to be used but i didn't knew about suffix arrays.I searched on net and learned about suffix array but when I moved towards implemention all codes available online were a bit lengthy so I implemented my own and came out with this--

#include "bits/stdc++.h"
using namespace std;
 int main(int argc, char const *argsorted_suffixes[])
{
	
		set<string> set1;

		string s="aabaab";

		int n=s.length();
		/*inserting all possible suffixes of string in set1*/
		for (int i = 0; i < n ; ++i) 
		{
			set1.insert(s.substr(i,n-i));
		}

		vector<string> sorted_suffixes(set1.begin(),set1.end()); /*suffix vector(array)*/
		/*Building LCP table*/
		int LCP[set1.size()];

		LCP[0]=0;

		string s4,s5;

		for(int i=1;i<sorted_suffixes.size();i++)
		{
			s4=sorted_suffixes[i-1];
			s5=sorted_suffixes[i];
			int c=0;
			for(int j=0;j<s4.length();j++)
			{
				if(s4[j]==s5[j])
					c++;
				else
					break;
			}
			LCP[i]=c;


		}
		/*LCP TABLE CONSTRUCTED*/
		for (int i = 0; i < sorted_suffixes.size(); ++i)
		{
			cout<<sorted_suffixes[i]<<" "<<LCP[i]<<endl;
		}
}

Here I have not constructed a suffix array rather I have a vector of strings which has the suffixes in sorted order and I think it doesn't make a difference to either have a container of indexes or to have a container of sorted suffixes. This is easy to code and understand.I have tested it for many cases and it works fine, If there is any bug in my code please point out!

  • Vote: I like it
  • -21
  • Vote: I do not like it

»
10 years ago, # |
  Vote: I like it +1 Vote: I do not like it

Each insertion to a set takes O(logN) comparisons. Here each comparison is between 2 strings and hence will take O(length) time. So, the complexity of building the set is O(n * (length) * logn) which is O(n^2logn). This will time out in most of the problems.

  • »
    »
    10 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    kk thnx for pointing the bug

    • »
      »
      »
      10 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      sandyepp anything u can suggest will be appreciated

      • »
        »
        »
        »
        10 years ago, # ^ |
          Vote: I like it +1 Vote: I do not like it

        As far as I know, you rarely actually need to understand the construction of the suffix array to solve problems. You just need to have pre-written code for it and know how to use the lcp function properly. Try problems under string or suffix structures category or problems marked with suffix arrays in codechef and you'll know more.

»
10 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

set1.insert(s.substr(i,n-i));

Is this correct??? i guess it must be : set1.insert(s.substr(i,n));

  • »
    »
    10 years ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    Second argument is length.

  • »
    »
    10 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    First param is begin, second is length, not end. But your variant will also work, because it will cut on the string's end.

»
10 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Well, here are some not so lengthy implementation. Kasai() computes lcp array.

: #67rZRE

: #4AsFVi

  • »
    »
    5 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    what does kasai function do? I know it is about finding the LCP but I am not understanding in what form.

  • »
    »
    5 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I believe there're some slight issue with your implementation when I'm testing it. Could you provide with a newer edition? Thanks a lot!

  • »
    »
    5 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    The problem is that the suffix array of "aaaaa" calculated by your code is 4, 5, 1, 2,3 instead of 5,4,3,2,1

    • »
      »
      »
      5 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Perhaps it computes cyclic suffix array... Try adding # to the end of the string or something.