I recently was doing this problem on spoj.It was about finding the number of distinct sub-string in a string.Suffix array was to be used but i didn't knew about suffix arrays.I searched on net and learned about suffix array but when I moved towards implemention all codes available online were a bit lengthy so I implemented my own and came out with this--
#include "bits/stdc++.h"
using namespace std;
int main(int argc, char const *argsorted_suffixes[])
{
set<string> set1;
string s="aabaab";
int n=s.length();
/*inserting all possible suffixes of string in set1*/
for (int i = 0; i < n ; ++i)
{
set1.insert(s.substr(i,n-i));
}
vector<string> sorted_suffixes(set1.begin(),set1.end()); /*suffix vector(array)*/
/*Building LCP table*/
int LCP[set1.size()];
LCP[0]=0;
string s4,s5;
for(int i=1;i<sorted_suffixes.size();i++)
{
s4=sorted_suffixes[i-1];
s5=sorted_suffixes[i];
int c=0;
for(int j=0;j<s4.length();j++)
{
if(s4[j]==s5[j])
c++;
else
break;
}
LCP[i]=c;
}
/*LCP TABLE CONSTRUCTED*/
for (int i = 0; i < sorted_suffixes.size(); ++i)
{
cout<<sorted_suffixes[i]<<" "<<LCP[i]<<endl;
}
}
Here I have not constructed a suffix array rather I have a vector of strings which has the suffixes in sorted order and I think it doesn't make a difference to either have a container of indexes or to have a container of sorted suffixes. This is easy to code and understand.I have tested it for many cases and it works fine, If there is any bug in my code please point out!
Each insertion to a set takes O(logN) comparisons. Here each comparison is between 2 strings and hence will take O(length) time. So, the complexity of building the set is O(n * (length) * logn) which is O(n^2logn). This will time out in most of the problems.
kk thnx for pointing the bug
sandyepp anything u can suggest will be appreciated
As far as I know, you rarely actually need to understand the construction of the suffix array to solve problems. You just need to have pre-written code for it and know how to use the lcp function properly. Try problems under string or suffix structures category or problems marked with suffix arrays in codechef and you'll know more.
set1.insert(s.substr(i,n-i));
Is this correct??? i guess it must be : set1.insert(s.substr(i,n));
Second argument is length.
First param is begin, second is length, not end. But your variant will also work, because it will cut on the string's end.
Well, here are some not so lengthy implementation. Kasai() computes lcp array.
: #67rZRE
: #4AsFVi
what does kasai function do? I know it is about finding the LCP but I am not understanding in what form.
I believe there're some slight issue with your implementation when I'm testing it. Could you provide with a newer edition? Thanks a lot!
The problem is that the suffix array of "aaaaa" calculated by your code is 4, 5, 1, 2,3 instead of 5,4,3,2,1
Perhaps it computes cyclic suffix array... Try adding # to the end of the string or something.
Oh didn’t realize this. Thanks