adamant's blog

By adamant, 11 years ago, translation, In English

Hi everyone! 2 years ago Logvinov_Leon asked about way to get the suffix array from suffix automata. Oddly enough, he had not received answer. However, as you may have noticed, I'm interested in the topic of transformations of some string structures to others, so I decided to shed some light on how this is done.

So, you will notice that suffix automaton — essentially a suffix tree with a few other compression principle. While in compressed tree we just squeeze the edges, when we construct the suffix automaton we getting something like two-way trie. We are compressing not only common prefixes, but also common suffixes. From this we can draw the following interesting conclusion: if we will traverse through automaton with dfs, ignoring the used-array, we will get the same thing as when traversing suffix tree. Now remember that the suffix array from suffix tree obtained by the usual DFS — we just write down the suffixes in the order in which we encounter them in the trie. Thus, we already have an algorithm for constructing suffix array from automata for O(n2).

But, of course, it does not suit us and we want to go further. If we want to have linear time, we can't ignore the used-array. Easy to see that even so, starting in depth tour we will meet a few (at least one) suffixes before the first stop in the visited state. Now our task when we visit the state is to quickly get all the suffixes that are hidden behind it, without traversing all the ways from this state again. I suggest the following solution: let each state in the machine to keep indices of the first and last encountered after it suffixes in suffix array. Then, when we came in the used state, we will add to the array all the suffixes in which we would have got from it, taking into account the already traversed length. Obviously, this solution will work for the O(n), since we totally add exactly n suffixes, and all the rest of the time we just dfs through the suffix automaton.

P.S. my implementation of the algorithm.
P.P.S. I can't say for sure, but it seems that in practice, this algorithm is not applicable due to the very large constants.

  • Vote: I like it
  • +68
  • Vote: I do not like it

»
11 years ago, # |
Rev. 3   Vote: I like it +17 Vote: I do not like it

2 years ago Logvinov_Leon asked about way to get the suffix array from suffix automata. Oddly enough, he had not received answer.