Блог пользователя BumbleBee

Автор BumbleBee, 5 лет назад, По-английски

Given a string $$$S$$$ , I want to find the number of different substrings of each length in the range $$$[1,|S|]$$$.

I can do it using LCP array which can be constructed using the Suffix Array of the given string. But I am trying to perform this task using Suffix Automaton.

Is it possible to do this using Suffix Automaton? If yes, then how?

  • Проголосовать: нравится
  • +5
  • Проголосовать: не нравится

»
5 лет назад, # |
Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

Here's my go at this. If anyone finds there to be something wrong, please reply to this comment. The terminology I'm using is w.r.t the SA article on cp-algorithms.com. A state in SA corresponds to a no. of substrings (prefixes of some suffix). It contains all substrings in a contiguous range of length $$$(len(link(v)), len(v)]$$$. So, you can perform a dfs on the SA and for every vertex $$$v$$$, you can add $$$1$$$ to the count of all lengths of substrings from $$$[len(link(v))+1, len(v)]$$$.

  • »
    »
    5 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    At first, I tried this way too. But it turns out that it won't work in all cases. Suppose, you have reached a state that has a child which is already visited. If you skip this child state you will miss some paths/substrings. If you don't skip visited nodes, the complexity won't be $$$O(N)$$$ anymore. In my case, the length of the string can be $$$10^6$$$ and I am trying to perform this task in $$$O(N)$$$.

    UPD: Your idea is correct. I thought I will miss some substrings by skipping visited states. But that won't happen if I count for all lengths in the range $$$[len(link(v))+1,len(v)]$$$ for all states $$$v$$$.

    • »
      »
      »
      5 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Just to ensure that the problem is not from an ongoing contest. Can you please post the problem link?

    • »
      »
      »
      5 лет назад, # ^ |
        Проголосовать: нравится +2 Проголосовать: не нравится

      Here is my accepted solution using the idea I described above. There must be a bug in your implementation. I spent an hour finding out in mine. I suggest you to thoroughly read the article that I linked in the answer above. A node may correspond to more than one substring, but in that case, a smaller substring will always be a suffix of the larger one. So, in performing the step of adding $$$1$$$ for all substrings having length in the range $$$[len(link(v)) + 1, len(v)]$$$, we are actually considering all substrings that the current state corresponds to and no substring ending at the current state is missed.