Блог пользователя k790alex

Автор k790alex, 12 лет назад, По-английски

Problem:

Given a string S find the longest repeated substring non overlaps.

1 <= |S| <= 50,000

Input:
ABBBB

output:
BB

I'm trying to solve a problem like this, after thinking some time I came up with this solution using suffix array:

pos[i] -> sorted suffix array
lcp[i] -> longest common prefix between i-th suffix and (i-1)-th suffix
lcp2[i] -> like lcp but without overlaps -> lcp[i] = min(l p[i], abs( pos[i] - pos[i - 1] ) )

ans = max( lcp2[i] ) for 1 <= i < |S|

my approach was working for some cases but it fails for the test I wrote before.

I could think in a brute force solution which I think its some obvious:

For all repeated substring SUB, try all substring of SUB and test if have overlaps.

Do you have a better way?

Thanks.

  • Проголосовать: нравится
  • +3
  • Проголосовать: не нравится

»
12 лет назад, скрыть # |
 
Проголосовать: нравится +1 Проголосовать: не нравится

easy, if you know suffix automaton. just calculate first and last position in string for all states.

  • »
    »
    12 лет назад, скрыть # ^ |
     
    Проголосовать: нравится 0 Проголосовать: не нравится

    thanks, I gonna read about suffix automaton.

    If the problem is to find the longest repeated substring (non-overlapping) repeted (al least / exactly) k times, suffix automaton works?

    • »
      »
      »
      12 лет назад, скрыть # ^ |
      Rev. 8  
      Проголосовать: нравится +1 Проголосовать: не нравится

      yes, the same

      upd i forgot about non-overlaping. with this condition i know solution only with O(Nlog2N) time instead of O(N)

      one more upd it is O(KNlog2N). but here was a discussion about similar problem with k=3, maybe helps.

      • »
        »
        »
        »
        12 лет назад, скрыть # ^ |
         
        Проголосовать: нравится 0 Проголосовать: не нравится

        can you explain solution for k non-overlaping substrings?

        • »
          »
          »
          »
          »
          12 лет назад, скрыть # ^ |
          Rev. 5  
          Проголосовать: нравится 0 Проголосовать: не нравится

          big upd lets try again with HandIeNeeded's solution. after fixing LEN (in bs) and separating array to segments, for every segment make two sets: set of indexes and set of difference of consecutive pairs, for example for segment with indexes {1, 3, 4} it is {1, 2}. so, while in second set minimum<LEN we need to pop it out and modificate both sets. after that, check for size if at least K. it looks O(Nlog2N).

          + even easier: need only sorted array of every segment, and greedy will works: first index i, lower_bound(i+LEN) and so on K times.

          • »
            »
            »
            »
            »
            »
            12 лет назад, скрыть # ^ |
            Rev. 2  
            Проголосовать: нравится -8 Проголосовать: не нравится

            It looks like greedy solution. We don't know which number we need to pop out from the first set (it can be 3 or 4). That's why I am not sure, what it will work. But probably after building the suffix array we can easily solve problem for each segment, using fenwick tree.

        • »
          »
          »
          »
          »
          12 лет назад, скрыть # ^ |
          Rev. 2  
          Проголосовать: нравится 0 Проголосовать: не нравится

          i explained, please check for mistakes:)

»
12 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

hash maybe help

»
12 лет назад, скрыть # |
Rev. 2  
Проголосовать: нравится 0 Проголосовать: не нравится

Use binary search. When we check a length K, could it be the ans? We separate the array lcp[] into many groups, each contains continous elements of array lcp[] and all lcp[i]>= K, and we should check the max_index of suffix and min_index of suffix in each group to see if their difference is bigger than K.And it's done.:D

»
11 лет назад, скрыть # |
 
Проголосовать: нравится +8 Проголосовать: не нравится

Here is my solution to this problem using binary search or string hashing.

Observation : 1. If there exists such string having length x then there must be atleast one such string for all lengths < x.

  1. use this idea to binary search on the length. Fixed a length (say z) find hashes corresponding to each string of length z. There will be exactly n — z + 1 such hashes where n is the length of the given string.

  2. Keep one extra information with each hash that is starting index of this string in the given string.

  3. sort this vector of pairs now and iterate over it to find the maximum distance between the starting index of two string having equal hash value. This will help you in testing whether these two strings are overlapping or not. One you find the answer depending upon your answer you may proceed to any half of the binary search.

complexity : O ( N * log(N) * log(N) )