Suffix Array / Manber and Myers Algo

(This is actually a question.) I thought I understood the intuition behind the Manber and Myers algorithm. Here is my understanding.

Suppose the string is "banana".

We first partition the suffixes by their first character:

a, anana, ana => bucket 1

banana => bucket 2

na, nana => bucket 3
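
For readers following along, the first-character partition above can be reproduced with a short Python sketch. The function name `initial_buckets` and the 1-based bucket numbering are assumptions chosen to match the example, not part of any library.

```python
def initial_buckets(s):
    """Label every suffix start position by its first character.

    Returns (bucket, groups): bucket[i] is the 1-based bucket number of the
    suffix starting at i, and groups maps bucket number -> list of suffixes.
    """
    letters = sorted(set(s))
    rank_of = {c: r + 1 for r, c in enumerate(letters)}  # 1-based, as in the post
    bucket = [rank_of[c] for c in s]
    groups = {}
    for i, b in enumerate(bucket):
        groups.setdefault(b, []).append(s[i:])
    return bucket, groups


bucket, groups = initial_buckets("banana")
print(bucket)  # [2, 1, 3, 1, 3, 1]
print(groups)  # {2: ['banana'], 1: ['anana', 'ana', 'a'], 3: ['nana', 'na']}
```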

Then, to refine the partition from the first h characters to the first 2h characters, my algorithm is:

1. Scan the buckets one by one.
2. Take the first bucket.
3. For each suffix in this bucket, starting at position sa, look up the bucket of the suffix starting at sa + h; if that goes out of bounds, assign position = 0.

So, with h = 1, the picture looks like this:

a = 0, anana = 3, ana = 3 (a + 1 is past the end of the string, so it gets 0; anana + 1 is nana and ana + 1 is na, and both are in the 3rd bucket)

4. Sort the assigned indices of the bucket using counting sort.
5. Scan the new indices one by one and create new partitions. Here we get

[a], [anana, ana]

6. Repeat until the number of buckets equals n.

My problem is in the 4th step, where I use counting sort.

At first I coded it up, thinking I had understood the algorithm, but then I ran into trouble. As the number of buckets grows with each iteration, my algorithm approaches O(n^2) (because I assign ranks during the counting sort according to the bucket of the suffix at s + h). Can I get O(n log n) with some modification to the algorithm? If not, what should I do?

OK, I have removed the code, so please answer me now.

Comments (5)

bhikkhu

10 years ago

OKAY. Why the downvotes? If it is because of the formatting, that is not my fault. I am not joking here.

10 years ago

Why does the post appear so dirty?

If you downvote, please give the reason too.

misof

When you have the current bucket for each suffix, you can compute new ones as follows:

For each i, consider the ordered pair ( bucket[i], bucket[i + (1<<k)] ). (here, bucket[index beyond the end] is a value larger than any valid bucket[i] )

Sort the suffixes with those pairs used as keys. This cannot be done by an ordinary countsort (there are about n^2 possible pairs (x,y)), but it can be done by a two-pass radix sort in O(n), or if you are lazy, by a standard sort in O(n log n). (The second approach then gives you O(n log^2 n) overall time complexity.)
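
The two-pass radix sort mentioned above could look roughly like the following Python sketch. It is not misof's code: the function name `radix_sort_pairs` and the `max_key` parameter are made up for illustration, and it assumes every key component is an integer in 0..max_key. Note one deliberate difference from the convention stated above: the example keys use 0, smaller than every valid bucket, for an index beyond the end, which is the choice that makes a shorter suffix sort before the longer suffixes it is a prefix of.

```python
def radix_sort_pairs(keys, max_key):
    """Return indices 0..len(keys)-1 ordered by (keys[i][0], keys[i][1]).

    Two stable counting-sort passes: second component first, then first.
    Every component must be an integer in 0..max_key.
    """
    def counting_pass(order, component):
        count = [0] * (max_key + 2)
        for i in order:
            count[keys[i][component] + 1] += 1
        for v in range(1, len(count)):
            count[v] += count[v - 1]  # count[k] = first output slot for key k
        out = [0] * len(order)
        for i in order:  # left-to-right keeps the pass stable
            k = keys[i][component]
            out[count[k]] = i
            count[k] += 1
        return out

    order = list(range(len(keys)))
    order = counting_pass(order, 1)  # least significant component
    order = counting_pass(order, 0)  # most significant component
    return order


# "banana" with first-character buckets [2, 1, 3, 1, 3, 1] and offset 1;
# 0 stands for "beyond the end" (an assumption of this sketch).
bucket = [2, 1, 3, 1, 3, 1]
n = len(bucket)
keys = [(bucket[i], bucket[i + 1] if i + 1 < n else 0) for i in range(n)]
print(radix_sort_pairs(keys, max_key=3))  # [5, 1, 3, 0, 2, 4]
```

Each pass is O(n + max_key), and since bucket labels never exceed n, one sorting round stays O(n).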

After the sort, relabel the buckets in O(n) and you are ready to start a new iteration.
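
Putting the steps together, here is a minimal Python sketch of the whole doubling loop. It uses the "lazy" variant with the built-in sort, so it runs in O(n log^2 n), not O(n log n). The name `suffix_array` is hypothetical, and the sketch assumes bucket labels start at 1 with 0 reserved for positions beyond the end of the string, so that the result is the usual lexicographic order.

```python
def suffix_array(s):
    """Sort suffix start positions of s by prefix doubling (O(n log^2 n))."""
    n = len(s)
    if n == 0:
        return []

    # Round 0: order and label the suffixes by their first character.
    order = sorted(range(n), key=lambda i: s[i])
    bucket = [0] * n
    bucket[order[0]] = 1
    for a, b in zip(order, order[1:]):
        bucket[b] = bucket[a] + (s[a] != s[b])

    h = 1
    while bucket[order[-1]] < n:  # stop once every suffix has its own bucket
        def key(i):
            # Pair of labels for the first h and the following h characters;
            # 0 means "beyond the end" (an assumption of this sketch).
            return (bucket[i], bucket[i + h] if i + h < n else 0)

        order.sort(key=key)

        # Relabel in O(n): a new bucket starts wherever the pair changes.
        new_bucket = [0] * n
        new_bucket[order[0]] = 1
        for a, b in zip(order, order[1:]):
            new_bucket[b] = new_bucket[a] + (key(a) != key(b))
        bucket = new_bucket
        h *= 2

    return order


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Swapping `order.sort(key=key)` for a two-pass radix sort on the same pairs would bring each round down to O(n) and the total to O(n log n).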

I was sorting each bucket one after another and then appending the buckets together. I had not thought of assigning pairwise ranks. Very stupid of me. +1, and thank you very much, sir, for your time.

bhikkhu's blog