megalodon's blog

By megalodon, 12 years ago, In English

Hello. Can anybody please provide me a clean tutorial of how to implement a Suffix Array in Java? Thanks

  • Vote: I like it
  • -9
  • Vote: I do not like it

»
12 years ago, # |
  Vote: I like it 0 Vote: I do not like it

May be this helps.. It has a clean implementation along with lots of practice problems

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +2 Vote: I do not like it

    thanks, but the suffix array presented is too slow. You need something at least O(n logn) for a contest.

»
12 years ago, # |
  Vote: I like it +20 Vote: I do not like it

https://sites.google.com/site/indy256/algo/suffix_array — just an implementation.

Also, I don't think that it's time to learn suffix array with your green colour...

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    that it's time to learn suffix array with your green colour...

    Maybe, it's true. But...

    For example, not all my students in university (they had AC on "Suffix Array task" as part of practice) where good in olympiads. Even clever students. Some of them even never participated :-)

    Or university is not good place/time to learn Suffix Array? Or not "true olympiad people" should not know about it? :)

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      I'm sure that author wants to learn it for olympiads. But if so, there are many easier and more helpful algorithms.

      Your university is good place to learn suffix array. I guess most of my groupmates will be in the army now if there's a similar course in my university. And I have a feeling that most of the universities are closer to my than to yours :D

      • »
        »
        »
        »
        10 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Re-read your comments. Sounds arrogant, isn't it?

        • »
          »
          »
          »
          »
          10 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          It doesn't matter how it sounds, because the truth looks exactly as dalex said 1.5 years ago.

          Just go to a random university and try to explain suffix array to a random CS student. You'll be heavily surprised that this student very likely won't understand you. After this you will possibly start to appreciate the fact that you study in MSU and are surrounded with lots of smart people.

          And yes, I really know what I say.

  • »
    »
    12 years ago, # ^ |
    Rev. 2   Vote: I like it +5 Vote: I do not like it

    It's always time to learn. Thanks for the implementation anyway it's good enough.

  • »
    »
    11 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Incorrect attitude bro, studying classical algorithms also teaches me quite productively how to think.

»
12 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Well,, I think this two (1 and 2) links provide a good number of resources on suffix array. You can also take help from Stack Overflow too.

Hope these help you. And sorry for my poor English. Thanks :)

»
10 years ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

Naive implementation is many times just fine for a contest.

»
10 years ago, # |
  Vote: I like it 0 Vote: I do not like it

I think that the question asked on a tutorial of how to implement it and not of an implementation already coded. So here's my brief explanation on how you can implement it by yourself (you always learn more by implementing an algorithm your self than by just using it).

You'll group all the suffixes in buckets, and you'll do it by comparing their prefixes of sizes 2^i for all i until 2^i >= length of the string. Let's say you have the string "ABRACADABRA", then you'll have all this suffixes: "ABRACADABRA", "BRACADABRA", "RACADABRA", "ACADABRA", "CADABRA", "ADABRA", "DABRA", "ABRA", "BRA", "RA" and "A".

First of all you'll group them in buckets for the first letter, so the buckets will be:

["ABRACADABRA", "ACADABRA", "ADABRA", "ABRA", "A"]

["BRACADABRA", "BRA"]

["CADABRA"]

["DABRA"]

["RACADABRA", "RA"]

Note that in each bucket the strings aren't sorted but two strings are properly sorted between each other if they belong to different buckets. This comparison is O(n log n) as you only sort comparing by one character per suffix (The first character of the suffix). So now you have the strings sorted by the first character, and you want to sort them by the first two characters. What you'll do now is sorting the suffixes that belong to the same bucket between each other, but not comparing with between suffixes of different buckets, as you already know how they compare each other.

Let's say we want to sort the first bucket ["ABRACADABRA", "ACADABRA", "ADABRA", "ABRA", "A"]. For sorting it we know that the first letter is the same in all the suffixes, so we compare the second letter, but how we compare it? We don't compare characters but suffixes, for example, if we want to compare "ABRACADABRA" with "ACADABRA", we know how "BRACADABRA" and "CADABRA" compare each other by the first letter of the suffix, so we look for this comparison. Now we have the following buckets for all this strings:

["A"] (note that because the "second letter" of this string is the end of string we consider it to come before any other letter).

["ABRACADABRA", "ABRA"]

["ACADBRA"]

["ADABRA"]

This is how we do the second step. Now look that the second bucket (["BRACADABRA","BRA"]) remained the same, because "BR" == "BR", so let's see how we sort it in the third step. Now we want to compare the suffixes by their prefixes of length 4 (or less if the suffix has less than 4 characters), so "BRA" < "BRAC" but we need to do two comparisons, as we know they share the first two characters, but we have to compare "A" == "A" and End of String < "C", but let's see how we do it in one step. We know how "A" and "ACADABRA" compare to each other according to the prefix of length at most 2, that is "A" < "ACADABRA", so we just look for this comparison once, and we know then that "BRA" < "BRACADABRA".

Now let's use a bigger example to make it more clear. Let's say we have the strings "ABCDEFGHIABCDEFGIJK" and "ABCDEFGIJK" in the same bucket and we've already compared the first 4 letters of the suffixes. Now we want to compare "ABCDEFGI" with "ABCDEFGH" (the first 8 letters of the suffixes). We know that "EFGHIABCDEFGIJK" < "EFGIJK" according to the first 4 letters of the suffixes, so we just make look for this comparison and we compare 4 letters in only one operation.

I hope it's clear enough for you to being able to implement it.

»
10 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Thanks Man !!