Shade100's blog

By Shade100, history, 8 years ago, In English

Hi! I'm having trouble understanding this O(NlogN) implementation of suffix array construction :

Code

In particular:

for (i = 0; i < n; i++) { // count the frequency of each integer rank
	c[i + k < n ? RA[i + k] : 0]++;
}

0 is used as the rank for an empty string when it is also the rank of the smallest suffix, as seen below.

tempRA[SA[0]] = r = 0; // re-ranking; start from rank r = 0

// compare adjacent suffixes
for (i = 1; i < n; i++) {
	// if same pair => same rank r; otherwise,increase r
	tempRA[SA[i]] = (RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k]) ? r : ++r;
}

SA[i] + k can go out of bounds.

These seem like bugs, but the algorithm works, but strangely only when a sentinel character such as $ is appended to the end. Why?

Thanks in advance!

  • Vote: I like it
  • 0
  • Vote: I do not like it

| Write comment?
»
8 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Because practically when you append a special character (which its ASCII value is lower than any letter in the alphabet) at the end of your string, you're implying that this unused suffix will always have the rank 0, therefore your smallest suffix is always in the index 1 rather than 0.