MarcAS's blog

By MarcAS, 11 months ago, In English

Hello, this a brief post about some interesting suffix array properties that are needed in some problems such as Oolimry and Suffix Array and the recently added Inverse Suffix Array problem on CSES.

Suffix array properties

Reminders:

  • Given two strings $$$s$$$ and $$$t$$$, $$$s$$$ is lexicographically smaller than $$$t$$$ if and only if $$$s$$$ is a proper prefix of $$$t$$$ or there is some $$$i \lt \min(|s|,|t|)$$$ where $$$s[i] \lt t[i]$$$ and $$$s[j]=t[j]$$$ $$$\forall j$$$ such that $$$0\leq j \lt i$$$. In this case we will say $$$s \lt t$$$.
  • Given a string $$$s$$$, the suffix array for $$$s$$$ is a permutation $$$p$$$ where $$$p[i]$$$ is the start of the $$$i$$$th smallest suffix of $$$s$$$ in lexicographical order.
  • If $$$inv$$$ is the inverse permutation of the suffix array $$$p$$$ of $$$s$$$, then $$$inv[i] \lt inv[j]$$$ implies that $$$s[i...|s|-1] \lt s[j...|s|-1]$$$.

Basic property

If $$$s \lt t$$$ then $$$s[0]\leq t[0]$$$, otherwise we'd get that $$$s[0] \gt t[0]\implies s \gt t$$$.

By the definition of $$$p$$$, we know that $$$\forall i,j$$$ where $$$0\leq j \lt i \lt |s|:s[p[j]...n-1] \lt s[p[i]...n-1]\implies s[p[j]]\leq s[p[i]]$$$ meaning the characters at the indexes of $$$p$$$ are in non-decreasing order.

Formally, if $$$0 \lt i \lt |s|$$$ then $$$s[p[i]]\geq s[p[i-1]]$$$.

Small detour

If $$$|s|,|t|\geq 2$$$, $$$s \lt t$$$ and $$$t[1...|t|-1] \lt s[1...|s|-1]$$$ then $$$s[0] \lt t[0]$$$.

Proof: otherwise we'd get that $$$s[0]\geq t[0]$$$ but $$$s[0]\leq t[0]$$$ by basic property therefore $$$s[0]=t[0]$$$ but $$$t[1...|t|-1] \lt s[1...|s|-1]$$$ which implies that $$$t \lt s$$$ which is a contradiction.

Interesting property 1

If we have two indices $$$i,j$$$ such that $$$0\leq j \lt i \lt |s|$$$ and $$$p[i],p[j] \lt |s|-1$$$, using the third point from the reminders and the above property, we get that if $$$inv[p[j]+1] \gt inv[p[i]+1]$$$ then $$$s[p[i]] \gt s[p[j]]$$$.

Formally, if $$$0 \lt i \lt |s|$$$ and $$$p[i] \lt |s|-1$$$ then $$$s[p[i]] \gt \max\limits_{0\leq j \lt i,\text{ }p[j] \lt |s|-1,\text{ }inv[p[j]+1] \gt inv[p[i]+1]}s[p[j]]$$$.

Interesting property 2

If $$$|s|=n$$$ and $$$i \gt 0$$$ such that $$$p[i]=n-1$$$ ($$$p[i-1] \lt n-1$$$ because $$$p$$$ is a permutation of $$$0,1,...,n-1$$$), using the basic property and the definition, we get that $$$s[n-1] \gt s[p[i-1]]$$$.

Proof: otherwise we'd get that $$$s[n-1]\leq s[p[i-1]]$$$ but $$$s[n-1]\geq s[p[i-1]]$$$ by basic property therefore $$$s[n-1]=s[p[i-1]]$$$ meaning $$$s[p[i-1]...n-1]=s[n-1]+s[p[i-1]+1...n-1]\implies s[n-1]$$$ is a proper prefix of $$$s[p[i-1]...n-1]$$$ so $$$s[p[i]...n-1] \lt s[p[i-1]...n-1]$$$ which is a contradiction.

Formally, if $$$0 \lt i \lt n$$$ and $$$p[i]=n-1$$$ then $$$s[p[i]] \gt s[p[i-1]]$$$

Inverse Suffix Array

We are given the suffix array $$$p$$$ of a string $$$s$$$ which we have to reconstruct.

We will set $$$s[p[i]]$$$ to c for every $$$i=0,1,...,n-1$$$ with c being non-decreasing as $$$i$$$ increases.

c will be initialized as 'a' as it gives the most chances to increment it, and if it has to exceed 'z', then there is no valid string. Note that it's useless here to increment c by more than one letter difference at once.

When to increment c

  • $$$i \gt 0$$$ and $$$p[i]=n-1$$$.
  • $$$p[i] \lt n-1$$$ and there is some $$$j \lt i$$$ such that $$$p[j] \lt n-1$$$, $$$inv[p[j]+1] \gt inv[p[i]+1]$$$ and $$$s[p[j]]$$$ has current value of c.

The second part can be done using a $$$\max$$$ char Segment tree or Fenwick tree. We initialize the tree with any character that's < 'a', and then every time after setting $$$s[p[i]]$$$ to c with $$$p[i] \lt n-1$$$, we also set $$$tree[inv[p[i]+1]]$$$ to c. So that way, at any other $$$i' \gt i$$$ such that $$$p[i'] \lt n-1$$$, we can check if $$$\max(tree[inv[p[i']+1]+1...n - 1])$$$ is c meaning the second case is met so we have to increment it.

Full text and comments »

  • Vote: I like it
  • +35
  • Vote: I do not like it

By MarcAS, 11 months ago, In English

Hello, this is a brief post about extending Manacher's algorithm to get the longest palindromic substring ending at every position while maintaining its $$$\mathcal{O}(n)$$$ complexity, in response to an old blog post asking for the same thing with no clear answers, and the recently added All Palindromes problem on CSES.

Say we already computed the d Manacher array in $$$\mathcal{O}(n)$$$ using the following algorithm:

vector<int> manacher(string& t) {
    string s{'$'};
    for(const char& c: t) s += string{'#', c};
    s += string{'#', '^'};

    vector<int> d((int)s.size());
    int l = 1, r = 1;
    for(int i = 1; i < (int)s.size() - 1; ++i){
        d[i] = max(0, min(r - i, d[l + r - i]));
        while(s[i - d[i]] == s[i + d[i]]) ++d[i];
        if(i + d[i] > r){
            l = i - d[i];
            r = i + d[i];
        }
    }
    return d;
}

We now want to compute an array res where res[i](initially $$$0$$$) is the length of longest palindromic substring ending at index $$$i$$$.

First considering only oddly-sized substrings, if we know that the longest palindromic substring centered at $$$i$$$ is $$$t[i-r:i+r]$$$, then we can simply apply res[j] = max(res[j], 2 * (j - i) + 1)) for every $$$j=i,...,i+r$$$ as the substring centered at $$$i$$$ ending at $$$j$$$ is also palindromic but this takes $$$\mathcal{O}(n^2)$$$ time as we must do this for every $$$i=0,1,...,n-1$$$.

An important observation is that res[j] will only change once at some $$$i$$$, as at any $$$i' \gt i:2(j-i')+1 \lt 2(j-i)+1$$$. Since we are updating res[j] in increasing order of j, we can simply keep a pointer to the first index of res that has not been updated yet, from which we start updating. This results in an $$$\mathcal{O}(n)$$$ complexity algorithm.

The same logic can be applied to evenly-sized substrings, so we loop again in order to count their contribution.

The final algorithm is the following:

vector<int> longest_palindrome_ending(vector<int>& d) {
    int n = ((int)d.size() - 3) >> 1, j = 0;
    vector<int> res(n);
    for(int i = 0; i < n; ++i)
        for(j = max(j, i); j < i + (d[(i + 1) << 1] >> 1); ++j)
            res[j] = 1 + ((j - i) << 1); 
    j = 0;
    for(int i = 1; i < n; ++i)
        for(j = max(j, i); j < i + (d[1 + (i << 1)] >> 1); ++j)
            res[j] = max(res[j], (j - i + 1) << 1);
    return res;
}

Full text and comments »

  • Vote: I like it
  • +35
  • Vote: I do not like it