SPOJ NSUBSTR Editorial

Revision en2, by SydneySweeneyFan, 2025-06-19 12:16:52

In the NSUBSTR problem on SPOJ, we are given a string $$$s$$$ of length $$$n$$$, and for every $$$k = 1$$$ to $$$n$$$, we are asked to find the maximum number of times any substring of length $$$k$$$ appears in $$$s$$$ (including overlapping occurrences).


1. Rephrasing the Problem

We need to compute, for each possible substring length $$$k$$$, the highest frequency among all substrings of length $$$k$$$. For example, if $$$s = "ababab"$$$:

  • $$$k=1$$$: substrings are {"a", "b"}, frequencies {3,3} → answer 3.
  • $$$k=2$$$: substrings are {"ab", "ba"}, frequencies {3,2} → answer 3.
  • etc.

A brute-force sliding-window plus hashing approach would be $$$O(n^2)$$$ or worse, which is too slow for $$$n$$$ up to $$$10^5$$$.


2. Key Insight: Suffix Array + LCP

We transform the problem into counting identical substrings by sorting all suffixes and using their common prefixes.

  1. Suffix Array: an array $$$sa$$$ of length $n$ that lists the starting positions of all suffixes of $s$ in lexicographical order.
  2. LCP Array: an array $$$lcp$$$ of length $n-1$ where $$$lcp[i]$$$ = length of the longest common prefix of the suffixes starting at $$$sa[i]$$$ and $$$sa[i+1]$$$.

Once we have $$$sa$$$ and $$$lcp$$$, any group of $$$c$$$ consecutive suffixes in sorted order share a common prefix of length at least $$$L = \min(lcp[i..i+c-2])$$$. That prefix corresponds to some substring of length $L$ occurring $$$c$$$ times.


3. From LCP to Maximum Frequencies

Let’s imagine the $$$lcp$$$ array as heights of bars in a histogram of size $$$n-1$$$. For each index $$$i$$$:

  • The bar at $$$i$$$ has height $$$h = lcp[i]$$$.
  • We want to know the largest span $$$[L[i], R[i]]$$$ around $$$i$$$ where all bars are at least height $h$.
  • That span corresponds to $$$c = (R[i] - L[i] + 2)$$$ suffixes sharing a common prefix of length $$$h$$$.

So for each $$$i$$$, we can update:

$$$ ans[h] = \max(ans[h], c) \quad\text{where } c = R[i]-L[i]+2. $$$

After processing all $$$i$$$, we have for each exact length $$$h$$$ the maximum count of some substring of length $$$h$$$. However, note that if a substring of length $$$h$$$ appears $$$c$$$ times, then all its prefixes (length $$$ \lt h$$$) appear at least $$$c$$$ times. Thus we perform a final pass:

$$$ \text{for } x = n-1 \downarrow 1:\quad ans[x] = \max(ans[x], ans[x+1]). $$$

This ensures the answer is non-decreasing as substring length shrinks.


4. Efficient Computation of $$$L[i], R[i]$$$

To find, for each bar $$$i$$$, the next smaller element on the left and right, we use a classic monotonic stack in two passes:

  1. Left pass: iterate $$$i=0$$$ to $$$n-2$$$, maintain a stack of indices with strictly increasing $$$lcp$$$ heights. When $$$lcp[stack.top()] \ge lcp[i]$$$, pop. Then:
  • If stack empty, set $$$L[i]=0$$$; else $$$L[i]=stack.top()+1$$$.
  • Push $$$i$$$.
  1. Right pass: similarly from $$$i=n-2$$$ down to 0, find next smaller on the right and set:
  • If stack empty, $$$R[i]=n-2$$$; else $$$R[i]=stack.top()-1$$$.

Each index is pushed/popped at most once ⇒ $$$O(n)$$$ total.


5. Putting It All Together

Optimal C++ Solution

6. Suffix Array & LCP: A Quick Primer

  • Suffix Array in $$$O(n\log n)$$$: sort cyclic shifts by doubling technique (or SA-IS in $$$O(n)$$$).
  • LCP Array in $$$O(n)$$$ with Kasai’s algorithm: scans adjacent suffixes, reuses previous LCP to avoid recomputation.

If you’re unfamiliar, many tutorials walk through building suffix arrays by sorting starting characters, then doubling the length each round, reassigning classes.


7. Complexity

  • Building SA: $$$O(n\log n)$$$
  • Building LCP: $$$O(n)$$$
  • Two stack passes: $$$O(n)$$$
  • Final propagation & output: $$$O(n)$$$

Overall: $$$O(n\log n)$$$, suitable for $$$n$$$ up to $$$10^5$$$.


8. Conclusion

By reducing the substring-frequency problem to histogram spans over the LCP array, we elegantly count the maximum occurrences of each length in $$$O(n\log n)$$$. This approach highlights the power of suffix arrays and range queries with stacks for common-prefix problems.

Tags suffix array, lcp array, string

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en2 English SydneySweeneyFan 2025-06-19 12:16:52 230
en1 English SydneySweeneyFan 2025-06-19 12:15:19 11605 Initial revision (published)