Problem Link
Submission Link
Part 1: Calculate the Prefix function (next array) in $$$O(n)$$$
We can calculate the prefix function (next array) in linear time using the following code:
//next is a 0-indexed array that is initialized to zero.
//s is a 0-indexed string
for(int i = 1, j = 0; i < n; ++i){
while(j && s[i] != s[j]) j = next[j-1];
if(s[i] == s[j]) j++;
next[i] = j;
}
While is it $$$O(n)$$$? The key idea is the amortized analysis. $$$j$$$ increases $$$n$$$ times, each times increase at most $$$1$$$ (in fact $$$0$$$ or $$$1$$$, $$$0$$$ if s[i] != s[j]
and $$$1$$$ if s[i] == s[j]
). Therefore, $$$j$$$ increases at most $$$n$$$. Since $$$j$$$ is never a negative number (you can prove it by induction on the next
array), the amount $$$j$$$ decreases $$$\leq$$$ the amount $$$j$$$ increases $$$\leq n$$$. Each times $$$j$$$ decreases at least $$$1$$$ (as $$$next[j-1] \leq j-1$$$), so $$$j$$$ decreases at most $$$n$$$ times. To sum all, $$$j$$$ increases and decreases at most $$$2n$$$ times, which is $$$O(n)$$$.
But there is a trap! We can calculate the whole next
array in $$$O(n)$$$, but for a suffix of length $$$k$$$, calculating the items of the next
array that corresponds with that suffix is not $$$O(k)$$$!!! In fact, the worst complexity for every suffix, even a single character, could also be $$$O(n)$$$! In fact the example is very easy to construct: aaaaa....az
, the next array is:
$$$[0, 1, ..., num(a)-1, 0]$$$.
For $$$0 \leq i \leq num(a)$$$, the line j = next[j-1]
executes $$$0$$$ times, and when the index comes to $$$z$$$, j = next[j-1]
would execute $$$num(a)$$$ times! That means, the computational burden on the suffix could also be $$$O(n)$$$!
The GrandLisboa earns $$$100$$$ million from $$$10$$$ gamblers, it may earn $$$90$$$ million from you and $$$10$$$ million from another $$$9$$$ gamblers, that is amortization! Let's come back to this problem. When you append each suffix $$$t$$$ to the original string $$$s$$$, the complexity to handle the suffix is not $$$O(|t|)$$$, in fact the worst case is $$$O(|s|)$$$, and according to the analysis above, such suffix is very easy to construct!
Part 3: Why don't us generalize this idea to the original KMP algorithm?$