A note on CF1721E (EDU134E) Prefix Function Queries

Revision en12, by GrandLisboa, 2023-03-24 19:28:51

Problem Link

Submission Link

Part 1: Calculate the Prefix function (next array) in $$$O(n)$$$

We can calculate the prefix function (next array) in linear time using the following code:

//next is a 0-indexed array that is initialized to zero.
//s is a 0-indexed string
for(int i = 1, j = 0; i < n; ++i){
    while(j && s[i] != s[j]) j = next[j-1];
    if(s[i] == s[j]) j++;
    next[i] = j;
}

While is it $$$O(n)$$$? The key idea is the amortized analysis. $$$j$$$ increases $$$n$$$ times, each times increase at most $$$1$$$ (in fact $$$0$$$ or $$$1$$$, $$$0$$$ if s[i] != s[j] and $$$1$$$ if s[i] == s[j]). Therefore, $$$j$$$ increases at most $$$n$$$. Since $$$j$$$ is never a negative number (you can prove it by induction on the next array), the amount $$$j$$$ decreases $$$\leq$$$ the amount $$$j$$$ increases $$$\leq n$$$. Each times $$$j$$$ decreases at least $$$1$$$ (as $$$next[j-1] \leq j-1$$$), so $$$j$$$ decreases at most $$$n$$$ times. To sum all, $$$j$$$ increases and decreases at most $$$2n$$$ times, which is $$$O(n)$$$.

But there is a trap! We can calculate the whole next array in $$$O(n)$$$, but for a suffix of length $$$k$$$, calculating the items of the next array that corresponds with that suffix is not $$$O(k)$$$!!! In fact, the worst complexity for every suffix, even a single character, could also be $$$O(n)$$$! In fact the example is very easy to construct: aaaaa....az, the next array is:

$$$[0, 1, ..., num(a)-1, 0]$$$.

For $$$0 \leq i \leq num(a)$$$, the line j = next[j-1] executes $$$0$$$ times, and when the index comes to $$$z$$$, j = next[j-1] would execute $$$num(a)$$$ times! That means, the computational burden on the suffix could also be $$$O(n)$$$!

The GrandLisboa earns $$$100$$$ million from $$$10$$$ gamblers, it may earn $$$90$$$ million from you and $$$10$$$ million from another $$$9$$$ gamblers, that is amortization! Let's come back to this problem. When you append each suffix $$$t$$$ to the original string $$$s$$$, the complexity to handle the suffix is not $$$O(|t|)$$$, in fact the worst case is $$$O(|s|)$$$, and according to the analysis above, such suffix is very easy to construct!

Part 3: Why don't us generalize this idea to the original KMP algorithm?

(1)The original $$$KMP$$$ algorithm is already linear and achieves the best theoretical complexity.

(2)The character set in the original $$$KMP$$$ algorithm is not restricted to the lower Latin letters. It can handle arbitrary character sets in linear time, not depending on the size of the character sets.

Tags string, kmp, prefix function

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en20 English GrandLisboa 2023-03-25 10:39:02 4 Tiny change: '~~~~\n\nWhile is it $O(' -> '~~~~\n\nWhy is it $O('
en19 English GrandLisboa 2023-03-25 07:26:22 4 Tiny change: 'we define a last arra' -> 'we define the last arra'
en18 English GrandLisboa 2023-03-24 19:57:32 0 (published)
en17 English GrandLisboa 2023-03-24 19:57:24 2 (saved to drafts)
en16 English GrandLisboa 2023-03-24 19:41:51 0 (published)
en15 English GrandLisboa 2023-03-24 19:41:40 2 Tiny change: ' `last[i][c]` from `l' -> ' `last[i][*]` from `l'
en14 English GrandLisboa 2023-03-24 19:41:21 881
en13 English GrandLisboa 2023-03-24 19:35:27 1158
en12 English GrandLisboa 2023-03-24 19:28:51 1 Tiny change: 'algorithm?$**\n\n(1)T' -> 'algorithm?**\n\n(1)T'
en11 English GrandLisboa 2023-03-24 19:27:43 383
en10 English GrandLisboa 2023-03-24 19:24:01 343
en9 English GrandLisboa 2023-03-24 19:21:42 200
en8 English GrandLisboa 2023-03-24 19:20:11 228
en7 English GrandLisboa 2023-03-24 19:13:12 407
en6 English GrandLisboa 2023-03-24 19:10:12 4
en5 English GrandLisboa 2023-03-24 19:09:51 178
en4 English GrandLisboa 2023-03-24 19:07:53 382
en3 English GrandLisboa 2023-03-24 19:01:23 234
en2 English GrandLisboa 2023-03-24 18:58:00 111
en1 English GrandLisboa 2023-03-24 18:55:37 239 Initial revision (saved to drafts)