A note on CF1721E (EDU134E) Prefix Function Queries

#	User	Rating
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

#	User	Contrib.
1	cry	167
2	maomao90	163
2	Um_nik	163
4	atcoder_official	161
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	nor	153
9	Dominater069	153

Problem Link

Submission Link

Part 1: Calculate the Prefix function (next array) in $$$O(n)$$$

We can calculate the prefix function (next array) in linear time using the following code:

//next is a 0-indexed array that is initialized to zero.
//s is a 0-indexed string
for(int i = 1, j = 0; i < n; ++i){
    while(j && s[i] != s[j]) j = next[j-1];
    if(s[i] == s[j]) j++;
    next[i] = j;
}

While is it $$$O(n)$$$? The key idea is the amortized analysis. $$$j$$$ increases $$$n$$$ times, each times increase at most $$$1$$$ (in fact $$$0$$$ or $$$1$$$, $$$0$$$ if s[i] != s[j] and $$$1$$$ if s[i] == s[j]). Therefore, $$$j$$$ increases at most $$$n$$$. Since $$$j$$$ is never a negative number (you can prove it by induction on the next array), the amount $$$j$$$ decreases $$$\leq$$$ the amount $$$j$$$ increases $$$\leq n$$$. Each times $$$j$$$ decreases at least $$$1$$$ (as $$$next[j-1] \leq j-1$$$), so $$$j$$$ decreases at most $$$n$$$ times. To sum all, $$$j$$$ increases and decreases at most $$$2n$$$ times, which is $$$O(n)$$$.

But there is a trap! We can calculate the whole next array in $$$O(n)$$$, but for a suffix of length $$$k$$$, calculating the items of the next array that corresponds with that suffix is not $$$O(k)$$$!!! In fact, the worst complexity for every suffix, even a single character, could also be $$$O(n)$$$! In fact the example is very easy to construct: aaaaa....az, the next array is:

$$$[0, 1, ..., num(a)-1, 0]$$$.

For $$$0 \leq i \leq num(a)$$$, the line j = next[j-1] executes $$$0$$$ times, and when the index comes to $$$z$$$, j = next[j-1] would execute $$$num(a)$$$ times! That means, the computational burden on the suffix could also be $$$O(n)$$$!

The GrandLisboa earns $$$100$$$ million from $$$10$$$ gamblers, it may earn $$$90$$$ million from you and $$$10$$$ million from another $$$9$$$ gamblers, that is amortization! Let's come back to this problem. When you append each suffix $$$t$$$ to the original string $$$s$$$, the complexity to handle the suffix is not $$$O(|t|)$$$, in fact the worst case is $$$O(|s|)$$$, and according to the analysis above, such suffix is very easy to construct!

Part 2: A fix

We note that most next transition is unnecessary. For example,the $$$0$$$-indexed string $$$s$$$ is ababcababa and we are now checking the last character, which is a. it is not necessary to scan abab (length=$4$) at all, because the prefix abab (s[0:4]) is followed by c while the prefix abab (s[5:9]) is followed by a. So, scan the first two characters ab is enough! Inspired by this, we define a last array:

//last[i][c]: Among 0 <= k <= i such that [0...k] is a prefix of [0,...,i] and [i-k,...,i] is a suffix of [0,...,i], the max k such that s[k+1]==c
//For example, s = "abcabaabd", last[7]['d'] = 7, last[7][c] = 1.
//s[0:1] = "ab" is a prefix of s[0:7]("abcabaab").
//s[0:1] is also a suffix of s[0:7]
//and s[1+1] = s[2] = 'c'
//The transition function: last[i][c] = (c == s[i+1] ? (next[i] >= 1 ? last[next[i]-1][c] : -1) : i);

We can easily transfer to the proper place in $$$O(1)$$$ time. Deducing last[i][c] from last[i-1][*] costs $$$O(|\alpha|)$$$ time ($$$\alpha$$$ is the alphabet), and such transition only happens $$$O(n)$$$ times, so we can fix this issue in $$$O(n |\alpha|)$$$ time.

Part 3: Why don't us generalize this idea to the original KMP algorithm?

(1)The original $$$KMP$$$ algorithm is already linear and achieves the best theoretical complexity.

(2)The character set in the original $$$KMP$$$ algorithm is not restricted to the lower Latin letters. It can handle arbitrary character sets in linear time, not depending on the size of the character sets.

Rev.	By	When	Δ	Comment
en20	GrandLisboa	2023-03-25 10:39:02	4	Tiny change: '~~~~\n\nWhile is it $O(' -> '~~~~\n\nWhy is it $O('
en19	GrandLisboa	2023-03-25 07:26:22	4	Tiny change: 'we define a last arra' -> 'we define the last arra'
en18	GrandLisboa	2023-03-24 19:57:32	0	(published)
en17	GrandLisboa	2023-03-24 19:57:24	2	(saved to drafts)
en16	GrandLisboa	2023-03-24 19:41:51	0	(published)
en15	GrandLisboa	2023-03-24 19:41:40	2	Tiny change: ' `last[i][c]` from `l' -> ' `last[i][*]` from `l'
en14	GrandLisboa	2023-03-24 19:41:21	881
en13	GrandLisboa	2023-03-24 19:35:27	1158
en12	GrandLisboa	2023-03-24 19:28:51	1	Tiny change: 'algorithm?$\n\n(1)T' -> 'algorithm?\n\n(1)T'
en11	GrandLisboa	2023-03-24 19:27:43	383
en10	GrandLisboa	2023-03-24 19:24:01	343
en9	GrandLisboa	2023-03-24 19:21:42	200
en8	GrandLisboa	2023-03-24 19:20:11	228
en7	GrandLisboa	2023-03-24 19:13:12	407
en6	GrandLisboa	2023-03-24 19:10:12	4
en5	GrandLisboa	2023-03-24 19:09:51	178
en4	GrandLisboa	2023-03-24 19:07:53	382
en3	GrandLisboa	2023-03-24 19:01:23	234
en2	GrandLisboa	2023-03-24 18:58:00	111
en1	GrandLisboa	2023-03-24 18:55:37	239	Initial revision (saved to drafts)

History