UPD:Thanks to comment from arvindf232 and pajenegod,I rewrote parts of this post to improve clarity with the help of Chatgpt. I hope it reads more smoothly now.

Preface

In this blog, I want to share Border Theory, a useful and elegant insight about string borders that often appears in competitive programming. Although this idea deserves to be as well-known as the KMP algorithm, many people today aren’t aware of it or how to use it to speed up their solutions.

Main Theoretical Result of Border Theory

Main Result

For a string $$$S$$$, a border is a substring that is both a prefix of $$$S$$$ and a suffix of $$$S$$$.

For a string $$$S$$$, we define that:

$$$ \{b_i\}_{i=1}^n$$$

is the sequence of all lengths of the border of $$$S$$$, sorted in increasing order.

According to the theory,it's garenteed that for all possible $$$S$$$, $$${B_i}_{i = 1}^n$$$ can be partitioned into $$$\operatorname{O}(\log |S|)$$$ continuous subsequences where each subsequence forms an arthmetic progressions.

Proof

First of all,if a string $$$T$$$ is a border of $$$S$$$ and $$$2|T| \ge |S|$$$, Then all positions where $$$T$$$ occurs in $$$S$$$ form an arthmetic progression.

proof

Second, All border lengths satisfying $$$2 \cdot length \ge |S|$$$ also appear in an arithmetic sequence:

proof

Using the two observations, we can show that for every two border length $$$x,y(x \lt y)$$$ do not lie in the same arithmetic subsequence, then $$$2x \lt y$$$, making the theory right.

Practical Use: The Series-Link (shlink) Trick

A practical formulation of this theory is the series link (often written shlink), which groups borders into arithmetic sequences so they can be processed efficiently.

We compute shlink[i] as the start of the next arithmetic progression. (Here link[i] is the usual border/failure link in KMP-style structures, len[i] is the length of the represented substring, and diff[i] is the differance of length between i and link[i].) Now $$$shlink_i$$$ can be calculated like this:

//define shlink[i]
//define link[i] as the maxinum border of i like fail pointers in KMP or PAM
//define len[i] as the length
//define diff[i] as the differance of length between i and link[i]
shlink[i] = link[i];
diff[i] = len[i] - len[link[i]];
if(diff[i] == diff[link[i]])
    shlink[i] = shlink[link[i]];

When we tries to analyze on borders,we found that:

When analyzing borders through these links, we observe: within an arithmetic progression of border lengths for a prefix of length $$$i$$$, the difference between consecutive border values (shifted by $$$diff_i$$$) corresponds to a border of length $$$i - diff_i$$$ and the corresponding occurrence position $$$i - len_{shlink_i} - diff_i$$$. This allows us to jump across long chains of borders in limited steps.

One example

Link

Given up to $$$5$$$ test cases,each with a string $$$S(1 \le |S| \le 5 \times 10 ^ 5)$$$ and an integer $$$w(1 \le w \le 10 ^ {18})$$$, when we have a string $$$T$$$ we can construct a new string $$$Y$$$ in the two ways: - Append the full string $$$T$$$ and $$$S$$$ and form a string with length $$$|T| + |S|$$$; - If a suffix $$$A$$$ in $$$T$$$ is also a prefix of $$$S$$$, you can overlap them and construct a string of length $$$|T| - |A| + |S|$$$.

Find the number of distinct length of strings in the range $$$[1,w]$$$ that can result from the process.

solution

code

#include<bits/stdc++.h>
using namespace std;
int t,n,bor[500009],diff[500009],dt[500009],shlink[500009],q[500009],head,tail;
long long w,dp1[500009],dp2[500009];
int mx[500009];
const long long INF = 1998244353999911659ll;
char c[500009];
int gcd(int x,int y){
	return y == 0 ? x : gcd(y,x % y);
}
int main(){
	scanf("%d",&t);
	while(t--){
		scanf("%d %lld",&n,&w);
		scanf(" %s",c + 1);
		bor[1] = 0;
		for(int i = 2; i <= n; i ++){
			int t = bor[i - 1];
			while(t != 0 && c[i] != c[t + 1])
				t = bor[t];
			if(c[i] == c[t + 1])
				t ++;
			bor[i] = t;
		}
		for(int i = 1; i <= n; i ++){
			diff[i] = i - bor[i];
			shlink[i] = bor[i];
			if(diff[i] == diff[bor[i]])
				shlink[i] = shlink[bor[i]];
		}
		//for(int i = n; i > 0; i = bor[i])
		//	printf("%d\n",bor[i]);
		//puts("OK");
		dp1[0] = n;
		for(int i = 1; i < n; i ++){
			dp1[i] = INF;
		}
		int lst = n;
		for(int i = bor[n]; i != 0; i = shlink[i]){
			int nw = n - i;
			int ls = (i - shlink[i]) / diff[i] - 1;
			for(int j = 0; j < nw; j ++){
				dp2[j] = INF;
			}
			int fp = gcd(n - i,lst);
			for(int j = 0; j < fp; j ++){
				mx[j] = n + 1;
				dp2[n + 1] = INF;
			}
			for(int j = 0; j < lst; j ++){
				dp2[dp1[j] % nw] = min(dp2[dp1[j] % nw],dp1[j]);
				if(dp2[mx[dp1[j] % fp]] > dp2[dp1[j] % nw])
					mx[dp1[j] % fp] = dp1[j] % nw;
			}
			
			for(int j = 0; j < fp; j ++){
				long long mn = dp2[mx[j]];
				if(mx[j] == n + 1)
					continue;
				for(int k = (mx[j] + lst) % nw; k != mx[j]; k = (k + lst) % nw){
					dt[k] = -1;
					mn = min(mn + lst,dp2[k]);
					dp2[k] = mn;
				}
			}
		//	printf("%d\n",i);
			fp = gcd(nw,diff[i]);
			for(int j = 0; j < fp; j ++){
				mx[j] = n + 1;
			}
				dp1[n + 1] = INF;
			for(int j = 0; j < nw; j ++){
				dt[j] = 0;
				dp1[j] = dp2[j];
				if(dp1[j] < dp1[mx[j % fp]])
					mx[j % fp] = j;
			}
			for(int j = 0; j < fp; j ++){
				head = 1,tail = 0;
				for(int k = mx[j],l = 0; l < nw / fp; l ++,k = (k + diff[i]) % nw){
					dt[k] = l;
					while(head <= tail && l - dt[q[head]] > ls)
						head ++;
					if(head <= tail)
						dp1[k] = min(dp1[k],dp1[q[head]] + 1ll * l * diff[i] - 1ll * dt[q[head]] * diff[i] + nw);
					while(head <= tail && dp1[k] - 1ll * l * diff[i] <= dp1[q[tail]] - 1ll * dt[q[tail]] * diff[i])
						tail --;
					q[++tail] = k;
					//printf("%d %lld\n",k,dp1[k]);
				}
			}
			lst = nw;
		}
		long long ans = 0;
		for(int i = 0; i < lst; i ++){
			//printf("%d %lld\n",i,dp1[i]);
			if(dp1[i] <= w)
				ans = ans + (w - dp1[i]) / lst + 1;
		}
		printf("%lld\n",ans);
	}
}

Extention to palindromic suffixes

A string $$$s$$$ is called a palindrome if it equals its reverse. A palindromic suffix of a string is a suffix that is also a palindrome.

As we found that The borders of palindormes are also the palindromic suffixes of a palindrome, then we can say that for a string $$$S$$$, the sequence of the lengths of all palindromic suffixes in increasing order(which can be computed using PAM) can be partitioned into $$$\operatorname{O}(\log |S|)$$$ continuous arthmetic progressions. We can also say that the sequence of the lengths of all even-length palindromic suffixes in increasing order can be partitioned into $$$\operatorname{O}(\log |S|)$$$ continuous arthmetic progressions.

Tasks

In 932G - Palindrome Partition we can construct a string $$$T$$$ by:

$$$t_{2k - 1} = s_{k}$$$
$$$t_{2k} = s_{n - k + 1}$$$

Then the task turns to the number of ways to divide $$$T$$$ into several even-length palindromes, it's solvable for dynamic planning as $$$dp_i$$$ is the answer when the string is the prefix of length $$$i$$$ in the string $$$T$$$, with series link to speed it up.

code

#include<bits/stdc++.h>
using namespace std;
int n,b[1000009],tp[1000009],to[1000009][26],len[1000009],cnt,dp[1000009],f[1000009],shlink[1000009],diff[1000009];
char c[1000009],s[1000009];
const int mod = 1000000007;
int main(){
	scanf(" %s",c);
	n = strlen(c);
	for(int i = 0; (i + 1) << 1 <= n; i ++){
		s[i << 1 | 1] = c[i]; 
	}
	for(int i = n - 1; (i + 1) << 1 > n; i --){
		s[(n - i) << 1] = c[i];
	}
	//printf("%s\n",s + 1);
	for(int i = 2; i <= n; i ++){
		int t = tp[i - 1];
		while(t != 0 && s[i] != s[i - 1 - len[t]])
			t = b[t];
		if(s[i] == s[i - 1 - len[t]] && !to[t][s[i] - 'a']){
			cnt ++;
			//printf("%d\n",cnt);
			to[t][s[i] - 'a'] = cnt;
			len[cnt] = len[t] + 2;
			int st = b[t];
			while(st != 0 && s[i] != s[i - 1 - len[st]])
				st = b[st];
			if(s[i] == s[i - 1 - len[st]] && len[cnt] != 2)
				st = to[st][s[i] - 'a'];
			b[cnt] = st;
			t = cnt;
		}
		else if(s[i] == s[i - 1 - len[t]])
			t = to[t][s[i] - 'a'];
		tp[i] = t;
		//printf("%d %d %d\n",i,len[tp[i]],len[b[tp[i]]]);
	}
	for(int i = 1; i <= cnt; i ++){
		diff[i] = len[i] - len[b[i]];
		shlink[i] = b[i];
		if(diff[i] == diff[b[i]])
			shlink[i] = shlink[b[i]];
	}
	dp[0] = 1;
	for(int i = 2; i <= n; i += 2){
		if(tp[i] == 0)
			continue;
		int p = tp[i];
		//dp[i] = dp[i - tp[i]];
		while(p != 0){
			f[p] = ((shlink[p] != b[p]) * f[b[p]] + dp[i - len[shlink[p]] - diff[p]]) % mod;
			dp[i] = (dp[i] + f[p]) % mod;
			//printf("%d %d %d %d\n",len[p],len[shlink[p]] + diff[p],f[p],f[b[p]]);
			p = shlink[p];
		}
		//printf("%d %d\n",i,dp[i]);
	}
	printf("%d\n",dp[n]);
}

These are similar to the solution to 906E - Reverses,which constructs a string $$$T$$$ in this way:

$$$t_{2k - 1} = x_k$$$
$$$t_{2k} = y_k$$$

Then it's solvable for dynamic programming to find a partition of the string into even-length palindromes, minimizing the total count while allowing palindromes of length $$$2$$$ but excluding them from the count.

code

#include<bits/stdc++.h>
using namespace std;
int n,m,shlink[1000009],b[1000009],to[1000009][26],tp[1000009],len[1000009],diff[1000009],cnt;
char s[500009],t[500009],ct[500009];
int dp[1000009],las[1000009],mn[1000009],flas[1000009];
int main(){
	scanf(" %s %s",s,t);
	n = strlen(s) << 1;
	for(int i = 0; s[i] != '\0'; i ++)
		ct[i << 1 | 1] = s[i],ct[(i << 1) + 2] = t[i];
	for(int i = 2; i <= n; i ++){
		int o = tp[i - 1];
		while(o != 0 && ct[i] != ct[i - len[o] - 1])
			o = b[o];
		if(ct[i] == ct[i - len[o] - 1]){
			if(to[o][ct[i] - 'a'] == 0){
				cnt ++;
				to[o][ct[i] - 'a'] = cnt;
				len[cnt] = len[o] + 2;
				int u = b[o];
				while(u != 0 && ct[i] != ct[i - len[u] - 1])
					u = b[u];
				if(to[u][ct[i] - 'a'] != cnt && ct[i] == ct[i - len[u] - 1])
					b[cnt] = to[u][ct[i] - 'a'];
				shlink[cnt] = b[cnt];
				diff[cnt] = len[cnt] - len[b[cnt]];
				if(diff[cnt] == diff[b[cnt]])
					shlink[cnt] = shlink[b[cnt]];
			} 
			o = to[o][ct[i] - 'a'];
		}
		tp[i] = o;
		//printf("%d %d %d\n",i,len[tp[i]],len[b[tp[i]]]);
	}
	mn[0] = (1 << 24);
	for(int i = 2; i <= n; i += 2){
		if(ct[i] == ct[i - 1]){
			dp[i] = dp[i - 2];
			las[i] = i - 2;
		}
		else{
			dp[i] = (1 << 24);
		}
		int p = tp[i];
		if(len[p] != 0){
			do{
				mn[p] = dp[i - len[shlink[p]] - diff[p]] + 1;
				flas[p] = i - len[shlink[p]] - diff[p];
				if(b[p] != shlink[p]){
					mn[p] = min(mn[b[p]],mn[p]); 
					if(mn[p] == mn[b[p]])
						flas[p] = flas[b[p]];
				}
				dp[i] = min(dp[i],mn[p]);
				if(dp[i] == mn[p])
					las[i] = flas[p];
				p = shlink[p];
			}while(len[p] != 0);
		}
		//printf("%d %d %d\n",i,dp[i],las[i]);
	}
	if((dp[n] >> 24) & 1)
		puts("-1");
	else{
		printf("%d\n",dp[n]);
		int o = n;
		while(o != 0){
			if(o - las[o] != 2)
				printf("%d %d\n",(las[o] >> 1) + 1,o >> 1);
			o = las[o];
		}
	}
}

Summary

Though bruteforce on borders is slow, we can approach the algorithm using border theory,that shows that border lengths cluster into structured arithmetic sequences bounded in number by $$$\operatorname{O}(\log |S|)$$$ to make it much faster.

Comments (34)

Write comment?

OtterZ

5 months ago, hide # |

Auto comment: topic has been updated by OtterZ (previous revision, new revision, compare).

→ Reply

%%%%

%%%

%%%%%%

膜郑了

← Rev. 2 →

+42

This is unreadable. Please use an LLM to translate

EDIT: That aside, after actually understanding what you meant (with an LLM) , this seemed exceptionally useful

5 months ago, hide # ^ |

What do you mean? Translate to Chinese or just do a summary? Are there any more details that have to be added? Thanks for reminding me to develop instead of something unresponsible.

Ensure that everything is correctly and clearly defined. Also there are lots of typos, grammatically awkward sentences. A little bit is fine but this time there are so many that it is very hard to understand what you meant.

If you have written it in another langauge, simply translate it. If you are far more comfortable writing in anotheer language, write in that langauge then translate. Or, ask LLM to fix up the entire post would be much better.

I've tried to develop once. Is the blog better now?

pajenegod

The short answer is, no.

I'm sorry to say this, but the level of the english language is honestly a mess. There are obvious things like the word "conclusion" being spelled "conclution". But the bigger issue is that the sentences are so weirdly written that I don't get what they are trying to say. For example, I still don't get what you mean by the word "bruteforce" in the title.

My advice would be, just as arvind recommended, to copy paste your text through some LLM (like chatgpt or deepseek etc...). That would take you like 1 minute, and it would probably do a pretty good job. And it would make your blog 100 times easier to read.

Thanks for you to remind me,I'm taking action now.

No, the blog is still incomprehensible. It doesn't look like much has changed. You still have conclution.

Here is an example where I asked chatgpt to fix the blog https://chatgpt.com/share/69469388-38b0-8012-ac00-334a11677e8a . Not only did it fix the language, it also fixed some issues in your proofs. Especially your first proof.