[Tutorial] Dictionary of Basic Factors — O(1) String Matching

#	User	Rating
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

#	User	Contrib.
1	cry	167
2	maomao90	163
2	Um_nik	163
4	atcoder_official	161
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	nor	153
9	Dominater069	153

Hello, codeforces.
I think this technique is important, and easy to code, although I haven't found much about it, so I've decided to make a blog explaining it.This technique allows to compare substrings lexicographically in $$$O (1)$$$ with a preprocessing in $$$O (NlogN)$$$.
As this is my first post on codeforces, please let me know if there are any mistakes.

Prerequisites:

Basic strings knowledge.
Sparse Table(Optional).

What is a Dictionary of Basic Factors?

As in the sparse table we will have a matrix of size $$$N\log{N}$$$ called $$$DFB$$$, where $$$DFB[i][j]$$$ tells us the position of the string $$$S[i ... i + 2^j-1]$$$ (the first time it appears) among all the strings of size $$$2^j$$$ ordered lexicographically.

So, how do we build the matrix? Let's start with the lowest level, for powers of size $$$2^0$$$, we simply put the position of the character $$$S[i]$$$ in the dictionary (1 for 'A', 2 for 'B' ... and so on).

For the following $$$ DFB[i][j] $$$ we can save a pair of numbers that are $$$(DFB[i][j-1], DFB[i+2^{j-1}][j-1]) $$$

Now we replace the pairs by their position (the first time the pair appears) on the ordered array of those pairs. This can be done with Counting Sort in $$$O(N)$$$, otherwise with a normal sort in $$$O(NlogN)$$$ (But this will slow down the algorithm).

We do that for all powers of two and we will have the complete matrix.

Handling queries:

Suppose we want to make a query that is "Say which of the two substrings $$$S[l_{1}...r_{1}]$$$ and $$$S[l_{2}...r_{2}]$$$ is smaller lexicographically?".
First of all, let's analyze that if the substrings have different sizes, we can compare only the prefixes of size $$$M$$$ where $$$M$$$ is the minimum size between the two substrings, if both prefixes are equal then the smallest lexicographically is the substring of smaller size.
As in the Sparse Table, suppose $$$ k = \ log {(r-l)} $$$, then we can represent the substring $$$ S [l ... r] $$$ as the pair $$$ ( DBF [l] [k], DFB [r-2^k+1] [k] ) $$$. Now to compare two substrings of equal size we simply lexicographically compare their pairs in $$$O(1)$$$.
For example, suppose we have the string "ABACABA" and we want to know which string is lexicographically smaller between $$$ S [0 ... 5] $$$ and $$$ S [1 ... 6] $$$ ("ABACAB" and "BACABA").Since $$$\log{(5-0)}=2$$$, the pairs to compare would be: $$$(DBF [0] [2], DBF [2] [2])$$$ and $$$(DBF [1] [2], DBF [3] [2])$$$, which would be $$$(1,2)$$$ and $$$(2,4)$$$ therefore the first substring is lexicographically smaller.

Proof:

Since we have that $$$DBF [i] [j]$$$ is the position of $$$ S [i ... i + 2 ^ j + 1] $$$ between all substrings of size $$$ 2 ^ j $$$. If we want to compare two substrings, which would be represented as pairs, and we compare the first element in the pairs, we are comparing prefixes of size $$$ 2 ^ k $$$ of the substrings (where $$$k$$$ is the logarithm of the size of the substring). If they are the same, we compare the second elements of the pairs, which would be the same as comparing suffixes of size $$$ 2 ^ k $$$ of the substrings.

Thanks for reading, I hope this article will be useful. Any questions or suggestions let me know in the comments.

Rev.	By	When	Δ	Comment
en13	Leonardo16	2021-06-29 00:24:36	22
en12	Leonardo16	2021-06-28 21:05:23	111
en11	Leonardo16	2021-06-24 20:08:52	237	Tiny change: 'strings. \nThis t' -> 'strings. \nThis t'
en10	Leonardo16	2021-06-23 02:53:32	4	Tiny change: '[2], DBF [2] [2])$, w' -> '[2], DBF [3] [2])$, w' (published)
en9	Leonardo16	2021-06-23 02:39:03	663	Tiny change: ' in $O(N)$ (I'll explain it more deeply later), otherwis' -> ' in $O(N)$, otherwis'
en8	Leonardo16	2021-06-23 02:23:48	4	Tiny change: 'hose pairs, this can be' -> 'hose pairs. This can be'
en7	Leonardo16	2021-06-23 02:22:01	8	Tiny change: 'irst time it appears) ' -> 'irst time the pair appears) '
en6	Leonardo16	2021-06-23 02:20:42	44
en5	Leonardo16	2021-06-23 01:41:15	810	Tiny change: 'he string ABACABA and we wa' -> 'he string "ABACABA" and we wa'
en4	Leonardo16	2021-06-23 01:03:39	602	Tiny change: 'l sort in $N\log{N}$(But this ' -> 'l sort in O(NlogN) (But this '
en3	Leonardo16	2021-06-23 00:33:42	73	Tiny change: 'ize $N\logN$ called ' -> 'ize $N\log N$ called '
en2	Leonardo16	2021-06-23 00:30:48	153
en1	Leonardo16	2021-06-23 00:27:13	1289	Initial revision (saved to drafts)

Prerequisites:

What is a Dictionary of Basic Factors?

Handling queries:

Proof:

History