Finding lexicographically minimal merge of two strings in linear time

№	Пользователь	Рейтинг
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

After reading the editorial for RCC Elimination round problem E, I thought of an easier problem of merging two strings such that the result is lexicographically minimal. Formally, a merge of two strings a and b is a string s of length |a| + |b| such that there exist two strictly increasing sequences of indices i₁, i₂, ..., i_|a| and j₁, j₂, ..., j_|b| such that a = s_i₁s_i₂... s_{i_|a|}, b = s_j₁s_j₂... s_{j_|b|} and each index in s appears exactly once in i₁, ..., i_|a|, j₁, ..., j_|b|.

The above mentioned editorial provides an algorithm for solving this problem that works in $\text{[math]}$ time and uses hashes. Actually, this problem can be solved in linear time. The $\text{[math]}$ solution works roughly like this: maintain current position p_a in a and p_b in b. On each step, lexicographically compare the suffix of a starting at p_a with the suffix of b starting at p_b, and take a character from the suffix that is smaller (actually, for this to work, it is necessary to terminate each string with a character that is greater than any character in the strings, so that if one of the suffixes is a prefix of the other, the shorter suffix is considered larger, not smaller). The author proposes to compare the suffixes by using binary search and hashing, which takes $\text{[math]}$ time. However, this can be done in constant time.

Actually, this is a well known Longest Common Extension problem. One of the constant-time solutions is as follows: construct a suffix tree from the strings, then preprocess it using one of Lowest Common Ancestor algorithms that can answer LCA queries in constant time. It is easy to see that the lowest common ancestor of two leaves in a suffix tree that correspond to two suffixes can be used to find the length of the longest common prefix of those suffixes. From that, performing lexicographical comparison is easy.

It is possible to build and preprocess a suffix tree in linear time, so the overall running time is O(n), but the algorithm is quite complex. Does anyone know of a simpler algorithm with the (asymptotically) same running time?

Комментарии (8)

Показать архивные | Написать комментарий?

geniucos

8 лет назад, # |

I know that there is a way of sorting all suffixes of a given string in O(N). Using that it should work. I know about it as it is explained on some Romanian site bu I hadn't tried to understand. However, I can say just that it is easier than building a suffix tree (from the size of the article)

→ Ответить

shakil.ahamed

8 лет назад, # ^ |

Can you give a link to it? The only way to sort in O(N) is counting sort as far as I know. But how it can be implemented in string!

redocyz

← Rev. 2 →

TimonKnigge

+10

Google 'Suffix Array'.

data_h

Maybe I have a simple way to do it in O(NC), where C denote the alphabeta size.

Suppose the string s and t are something like

a = xxxxy b = xxxxxz

If x < y z, you may definitly take all x first,

if y < x < z, you may take the string with the smaller character after x.

if x > y z, and the number of x is different, you may take the shorter one. in case the number of x is equal, you should consider the same situation after the x in each string. But note that y and z are smaller than x, which means this recursive steps are bounded by the size of alphabeta.

In each case, what you want is the next different character and the number of same character start from some positions. By simple precalculating, you may do it in linear time.

eatmore

It's not that easy. Suppose that both strings start with something like zzzxyxyxyxy, where x < y < z. In this case, it will be necessary to scan through all these xy repetitions, and the number of these repetitions can be much bigger than the alphabet size.

You are right, I underestimate the difficulty in the third case :( .

rng_58

By the way, your problem was used in SRM, but here you can even do O(N²).

Блог пользователя eatmore