Abridged Problem Statement : You have a list of N distinct words, and you want to print K of them. You have a very basic text editor : It supports 3 types of operations: typing a letter, deleting the previous letter, and printing the current document. You need to leave the document empty after printing K words. What's the minimum number of operations required to get the job done?
Constraints : 1 ≤ K ≤ N ≤ 300 The total length of all N words in each set will be at most 100, 000 characters.
Solution :
Name the words s1, s2, ..., sN. Let s0 = "". We initially have s0 on the document. After some operations, we want to print k of s1, s2, ..., sN and change the document to s0. For some pair of string si, sj, how many operations do we need to print sj if we have a document with si? Let p be the length of the Longest Common Prefix of si and sj. We can compute this naively in O(min(|si|, |sj|)). Then, we need to delete |si| - p characters, type |sj| - p characters, and print it (1 operation), for a total of |si| + |sj| - 2p + 1 operations. We can precalculate this value for each pair of words si, sj in a short time.
Now, suppose we have chosen our K words s1, s2, ..., sK. In which order do we print them in order to use a minimum amount of operations? We claim that in fact, if we print the strings in lexicographical order, we will use the minimum amount of operations. Suppose not, let s1, s2, ..., sK be arranged in lexicographical order. Suppose we print the words in the sequence s1, s2, ..., si - 1, sj, si + 1, ..., sj - 1, si, sj + 1, ..., sK, with i < j. Let a1, a2, ..., K be the length of word s1, s2, ..., sK respectively and denote the length of the Longest Common Prefix of sa and sb by L(a, b). Then, it can be easily seen that the total number of operations used is 2(a1 + a2 + ... + aK) - 2(L(1, 2) + L(2, 3) + ... + L(i - 1, j) + L(j, i + 1) + ... + L(j - 1, i) + L(i, j + 1) + ... + L(K - 1, K)). Compare this to when the words are printed in lexicographical order. Firstly, if j > i + 1, we just need to compare L(i - 1, j) + L(j, i + 1) + L(j - 1, i) + L(i, j + 1) and L(i - 1, i) + L(i, i + 1) + L(j - 1, j) + L(j, j + 1). We want the former to be at most the latter.