The task is to minimize the length of resulting string after repeatedly removing a given substring. Note that after removal, new occurrences of the substring may be formed.
Example:
Given string: aaababa
String to remove: aba
Possibilities:
aaababa -> aaab
aaababa -> aaba -> a
So in this case the answer is 1.
This looks like a standard problem but i am unable to solve it. An even tougher version of the problem appeared on: https://www.hackerrank.com/contests/apc/challenges/reducto where instead of a single string, a set of strings can be removed. Sadly, an editorial is not available.
I think the following will work:
Let q be the string to remove, and S be the original string.
Let canKill(i,j) be a function that says whether the entire substring from i to j of S can be removed.
To compute it, we will use a dynamic programming. Let s be the substring from i till j, and l be the length of that string (l = j - i). canKill(i,j) is true if and only if we can find a subsequence of s that is equal to q, such that all the characters that are not in that subsequence can be fully covered by a set of non-intersecting intervals, for each of which canKill is true.
We will use a dynamic programming for that. dx, y is whether we can find a subsequence of a prefix of s of length x, that is a prefix of q of length y. The transitions are
dx, y = dx - 1, y - 1 if qy - 1 = sx - 1
(we add another character from q to the end of the prefix)
and
dx, y = dx - z, y if canKill(i + x — z, i + x)
(we add another interval for which canKill is true to the end of the previx)
When you compute all the canKills, you can run another dynamic programming run, that will find the maximum subset of S that can be covered by non-intersecting intervals, for which canKill is true.
Thanks a lot!
Initially i was unable to understand how you calculate canKill(i,j) from its corresponding dp value — dlen(s),len(q) . Suppose s1=aabbcc , s2=ababcc and q=abc. For both s1 and s2, q is a subsequence so dlen(s),len(q) is true for both but canKill is only true for s2.
However, if we consider d[x][y] as true if and only if sx (prefix of s of length x) can be completely removed by using qy atmost once and the complete string q multiple times, then canKill(i,j) becomes equal to d[len(s)][len(q)] as it denotes whether the whole substring s can be completely removed by using string q multiple times.
Suppose s=ababcabcc and q=abc. So, at this stage canKill(2,4), canKill(5,7), canKill(2,7) is known to be true.
Now for given s the dp looks like (the transitions are same, only rows and columns are interchanged):
Since dp[2][8] is true, so canKill(s) is also true. Is this what you meant or i misunderstood your approach?