How to calculate the number of the string S of length N, which does not contain given string T as a substring?
length(S) ~ 10000
I would like to know all the approaches for the following 2 constraints :
- length(T) ~ 3 -> I have some DP idea but would appreciate if you can elaborate it more
- length(T) ~ 100
- length(T) ~ 10000
EDIT: What if we want to find the number of string S which contains given string T as a substring at most K times?
Please let me know all the available approaches.
2) DP[i][j] denoting current string has length i and its suffix shares j characters with prefix of T.
DP[i][j] updates from DP[i-1][j-1] for j > 0.
DP[i][0] updates from DP[i-1][k] for each k.
DP[i][j] can also be updated from DP[i-1][k] where k>j (preprocessing will be required to find)
nishant403 I think this solution can solve task 3 as well. What's the problem with that? And also I remember now I've solved a similar problem on cf as well.
There are total N*N states and each state might require O(N) operations to update.
In case like abacacacac, DP[i][2] updates from DP[i][3], DP[i][5] etc. (I dont know of such a hard case which updates O(N) states but I think it can exist)
It is O(26*N*N). At every state, try to append a letter to the current string. To find the correct $$$j$$$ value, you can use prefix function/kmp automaton.
Yes, you are right. Though 26*N*N cant pass N = 10000 but It seems that 26*N*N never occurs. What will be the actual worst case ??
nishant403 That 26 is for what will be the next character, for all the states, you have to consider all the possible next characters. And we are using KMP/Z algorithm to precompute the next possible value of J' for dp[i][j] if the j + 1 the character does not match with next character we are trying to append with
Consider the code Here
The part where I construct the matrix, I think it will take very less time than actual time complexity because most of the transitions will share 0 characters to prefix.
3) Let DP[i] denote number of strings of length i such that T ends at i for the first time.
(At last we can add DP[i]*26^(N-i) to get all strings which containt string T and subtract it from 26^N. )
Initialize DP[i] to be 26^(i-len(T)) for i >= len(T).
Now we subtract from it all cases where string can end before i. (Similar to this )
Note that there may be indices j > i — len(T) from which we can update (preprocess needed)
nishant403 I have high doubts regarding the correctness of this solution. Let's consider this problem to one given in the link, Then we have got a straight 1xN single row matrix, with all the blocked cells at (1,i) where i >= len(T). Now, the problem is we are blocking all the blocked cells, but there might be cases when some blocked cells can be available cells when few are already blocked before that..
I don't understand the case you are talking about.
Here is my code. Input format is same as lightOJ's question (without testcases).
I got AC in lightOJ using this approach for n < 1000.
Time complexity : O(N*N)
I have solved some similar problem sometime ago and would like to share some interesting approaches to this problem.
1) Solution 1: Complexity- O(|S|*|T|*26)
DP[i][j] -> currently set the first 'i' elements of S and 'j' is the length of the longest suffix of S which is a prefix of T. Transition would require you to iterate over all the characters and calculate the updated longest suffix. These transitions can be preprocessed in O(|T|*26) using the prefix function and stored in another dp, TRANS[j][c] where j is the current suffix, c is the character you add after the current suffix and TRANS[j][c] would store the length of longest updated suffix. Note that you should not perform the transition where TRANS[j][c] = |T|. For the bonus part, you can maintain another state which will store the number of times you visit j = |T|.
2) Solution 2: O(|T|^3*log(|S|))
We will create a transition matrix for T and use matrix expo over the length of S to find the answer. Construct a matrix M, where M[i][j] will store the number of characters s.t. the longest suffix changes from i to j on adding the character (again use prefix function on T) and skip those transitions where j = |T| or i = |T|. If you exponentiate this to the power of N, you will realize that the answer is sum of M[0][j] where 0 <= j < |T|.
3) What if you are given multiple patterns ( say P1, P2, ..., Pk) and you have to calculate the number of strings S of length N s.t. none of the patterns occur in the string ?
Solution: We will use aho corasick to solve this problem in O(|N|*|sum of pattern length|). Build aho corasick over all the patterns and mark those nodes where atleast one pattern ends. We will consider these nodes as the states and make transitions only when we are not ending at any marked node. The new dp state becomes dp[i][j] where i is the length of the current string and j is the state in the aho corasick. Transitions will be similar as in Solution 1. You can also use Solution 2 and solve the problem in O(|sum of pattern length|^3*log(|S|)) too.
Here are some problems which uses the last approach : Problem Problem
Here are two more problems. First one is the exact one that is stated in this blog and uses the second approach you described. The second one is from topcoder and uses your third approach I guess.
Lightoj 1268 — Unlucky Strings
TopCoder — Suspicious Strings
EDIT: What if we want to find the number of string S which contains given string T as a substring at most K times?
Again DP[i][len(Alphabet)]. You can make a transition function via KMP, and it will be len(T) * (K + 1), (each len(T) block is offset by K * (T)). You don't want to reach the final state, so if you have string abc, and you want to have it max 2 times you have this — [0,0,0,3,3,3,6,6,6], which means if you already have matched 'abc' once, each "invalid" character brings you back to state 3, not 0.
zed_b Your explanation is a bit unclear. But I got your point. whenever we are matching at position j and if the character does not match, we reach (j/k) * j the position instead of 0 after applying KMP. maybe that would work. I think that is same as dp[N][K][len(T)]