harsh-apcr's blog

By harsh-apcr, history, 14 months ago, In English

Can someone explain how to solve Required Substring problem in CSES : https://cses.fi/problemset/task/1112

Your task is to calculate the number of strings of length n having a given pattern of length m as their substring. All strings consist of characters A–Z.

My approach :

I was thinking of some dp approach and instead of counting the required quantity directly, I'd count the complement and subtract from $$$26^n$$$

But, I am stuck with this approach, I thought of the following recursion

$$$dp(i, j) = # of strings of length i not containing s, whose suffix of length i is a prefix of s (i.e. s_1\ldots s_j)$$$

$$$dp(i, j) = \sum_{j'\in S} dp(i-1, j') + dp(i-1, j-1)$$$, where $$$S$$$ = set of all $$$j'$$$ such that $$$s_1 \ldots s_{j'}$$$ has a prefix of length $$$j-1$$$ which is also its suffix

and then finally compute : $$$dp(i) = 26*dp(i-1)-dp(i-1,m-1)$$$ (supposed to count strings of length i not containing s)

but I cannot figure out how to instantiate the base case $$$dp(i, 0)$$$ for $$$i > m$$$ or $$$dp(i, 1)$$$

Any help is appreciated, thank you

  • Vote: I like it
  • 0
  • Vote: I do not like it

| Write comment?
»
10 months ago, # |
  Vote: I like it 0 Vote: I do not like it

You could follow this dp idea of finding the complement and then the answer would be 26^n — complement. So you make a dp(i,j), meaning how many string from i to n doesn't have the pattern, knowing that the previous j-1 letters already matched with the pattern. The transition would be trying any of the 26 uppercase leetters as the new letter in the index i, but if j == m -1 (m is the size of the pattern), that means you can't choose the last letter of the pattern, cause that would result in us having the pattern on the string. When you reach i == n, that means you have a full string without the pattern, so you return 1. The difficulty lays on the fact that when pattern[j] doesn't match with the letter you chose to put on your string, you can't just say that j will be zero.

Example:

Imaging the pattern = ababc, and j = 4, meaning you already matched 'abab' and you know you can't choose 'c' cause that would make the whole pattern appear on the string, but if you choose 'a', now we have 'ababa', and the suffix 'aba' already matches with the three first letters of the pattern, so j will be 3. So you use the kmp algorithm to discover what would me the next j.