omggg's blog

By omggg, history, 5 years ago, In English

Question : https://mirror.codeforces.com/contest/1183/problem/E

I want to know how it can be done using 1.Graphs 2.Dynamic Programming

Also there is a harder version of the problem. [problem:https://mirror.codeforces.com/contest/1183/problem/H] What changes do i need to make for this in my approach?

P.S my rating is 1600. Any help is appreciated Thanks

  • Vote: I like it
  • +8
  • Vote: I do not like it

»
5 years ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

I will only cover the dynamic programming part. It can also be used to solve: https://mirror.codeforces.com/contest/1183/problem/H

Let's define cnt[i][x] as the number of distinct subsequences of length $$$x$$$ by only taking characters from index $$$i$$$ to $$$n-1$$$.

The problem can be solved by greedily taking the longest subsequences first from cnt[0][x] (because cnt[0][x] already contains all subsequences).

To compute cnt[i][x], we can use dp in $$$O(A \cdot n^{2})$$$ time, where A is the alphabet size. Let cnt_help[i][x][c] be the number of distinct subsequences ending with character c and only taking characters from index $$$i$$$ to $$$n-1$$$. Initially, all values are 0, except cnt[n][0] = 1.

The base case is at index n (no characters). Here's the recurrence for n-1..1:

  • Let $$$c$$$ be the character at index i.
  • Iterate for every size j:
    • cnt_help[i][j][c] = cnt[i+1][j-1] Take all subsequences of 1 length shorter and append character c. Now you have all subsequences ending with character c. You can see that every subsequences here will be distinct as long as all subsequences from cnt[i+1][j-1] is distinct.
    • cnt[i][j] = cnt[i+1][j] + (cnt[i+1][j-1] - cnt_help[i+1][j][c]) Add the newly made subsequences to cnt. But why do we have to subtract cnt_help[i+1][j][c]? It's to avoid duplication. With simple observation, you can tell why all the subsequences in cnt_help[i+1][j][c] are duplicates. With this, cnt[i][j] stays distinct

After that, all the answers can be taken from cnt[0][x] as described previously.

You can further memory-optimize this by kinda flying the table. Store only cnt[x] and cnt[x][c], and when iterating for every size j, iterate in decreasing order so that no conflicts happen. This is also easier to code.

By the way, check out my dumb comic >:) https://www.webtoons.com/en/challenge/cp-series/list?title_no=365151

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Using graphs solution: Let our implicit graph be nodes representing string s and all possible subsequences with edges between all possible transitions, that is between string a and string b where b can be made by removing one character of a. We can't generate this graph explicitly as the total possible number of subsequences is 2^n. The key observation is we need to visit only 100 of these nodes, as k <= 100. So run a BFS with starting node being string s. Go to all strings which can be made by removing only one character. Make sure you only go to distinct strings. If BFS terminates with number of nodes visited being less than k. Answer is -1. Else answer is the total cost computed by BFS.

Sample solution: https://mirror.codeforces.com/contest/1183/submission/74256424

  • »
    »
    5 years ago, # ^ |
    Rev. 3   Vote: I like it +3 Vote: I do not like it

    Your solution is easy to understand. So we basically do greedy thing and keep on generating biggest possible substring. I got it. How to analyse time complexity of such algorithm? Since there can me all different alphabets or maybe duplicates? Till what N , it won't face TLE?

    There's hard version of this question as well. How to deal with it? https://mirror.codeforces.com/contest/1183/problem/H

    • »
      »
      »
      5 years ago, # ^ |
      Rev. 3   Vote: I like it 0 Vote: I do not like it

      When we're looking at one node/string, we take O((n^2)logn)(logn factor for checking set) time to find which subsequences we can go to which aren't visited. The total number of nodes we visit is min(distinct number of subsequences, k). As BFS terminates when we visit k nodes or there are no more subsequences. So time complexity is O(k(n^2)logn)/O(100(n^2)logn). This approach works only as k is very small <= 100. The harder version of the question can't be solved with this approach as k <= 10^12. Which forces you to use the dynamic programming approach explained by malfple.

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by omggg (previous revision, new revision, compare).