Блог пользователя k790alex

Автор k790alex, 11 лет назад, По-английски

Hello, I'm trying to solve this problem (link).

A brief description:

Given two strings (s, t) and Q queries, each query contains an integer K, you should count how many substrings of s can be equal to t with changing exactly K characters of s.

constrains:
|s| <= 10^6
|s| > |t|
Q <= 10^5
characters can be 'x' or 'y' only

I got a solution using suffix tree which runs in linear time but my problem is that it uses too much memory.

I tried to reduce it to suffix array but I couldn't.

As the constrains are high I guess I need a linear time solution, can you see how to do it using less memory?

Thanks.

  • Проголосовать: нравится
  • +8
  • Проголосовать: не нравится

»
11 лет назад, скрыть # |
Rev. 2  
Проголосовать: нравится +21 Проголосовать: не нравится

How did you solved this problem with suffix tree?

If we calculate hamming distance of small string with large string for every shifts, queries can be answered easily.

We can calculate those hamming distances with FFT. Overall complexity is O(NlogN)

  • »
    »
    11 лет назад, скрыть # ^ |
     
    Проголосовать: нравится 0 Проголосовать: не нравится

    Thanks but I didn't know how to compute these using FFT, can you explain a little?

    I'll try to describe my approach, let s = "xxyxx" and t = "xy", if I build the suffix tree for s (appending '$' as delimiter), I can traverse the tree with t counting the necessary changes to match them (see the picture).

    Each node have and id and the string contained in it, there are texts explaining the results, with this I can build an array containing how many substrings can be equal to t with 0 changes, 1 changes, .. , |t| changes.

    • »
      »
      »
      11 лет назад, скрыть # ^ |
       
      Проголосовать: нравится 0 Проголосовать: не нравится

      How are you calculating number of mismatches? As you know suffix tree is a compressed trie that's why there might be a node that holding information about thousands of characters. Are you able to do that step in constant time?

      • »
        »
        »
        »
        11 лет назад, скрыть # ^ |
         
        Проголосовать: нравится 0 Проголосовать: не нравится

        Yes, its a simple dfs like:

        // node is the current node of the tree the dfs is visiting
        // index the current index of t
        // mismatches is the number of mismatches I had found to get into the node
        void dfs(node, index, mismatches) {
         if ( index == |t| ) {
          // means i traverse all t
          // count(node) = number of leafs of subtree rooted at node ( O(1) after preprocessing)
          // ans[x] = number of substrings of s that can be matched to t changing exactly x chars
          ans[mismatches] += count(node);
          ret;
         }
         // node contains a substring of s
         // node.begin is start index (inclusive)
         // node.end is end index (exclusive)
         for (i = node.begin; i < node.end && index < |t|; i++, index++) {
         // traverse substring at node until we reach end of t or end of node
          if ( s[i] == '$' )
           ret; // a delimiter means node has not enough chars to match t
          // count mismatches while traversing substring at node
          mismatched += s[i] == t[index] ? 1 : 0;
         }
        
         // traverse children
         for each children of node called next
          dfs(next, index, mismatched);
        
         // i had traversed the whole t
         if ( node is a leaf )
          ans[mismatched]++;
        }
        
    • »
      »
      »
      11 лет назад, скрыть # ^ |
       
      Проголосовать: нравится 0 Проголосовать: не нравится

      To calculate hamming distance with fft, one can create two polynomials where coficients is +1 for X and -1 for Y. Notice that if two characters is not same they will be counted as -1.