i_love_sqrt_decomp's blog

By i_love_sqrt_decomp, history, 10 months ago, In English

I want to ask this problem: There are $$$Q$$$ queries and 2 strings $$$S,T$$$ of length $$$N$$$, each containing $$$l_1,l_2,r_1,r_2$$$, which asks for wildcard matching on the substrings $$$S[l_1...r_1]$$$ and $$$T[l_2...r_2]$$$ (Assume $$$r_1-l_1=r_2-l_2$$$). Can it be done faster than $$$O(NQ)$$$?

»
10 months ago, hide # |
 
Vote: I like it +3 Vote: I do not like it

What exactly do you mean by wildcard pattern matching?

I'm assuming the query asks whether the substrings are equal. Then, you can use simple polynomial hashing which takes $$$O(n)$$$ for preprocessing and each query can be answered in $$$O(1)$$$.

»
9 months ago, hide # |
 
Vote: I like it +1 Vote: I do not like it

Update: using sqrt decomposition and FFT, it can be solved in $$$O((n+q)\sqrt{n\log(n)})$$$ in the same way as 472G - Design Tutorial: Increase the Constraints.

  • »
    »
    9 months ago, hide # ^ |
     
    Vote: I like it +6 Vote: I do not like it

    I believe with a bit smarter choice of block size you can get this complexity down to $$$\mathcal{O}(q\sqrt{n}+n\sqrt{q\log(n)})$$$. It's better when $$$q$$$ isn't $$$\Theta(n)$$$.

    • »
      »
      »
      9 months ago, hide # ^ |
       
      Vote: I like it +1 Vote: I do not like it

      Shouldn't the $$$\log(n)$$$ be outside the square root sign?

      • »
        »
        »
        »
        9 months ago, hide # ^ |
         
        Vote: I like it 0 Vote: I do not like it

        Why would you think that? For $$$n=q$$$ it's equal to yours. If $$$\log(n)$$$ would be outside, it would be worse.

        • »
          »
          »
          »
          »
          9 months ago, hide # ^ |
          Rev. 4  
          Vote: I like it +4 Vote: I do not like it

          Wasn't the complexity $$$O(q(B+\frac{n}{B})+\frac{n^2\log(n)}{B})$$$? Taking the derivative wrt B, then letting it equal to 0 results in $$$B=\sqrt{\frac{n^2\log(n)}{q}+n}$$$. Assuming $$$q \simeq n$$$, $$$B=O(\sqrt{n\log(n)})$$$, getting a complexity of $$$O((n+q)\sqrt{n\log(n)})$$$.

          Edit: This seems to be the same as the TC above. We use the identity $$$\sqrt{a+b}\leq \sqrt{a}+\sqrt{b}$$$. Then the expression shown above is $$$O(q\sqrt{n}+n\sqrt{q\log(n)}+q\sqrt{n}+n\sqrt{q\log(n)})$$$. (The 3rd term use the fact that $$$B \gt \sqrt{n}$$$). So, you were right.

          • »
            »
            »
            »
            »
            »
            9 months ago, hide # ^ |
             
            Vote: I like it +10 Vote: I do not like it

            orz

          • »
            »
            »
            »
            »
            »
            9 months ago, hide # ^ |
             
            Vote: I like it 0 Vote: I do not like it

            A second ago while reading your comment I saw why you thought that $$$log(n)$$$ should be outside the sqrt(You have to perform at least one FFT computation on the whole sequence). Indeed for $$$q \simeq 1$$$ the block size would be $$$\mathcal{O}(n\sqrt{\log{n}})$$$. We clearly can't divide the sequence into blocks of such size. This means that our block size is invalid for $$$q \lesssim \log{n}$$$. Fortunately for such $$$q$$$ we achieve not worse complexity via brute force approach, so we wouldn't use this algorithm. So true complexity should be $$$\mathcal{O}(q\sqrt{n}+n\sqrt{q\log{n}}+n\log{n})$$$, or $$$\mathcal{O}(q\sqrt{n}+n\cdot\min(\sqrt{q\log{n}}, q))$$$ if we allow using brute force for small $$$q$$$.