I am thinking about an effient algorithm for wildcard searching with * representing any characters with any length.
aa*c, she*he, *she*he
Example:
caa find aa*c
return "DOES NOT EXIST"
hesherheshe find she*he return 2
Because sherheshe begins at index 2
- sherheshe find she*he return 0 Because the whole string When I am supposed to return, say, the beginning index of the first matching instance.
Say the pattern is of length M and the document is of length N, and the pattern has K '*' signs. I can think of a solution that first uses AC Automation to find all occurences of each chunk in O(N + M), with bitmask.
While converting the bitmask to indexes takes O(N * K)
Then binary search for the last possible beginning positions for each chunk. This could take O (K log N)
So the overall time complexity is still O (N * K), any way to do better?
References
https://mirror.codeforces.com/blog/entry/111380