Given an $$$1$$$-indexed integer array $$$a=[a_1,\,a_2,\,a_3,...,\,a_n]$$$ and a fixed windows size $$$k$$$,define sliding window $$$I_{j,\,(j \geq k)} := [a_{j-k+1}, a_{j-k+2}, ..., a_j]$$$. For each $$$j \geq k$$$, we need to answer a query:
How many sliding windows in $$$\{I_k, I_{k+1}, ..., I_{j-1}\}$$$ are less or equal than $$$I_j$$$ in the alphabetical order?
For example, $$$a=[1, 2, 1, 3]$$$ and $$$k=2$$$:
(1) For $$$j=2$$$, we should answer $$$0$$$.
(2) For $$$j=3$$$, we should answer $$$1$$$ as $$$[1,2] < [2,1]$$$ in alphabetical order.
(3) For $$$j=4$$$, we should answer $$$1$$$ as $$$[1,2] < [1,3]$$$ in alphabetical order.
Suffix array + LCP array can solve it in $$$O(nlogn)$$$ offline. But how about solving online? For example, what if $$$a$$$ is an unbounded datastream instead of an array? In the datastream case, you have to process $$$a_j$$$ and $$$I_j$$$ before reading $$$a_{j+1}$$$.
With hashing+order statistics you can solve it in $$$O(n\log(n)\log(k))$$$. The order statistics tree gives you the position in the sorted sequence of all $$$I_j$$$ in $$$O(n\log(n)\cdot\texttt{cmp})$$$, where $$$\texttt{cmp}$$$ is the time needed to compare two sequences. With hashing you can compare two sequences in $$$O(\log(k))$$$ by binary searching the longest common prefix and then comparing the next index.
But here is a problem: The hash value of a window might be exponentially large, and you may need $$$O(k)$$$ time to maintain such a large number.
Hash functions are surjective. There's no point in making it bijective for the reasons of performance. As a downside — collisions are possible. That's why you should minimize the chance of a collision (random moduli or base).
But how can we compare windows after moduli?
link1 (cf)
link2 (cp-algorithms)
Thanks!