lazysegtree's blog

By lazysegtree, history, 9 months ago, In English

Problem statement

Given a 1-indexed array of 0s and 1s of length $$$n$$$, sort it in $$$O(n)$$$ time with a model of computation that allows only read on the array and reversal operations for write.

A reversal operation is done by choosing a subarray with bounds $$$l$$$ and $$$r$$$ such that $$$1 \le l \le r \le n$$$, and reversing it. A reversal operation takes $$$O(r-l)$$$ time.

Note — The goal is not to achieve $$$O(n)$$$ number of reversals, but total $$$O(n)$$$ time.

Note — We cannot optimize the reversal operation. The model of computation does not allow that. Any reversal on a subarray of size $$$S$$$ will take $$$O(S)$$$. You can think of it in a way that a stupid monkey is doing the reversals, and he will always take $$$O(S)$$$ time for a subarray of size $$$S$$$. You can only tell him $$$l$$$ and $$$r$$$, not how to do the reversal

Where am I on this?

A merge-sort based solution is straightforward, but $$$O(n*logn)$$$

I think it is impossible to solve it in O(n)

I tried to consider an array of repeating $$$0 1$$$, and never found a way to sort it in $$$O(n)$$$

My hand wavy proof for this

Help needed

If there is a way, please explain.

If not, any other intuitions to prove that it's not possible?

  • Vote: I like it
  • +15
  • Vote: I do not like it

»
9 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by lazysegtree (previous revision, new revision, compare).

»
8 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

A bump, as a last hope of getting some answer on this. Sadly I dont have more skilled friends than me to discuss anything.

»
8 months ago, hide # |
Rev. 2  
Vote: I like it +3 Vote: I do not like it

How would you do it in $$$O(n \log{n})$$$ with merge sort? Sorry, this is not immediately obvious to me

Edit: Just ran a simulation, from $$$n=6$$$ onwards the worst-case cost appears to be $$$2(n-2)$$$ which is very surprising.

Spoiler
  • »
    »
    8 months ago, hide # ^ |
    Rev. 2  
    Vote: I like it 0 Vote: I do not like it

    $$$O(nlogn)$$$ solution

    Assume that sorting the array of size $$$n$$$ takes $$$T(n)$$$ in worst case

    Divide the array in two and sort the two halves in, $$$T(n/2) + T(n/2)$$$ time. Then you have 0 0 0 ... 1 1 1 0 0 0 . . . 1 1 1

    Now this can be sorted in single reversal of the subarray comprising with all 1 in the first half and all 0 in the second half, which is worst case $$$O(n)$$$ operations

    $$$T(n) = 2 * T(n/2) + O(n)$$$

    Which leads to similar recurrence as the usual merge sort, leading to $$$O(nlogn)$$$ complexity.

    • »
      »
      »
      8 months ago, hide # ^ |
       
      Vote: I like it 0 Vote: I do not like it

      How do you do the combine step? Can you give me an example with, say $$$n=8$$$, $$$a=10101010$$$? I don't think you can combine in $$$O(n)$$$: the cost of an operation is proportional to its length.

      • »
        »
        »
        »
        8 months ago, hide # ^ |
         
        Vote: I like it +3 Vote: I do not like it

        Before combining two arrays, the requirement is to sort them.

        For $$$01010101$$$, first you sort the two halves and get them to $$$0011$$$ $$$0011$$$, and then combine them to $$$00110011$$$ Then apply reverse of the middle four elements and you get $$$00001111$$$

      • »
        »
        »
        »
        8 months ago, hide # ^ |
        Rev. 3  
        Vote: I like it 0 Vote: I do not like it

        since 2 childs combine at the parent and there can be at most 3n parent nodes (all nodes except the last layer) [segtree bound of total 4n nodes] now for each layer i there are i nodes in this layer [layer 1 has 1 node and so on] and the current layer needs i * (n/(i+1) operations since for each of the i nodes, we need to reverse half of its subpart [the mid portion]. so we get 1*n/2 + 2*n/3 + 4*n/4.... [logn terms of this] this is around n^2 (chatgpt proof since im bad at math):

        Let $$$L = \lfloor \log_2 n \rfloor$$$.

        The sum is $$$S = \sum_{i=1}^{L} \frac{2^{i-1} n}{i+1}$$$.

        Define $$$a_i = \frac{2^{i-1} n}{i+1}$$$.

        For $$$1 \le i \le L-1$$$ we have $$$\frac{a_i}{a_{i+1}} = \frac{2^{i-1}/(i+1)}{2^i/(i+2)} = \frac{i+2}{2(i+1)} \le \tfrac{3}{4}$$$.

        Hence $$$a_i \le \tfrac{3}{4} a_{i+1}$$$ and the sequence is geometrically increasing.

        Thus $$$S \le a_L(1 + \tfrac{3}{4} + (\tfrac{3}{4})^2 + \dots) = 4a_L$$$, and clearly $$$S \ge a_L$$$.

        Since $$$2^L \le n \lt 2^{L+1}$$$ we get $$$2^{L-1} \in [n/4, n/2)$$$.

        So $$$a_L = \tfrac{n \cdot 2^{L-1}}{L+1}$$$ satisfies $$$\tfrac{n^2}{4(L+1)} \le a_L \lt \tfrac{n^2}{2(L+1)}$$$.

        Therefore $$$\tfrac{n^2}{4(L+1)} \le S \le \tfrac{2n^2}{L+1}$$$.

        As $$$L+1 = \Theta(\log n)$$$, we conclude $$$S = \Theta(n^2 / \log n)$$$.

  • »
    »
    8 months ago, hide # ^ |
     
    Vote: I like it 0 Vote: I do not like it

    What kind of simulation is this? Some sort of brute force way to get the minimum cost ?

    Its possible that cost could be proportional to something like $$$ C * n^2 $$$, where $$$C$$$ is a bit small ( like $$$0.1$$$ ). Leading to being comparable to $$$O(n)$$$ for smaller numbers.

    Try larger value like $$$100$$$, or $$$50$$$.

    • »
      »
      »
      8 months ago, hide # ^ |
       
      Vote: I like it 0 Vote: I do not like it

      Multi-source Dijkstra's with all sorted states being the 'end nodes', $$$2^n$$$ total nodes representing all the states with weighted edges between two nodes representing the operation. The answer for a given $$$n$$$ is just the maximum smallest distance between any two nodes. Unfortunately I can't test larger numbers due to the exponential nature of this simulation. Yeah, obviously nothing can be said about the actual complexity with this but it's still pretty surprising that it resembles a straight line up to $$$n=14$$$. This doesn't mean a linear time algorithm is achievable, but does hint towards it.

      • »
        »
        »
        »
        8 months ago, hide # ^ |
         
        Vote: I like it 0 Vote: I do not like it

        Could you just simplify your states to only have nodes with count of equal $$$0$$$ and $$$1$$$.

        To only calcuate for arrays of type $$$01010101...$$$

        In that case whats the max number you can test.

»
8 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

I'm not sure but something along the lines of bucket sort might help as the array only contains zeros and ones. Avg case time complexity is O(n+k).

Or you could refer to the dutch national flag algorithm, it originally uses three pointers but you could do with two. Time complexity is O(n).

Correct me if i'm wrong though.

»
8 months ago, hide # |
 
Vote: I like it +2 Vote: I do not like it

The best I could think of right now: n*(n+1)/4 time, n/2 reversals for 101010101010101010..... repeat until sort: imagine the 1s are in components i to k take component k-1 and the 0s after it until component k reverse this total subpart n/2 reversals since every 1 component needs 1 reversal (except the last) n*(n+1)/4 since this take o(2) for the first component o(4), o(6) and so on till o(n) this takes lower time if there are lesser 1 components. I think 1010101010..... case cannot be solved in less than approx. o(n^2), please let me know if you find a better solution

  • »
    »
    8 months ago, hide # ^ |
    Rev. 3  
    Vote: I like it +2 Vote: I do not like it
    sort 10101010 in 13 time steps

    Probably possible to write recurrent and get that you can sort $$$1010...$$$ in $$$O(n)$$$ time steps. Oops, I stpd author write exactly this..

»
8 months ago, hide # |
Rev. 2  
Vote: I like it +26 Vote: I do not like it

You can prove that for an array of repeating $$$01$$$ you need $$$\mathcal{O}(n\log{n})$$$.

First lets pair each $$$0$$$ to some $$$1$$$. For our convenience we would pair each $$$0$$$ to the next $$$1$$$(for example in $$$01010101$$$ we would pair $$$01\ 01\ 01\ 01$$$). Now let's define our potential $$$P = \sum\limits_{i = 1}^{\frac{n}{2}}{\log{d_i}}$$$(where $$$d_i$$$ is the distance from i'th $$$0$$$ to it's paired $$$1$$$). Notice that $$$P_{\text{Final}} \geq \frac{n}{4}\log{\frac{n}{4}}$$$(just consider first $$$\frac{n}{4}$$$ zeros) and $$$P_{\text{Start}} = 0$$$.

Lets prove that by reversing a subarray of length $$$l$$$ we can't increase the potential $$$P$$$ by more than $$$\mathcal{O}(l)$$$. We will just care about the potential changes connected to moving zeros from the right side of the subarray $$$l$$$. All increased potential would be no more than four times the upper bound we will establish. Let's denote by $$$a_i$$$ the distance between the $$$1$$$ outside the subarray bound connected to the i'th $$$0$$$ we care about inside the subarray and the subarray bound closer to this $$$1$$$. We will assume that all places there are occupied by zeros. Now lets see how the potential will change for i'th zero. $$$\log(i+a_i) \rightarrow \log(l-i+a_i)$$$ Lets notice that if $$$a_i \geq l$$$ than $$$\Delta_i(P) \leq 1$$$. Summing all of them up gives $$$\Delta(P) \leq l$$$, so we could assume that all $$$a_i \leq l$$$. Knowing this we can calculate the total delta of potential:

$$$\Delta_i(P)=\log{\frac{l-i+a_i}{i+a_i}} \leq \log{\frac{l+a_i}{i+a_i}} \leq \log{\frac{2l}{i+a_i}} \leq \log{\frac{2l}{i}}$$$

We can approximate sum of this with integrals. $$$\int_{1}^{l} \log{\frac{2l}{i}} di = l\cdot (1+\ln(2)) - \ln(2l) - 1 = \mathcal{O}(l)$$$.

We've proven that $$$\Delta_{\text{Total}}(P) = \mathcal{O}(n\log{n})$$$ and $$$\Delta_l(P) \leq \mathcal{O}(l)$$$. Now it's clear that we can't sort an array this way faster than $$$\mathcal{O}(n\log{n})$$$.

»
8 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

If I remember correctly, it is a well-known result that any sorting network that sorts all binary arrays is also valid for all normal arrays.

However, I'm not really sure if the usual definition of a "sorting network" applies here, and I couldn't find a source for this claim either, so I'd be happy if there is anyone that could shed more light onto this.

  • »
    »
    8 months ago, hide # ^ |
     
    Vote: I like it +8 Vote: I do not like it

    That's true but I don't think that this algorithm is a representable sorting network since it's some kind of a "dynamic" version. You compare depending on the previous outputs which is not allowed in sorting networks. The most convincing thing I could say is that sorting network for binary arrays are just working for non-binary ones — you don't have to change anything.

    I don't know about $$$\mathcal{O}(n\log{n})$$$ solution, but there is a $$$\mathcal{O}(n\log^2{n})$$$ one. Just treat the smaller half of the array as zeros and the other half as ones. Sort like for a binary alphabet. Now you've partitioned your array into smaller element in the left half and greater elements in the right half. You can just recurse to both halves and achieve an $$$\mathcal{O}(n\log^2{n})$$$ algorithm.