Sorting an array of 0 and 1s with reversal operations in linear time

#	User	Rating
1	Benq	3792
2	VivaciousAubergine	3647
3	Kevin114514	3611
4	jiangly	3583
5	strapple	3515
6	tourist	3470
7	Radewoosh	3415
8	Um_nik	3376
9	maroonrk	3361
10	XVIII	3345

#	User	Contrib.
1	Qingyu	162
2	adamant	148
3	Um_nik	146
4	Dominater069	143
5	errorgorn	141
6	cry	138
7	Proof_by_QED	136
8	YuukiS	135
9	chromate00	134
10	soullless	133

Problem statement

Given a 1-indexed array of 0s and 1s of length $$$n$$$, sort it in $$$O(n)$$$ time with a model of computation that allows only read on the array and reversal operations for write.

A reversal operation is done by choosing a subarray with bounds $$$l$$$ and $$$r$$$ such that $$$1 \le l \le r \le n$$$, and reversing it. A reversal operation takes $$$O(r-l)$$$ time.

Note — The goal is not to achieve $$$O(n)$$$ number of reversals, but total $$$O(n)$$$ time.

Note — We cannot optimize the reversal operation. The model of computation does not allow that. Any reversal on a subarray of size $$$S$$$ will take $$$O(S)$$$. You can think of it in a way that a stupid monkey is doing the reversals, and he will always take $$$O(S)$$$ time for a subarray of size $$$S$$$. You can only tell him $$$l$$$ and $$$r$$$, not how to do the reversal

Where am I on this?

A merge-sort based solution is straightforward, but $$$O(n*logn)$$$

I think it is impossible to solve it in O(n)

I tried to consider an array of repeating $$$0 1$$$, and never found a way to sort it in $$$O(n)$$$

My hand wavy proof for this

Assume for the sake of contradiction that there exists an algorithm that can solve the problem for every array of size $$$n$$$ in $$$n*C$$$ steps. Where $$$C$$$ is a constant.

Consider this array of repeating $$$0 1$$$ of even size $$$n$$$. Example: $$$0 1 0 1 0 1 0 1 0 1$$$

Split it to two parts 0 1 .... 0 1 0 1 0 1 | 0 1 0 1 0 .... 0 1

Now the 1s in the left are to be moved to the right, and 1s in the right should not go to the left.

This part is hard to prove (I will attempt to get a more sophisticated and clearer arguements), but to solve it optimally, you must first sort both halves and get the array to this state

0 0 0 ... 1 1 1 | 0 0 0 .... 1 1 1

If you dont get it this way, any reversal operation you do will with a subarray will take unwanted elements with it...

So you must take $$$O(n/2 * C)$$$ + $$$O(n/2 * C)$$$ steps to reach here, and then $$$O(n/2)$$$ steps to do another reversal Making total steps $$$O(n*(C + 0.5))$$$

Help needed

If there is a way, please explain.

If not, any other intuitions to prove that it's not possible?

Comments (21)

Write comment?

lazysegtree

9 months ago, hide # |

Auto comment: topic has been updated by lazysegtree (previous revision, new revision, compare).

→ Reply

8 months ago, hide # |

A bump, as a last hope of getting some answer on this. Sadly I dont have more skilled friends than me to discuss anything.

avighnakc

← Rev. 2 →

How would you do it in $$$O(n \log{n})$$$ with merge sort? Sorry, this is not immediately obvious to me

Edit: Just ran a simulation, from $$$n=6$$$ onwards the worst-case cost appears to be $$$2(n-2)$$$ which is very surprising.

Spoiler

8 months ago, hide # ^ |

$$$O(nlogn)$$$ solution

Assume that sorting the array of size $$$n$$$ takes $$$T(n)$$$ in worst case

Divide the array in two and sort the two halves in, $$$T(n/2) + T(n/2)$$$ time. Then you have 0 0 0 ... 1 1 1 0 0 0 . . . 1 1 1

Now this can be sorted in single reversal of the subarray comprising with all 1 in the first half and all 0 in the second half, which is worst case $$$O(n)$$$ operations

$$$T(n) = 2 * T(n/2) + O(n)$$$

Which leads to similar recurrence as the usual merge sort, leading to $$$O(nlogn)$$$ complexity.

How do you do the combine step? Can you give me an example with, say $$$n=8$$$, $$$a=10101010$$$? I don't think you can combine in $$$O(n)$$$: the cost of an operation is proportional to its length.

Before combining two arrays, the requirement is to sort them.

For $$$01010101$$$, first you sort the two halves and get them to $$$0011$$$ $$$0011$$$, and then combine them to $$$00110011$$$ Then apply reverse of the middle four elements and you get $$$00001111$$$

Oh okay, of course. I don’t know how I missed that lol

am_aadvik

← Rev. 3 →

since 2 childs combine at the parent and there can be at most 3n parent nodes (all nodes except the last layer) [segtree bound of total 4n nodes] now for each layer i there are i nodes in this layer [layer 1 has 1 node and so on] and the current layer needs i * (n/(i+1) operations since for each of the i nodes, we need to reverse half of its subpart [the mid portion]. so we get 1*n/2 + 2*n/3 + 4*n/4.... [logn terms of this] this is around n^2 (chatgpt proof since im bad at math):

Let $$$L = \lfloor \log_2 n \rfloor$$$.

The sum is $$$S = \sum_{i=1}^{L} \frac{2^{i-1} n}{i+1}$$$.

Define $$$a_i = \frac{2^{i-1} n}{i+1}$$$.

For $$$1 \le i \le L-1$$$ we have $$$\frac{a_i}{a_{i+1}} = \frac{2^{i-1}/(i+1)}{2^i/(i+2)} = \frac{i+2}{2(i+1)} \le \tfrac{3}{4}$$$.

Hence $$$a_i \le \tfrac{3}{4} a_{i+1}$$$ and the sequence is geometrically increasing.

Thus $$$S \le a_L(1 + \tfrac{3}{4} + (\tfrac{3}{4})^2 + \dots) = 4a_L$$$, and clearly $$$S \ge a_L$$$.

Since $$$2^L \le n \lt 2^{L+1}$$$ we get $$$2^{L-1} \in [n/4, n/2)$$$.

So $$$a_L = \tfrac{n \cdot 2^{L-1}}{L+1}$$$ satisfies $$$\tfrac{n^2}{4(L+1)} \le a_L \lt \tfrac{n^2}{2(L+1)}$$$.

Therefore $$$\tfrac{n^2}{4(L+1)} \le S \le \tfrac{2n^2}{L+1}$$$.

As $$$L+1 = \Theta(\log n)$$$, we conclude $$$S = \Theta(n^2 / \log n)$$$.

i_squared

There are 2^i nodes on layer i, not i imo, which leads to n/2 work at each level and n/2logn complexity

You're right

What kind of simulation is this? Some sort of brute force way to get the minimum cost ?

Its possible that cost could be proportional to something like $$$ C * n^2 $$$, where $$$C$$$ is a bit small ( like $$$0.1$$$ ). Leading to being comparable to $$$O(n)$$$ for smaller numbers.

Try larger value like $$$100$$$, or $$$50$$$.

Multi-source Dijkstra's with all sorted states being the 'end nodes', $$$2^n$$$ total nodes representing all the states with weighted edges between two nodes representing the operation. The answer for a given $$$n$$$ is just the maximum smallest distance between any two nodes. Unfortunately I can't test larger numbers due to the exponential nature of this simulation. Yeah, obviously nothing can be said about the actual complexity with this but it's still pretty surprising that it resembles a straight line up to $$$n=14$$$. This doesn't mean a linear time algorithm is achievable, but does hint towards it.

Could you just simplify your states to only have nodes with count of equal $$$0$$$ and $$$1$$$.

To only calcuate for arrays of type $$$01010101...$$$

In that case whats the max number you can test.

gold_flake_enjoyer

I'm not sure but something along the lines of bucket sort might help as the array only contains zeros and ones. Avg case time complexity is O(n+k).

Or you could refer to the dutch national flag algorithm, it originally uses three pointers but you could do with two. Time complexity is O(n).

Correct me if i'm wrong though.

Both bucket sort and dutch national flag algorithm require swapping which is not allowed here.

The best I could think of right now: n*(n+1)/4 time, n/2 reversals for 101010101010101010..... repeat until sort: imagine the 1s are in components i to k take component k-1 and the 0s after it until component k reverse this total subpart n/2 reversals since every 1 component needs 1 reversal (except the last) n*(n+1)/4 since this take o(2) for the first component o(4), o(6) and so on till o(n) this takes lower time if there are lesser 1 components. I think 1010101010..... case cannot be solved in less than approx. o(n^2), please let me know if you find a better solution

remyawka

sort 10101010 in 13 time steps

[10]101010 -> 01101010
0[110]1010 -> 00111010
00111[01]0 -> 00111100
00[111100] -> 00001111

Probably possible to write recurrent and get that you can sort $$$1010...$$$ in $$$O(n)$$$ time steps. Oops, I stpd author write exactly this..

Yeah, this is nlogn

Maxi135798642

+26

You can prove that for an array of repeating $$$01$$$ you need $$$\mathcal{O}(n\log{n})$$$.

First lets pair each $$$0$$$ to some $$$1$$$. For our convenience we would pair each $$$0$$$ to the next $$$1$$$(for example in $$$01010101$$$ we would pair $$$01\ 01\ 01\ 01$$$). Now let's define our potential $$$P = \sum\limits_{i = 1}^{\frac{n}{2}}{\log{d_i}}$$$(where $$$d_i$$$ is the distance from i'th $$$0$$$ to it's paired $$$1$$$). Notice that $$$P_{\text{Final}} \geq \frac{n}{4}\log{\frac{n}{4}}$$$(just consider first $$$\frac{n}{4}$$$ zeros) and $$$P_{\text{Start}} = 0$$$.

Lets prove that by reversing a subarray of length $$$l$$$ we can't increase the potential $$$P$$$ by more than $$$\mathcal{O}(l)$$$. We will just care about the potential changes connected to moving zeros from the right side of the subarray $$$l$$$. All increased potential would be no more than four times the upper bound we will establish. Let's denote by $$$a_i$$$ the distance between the $$$1$$$ outside the subarray bound connected to the i'th $$$0$$$ we care about inside the subarray and the subarray bound closer to this $$$1$$$. We will assume that all places there are occupied by zeros. Now lets see how the potential will change for i'th zero. $$$\log(i+a_i) \rightarrow \log(l-i+a_i)$$$ Lets notice that if $$$a_i \geq l$$$ than $$$\Delta_i(P) \leq 1$$$. Summing all of them up gives $$$\Delta(P) \leq l$$$, so we could assume that all $$$a_i \leq l$$$. Knowing this we can calculate the total delta of potential:

$$$\Delta_i(P)=\log{\frac{l-i+a_i}{i+a_i}} \leq \log{\frac{l+a_i}{i+a_i}} \leq \log{\frac{2l}{i+a_i}} \leq \log{\frac{2l}{i}}$$$

We can approximate sum of this with integrals. $$$\int_{1}^{l} \log{\frac{2l}{i}} di = l\cdot (1+\ln(2)) - \ln(2l) - 1 = \mathcal{O}(l)$$$.

We've proven that $$$\Delta_{\text{Total}}(P) = \mathcal{O}(n\log{n})$$$ and $$$\Delta_l(P) \leq \mathcal{O}(l)$$$. Now it's clear that we can't sort an array this way faster than $$$\mathcal{O}(n\log{n})$$$.

https

If I remember correctly, it is a well-known result that any sorting network that sorts all binary arrays is also valid for all normal arrays.

However, I'm not really sure if the usual definition of a "sorting network" applies here, and I couldn't find a source for this claim either, so I'd be happy if there is anyone that could shed more light onto this.

That's true but I don't think that this algorithm is a representable sorting network since it's some kind of a "dynamic" version. You compare depending on the previous outputs which is not allowed in sorting networks. The most convincing thing I could say is that sorting network for binary arrays are just working for non-binary ones — you don't have to change anything.

I don't know about $$$\mathcal{O}(n\log{n})$$$ solution, but there is a $$$\mathcal{O}(n\log^2{n})$$$ one. Just treat the smaller half of the array as zeros and the other half as ones. Sort like for a binary alphabet. Now you've partitioned your array into smaller element in the left half and greater elements in the right half. You can just recurse to both halves and achieve an $$$\mathcal{O}(n\log^2{n})$$$ algorithm.

lazysegtree's blog

Problem statement

Where am I on this?

Help needed