Jrke's blog

By Jrke, history, 6 months ago, In English

Ok, so first of all i don't know if it exist. But i will share my algorithm here only.

MOTIVATION!

Radix sort with base > 10?

so my motivation came when i though of why can't we use base 1e5 or 1e6 in radix sort? and i came up with following idea.

Most of the questions on CF have limit of values <=1e9. So my algorithm focuses majorily on that thing.

Alogirthm.

let us define a constant base, now we will make groups on the basis of ai % base,

  • group[j] -> contains ai / base of those whoose ai % base == j.
  • counts[j] -> it will store total number of times ai / base is equal to j. Note that both arrays will be of length base.

Next question — Now, how can we find the index(in sorted array) of any value of any group?

Observe, let x be any value of group[i], then value = x * base + i, its index will be atleast bigger than prefix Sum of counts[i] upto x — 1, also value will be bigger than those value whoose group number is smaller with same x.

Final touch

  1. After making groups, and counts make prefix array of counts(let us call that array pref).
  2. Define another array Live of length base, (initate all elements with 0).
  3. Iterate from group 0 -> base. Now for every element(x) in this group(i) do the following
    • value = x * base + i
    • the index of this value = pref[x — 1] + live[x]
    • now do live[x] += 1
    • Update array[index] = value.
  4. the sorting is done!

Implementation!

void sortit(vector<int> &arr, int base = 1 << 15) {
    int n = arr.size();
    
    vector<vector<int>> grps(base);
    vector<int> cnts(base + 1), live(base);
    for (int i = 0; i < n; i++) {
        int z = arr[i] / base;
        grps[arr[i] % base].push_back(z);//adding each element in respective group
        cnts[z + 1]++;//increasing count
    }
    for (int i = 1; i < base; i++) cnts[i] += cnts[i - 1]; //making prefix array
    
    for (int i = 0; i < base; i++) {
        for (int &x : grps[i]) arr[cnts[x] + live[x]++] = x * base + i;
    }
}

Analysis.

For all values less than base^2(base = 2^15), the time complexity comes out to be O(n + base), 2 * n(2 for loops of n) + base(memory definition and prefix array). you can choose base depending on max value of array, N and memory.

For n > base it starts becoming faster and faster than cpp sort.

Results.

These results are when i ran them on my laptop. UPD- Added radix sort comparison also!

Size 1000 Sorting....

Time taken by sortit() -> 1ms
Time taken  by cpp sort() -> 0ms
Time taken by radix sort(base 256)(4 rounds) -> 0ms
Time taken by radix sort(base 1 << 16)(2 rounds) -> 4ms

Size 10000 Sorting....

Time taken by sortit() -> 1ms
Time taken  by cpp sort() -> 1ms
Time taken by radix sort(base 256)(4 rounds) -> 1ms
Time taken by radix sort(base 1 << 16)(2 rounds) -> 5ms

Size 100000 Sorting....

Time taken by sortit() -> 9ms
Time taken  by cpp sort() -> 15ms
Time taken by radix sort(base 256)(4 rounds) -> 5ms
Time taken by radix sort(base 1 << 16)(2 rounds) -> 18ms

Size 1000000 Sorting....

Time taken by sortit() -> 40ms
Time taken  by cpp sort() -> 185ms
Time taken by radix sort(base 256)(4 rounds) -> 54ms
Time taken by radix sort(base 1 << 16)(2 rounds) -> 90ms

Size 10000000 Sorting....

Time taken by sortit() -> 316ms
Time taken  by cpp sort() -> 2171ms
Time taken by radix sort(base 256)(4 rounds) -> 572ms
Time taken by radix sort(base 1 << 16)(2 rounds) -> 508ms

Size 100000000 Sorting....

Time taken by sortit() -> 3056ms
Time taken  by cpp sort() -> 24149ms
Time taken by radix sort(base 256)(4 rounds) -> 5747ms
Time taken by radix sort(base 1 << 16)(2 rounds) -> 4749ms

Size 1000000000 Sorting....

Time taken by sortit() -> 35071ms
Time taken  by cpp sort() -> 277987ms
Time taken by radix sort(base 256)(4 rounds) -> 65696ms
Time taken by radix sort(base 1 << 16)(2 rounds) -> 50732ms

For higher values of n, it tends to become faster. Hence, this is suitable for higher n values.

If you find any similar or same algorithm, please let me know!

  • Vote: I like it
  • +129
  • Vote: I do not like it

»
6 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Jrke (previous revision, new revision, compare).

»
6 months ago, hide # |
Rev. 2  
Vote: I like it +4 Vote: I do not like it

std::sort() after this blog: bro what did I do :(

»
6 months ago, hide # |
Rev. 2  
Vote: I like it +11 Vote: I do not like it

https://www.youtube.com/watch?v=Y95a-8oNqps a little late bro.. but keep up the good work!!

»
6 months ago, hide # |
 
Vote: I like it +5 Vote: I do not like it

Cool, but usually sort is enough

»
6 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

will it work with non positive integers?

  • »
    »
    6 months ago, hide # ^ |
     
    Vote: I like it +9 Vote: I do not like it

    if you want, it can. You just have to define negative indices in an array, which you can do by defining the middle element as zero or something. Or you can just sort negatives and positives separately, as you are just adding N operations doing that anyways

»
6 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

logn never hurt nobody

»
6 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

I don't want to hurt you,but the algorithm is already exist for a long time(also called radix sort).It has been used to get SA.

  • »
    »
    6 months ago, hide # ^ |
     
    Vote: I like it 0 Vote: I do not like it

    I don’t think it’s Radix Sort. It’s more like he’s placing values at their corresponding indices, but in an optimized way.He needs roughly √(max(val)) of storage. So I’d say it’s closer to square root decomposition.

»
6 months ago, hide # |
Rev. 4  
Vote: I like it +5 Vote: I do not like it

There are three problems if we want to use it on Cf:

  1. If the base is such that value / base >= base, it will fail. So we always need to fix base >= 2^15 for any value (0 to 1e9).
  2. If the base is fixed, it will always take contest time, so most Codeforces problems will give TLE because of multiple test cases.
  3. It’s also not going to work for negative numbers, but it can be fixed by splitting positive and negative numbers.

The good thing is that it doesn’t depend on the size of the array, it depends on the maximum value.

EDIT: Also, you can check out the video:FASTEST sorting. Also you can also learn more about Tim Sort.

  • »
    »
    6 months ago, hide # ^ |
     
    Vote: I like it 0 Vote: I do not like it

    for bigger n this is better i think, also for negative numbers you can add 1e9 in every element first while making grps, and then later subtract 1e9 from each value while placing values in array.

    • »
      »
      »
      6 months ago, hide # ^ |
       
      Vote: I like it 0 Vote: I do not like it

      If you add 1e9 beforehand, you’ll need a base of √(2×1e9). Also, it’s only efficient for a single test case, not for multiple test cases. You could also say the time complexity is √(max × 2), not O(n).

»
6 months ago, hide # |
Rev. 4  
Vote: I like it +75 Vote: I do not like it

You seemed to have provided a new radix sort implementation that seeming saves 1 array sweep, but additionally costs in using vectors and growing them.

Your benchmark should be against the standard base 256 (4 cycles) or base 1<<11 (3 cycles) 1<<16 radix sort (2 cycles). It is clear that radix sort is faster than standard sort by a lot on large arrays. I fail to see the vector creation and capacity expansion are faster than just doing a radix sort normally with 2 cycles, and empirically 4 cycles seem to do better than 2 cycles anyways.

In summary, please prove it is superior to radix sort base 256 and 1<<16, but I don’t think this is the case.

  • »
    »
    6 months ago, hide # ^ |
     
    Vote: I like it +17 Vote: I do not like it

    with this implementation of radix sort.

    void radixSort(vector<int> &arr, int base = 1 << 8, int rounds = 4) {
        int div = 1;
        for (int round = 0; round < rounds; round++) {
            vector<vector<int>> grps(base);
            for (int &x : arr) grps[(x / div) % base].push_back(x);
            int idx = 0;
            for (int i = 0; i < base; i++) {
                for (int &x : grps[i]) arr[idx++] = x;
            }
            div *= base;
        }
    }
    

    I got these outputs —

    Size 1000 Sorting....

    Time taken by sortit() -> 1ms
    Time taken  by cpp sort() -> 0ms
    Time taken by radix sort(base 256)(4 rounds) -> 0ms
    Time taken by radix sort(base 1 << 16)(2 rounds) -> 4ms
    

    Size 10000 Sorting....

    Time taken by sortit() -> 1ms
    Time taken  by cpp sort() -> 1ms
    Time taken by radix sort(base 256)(4 rounds) -> 1ms
    Time taken by radix sort(base 1 << 16)(2 rounds) -> 5ms
    

    Size 100000 Sorting....

    Time taken by sortit() -> 9ms
    Time taken  by cpp sort() -> 15ms
    Time taken by radix sort(base 256)(4 rounds) -> 5ms
    Time taken by radix sort(base 1 << 16)(2 rounds) -> 18ms
    

    Size 1000000 Sorting....

    Time taken by sortit() -> 40ms
    Time taken  by cpp sort() -> 185ms
    Time taken by radix sort(base 256)(4 rounds) -> 54ms
    Time taken by radix sort(base 1 << 16)(2 rounds) -> 90ms
    

    Size 10000000 Sorting....

    Time taken by sortit() -> 316ms
    Time taken  by cpp sort() -> 2171ms
    Time taken by radix sort(base 256)(4 rounds) -> 572ms
    Time taken by radix sort(base 1 << 16)(2 rounds) -> 508ms
    

    Size 100000000 Sorting....

    Time taken by sortit() -> 3056ms
    Time taken  by cpp sort() -> 24149ms
    Time taken by radix sort(base 256)(4 rounds) -> 5747ms
    Time taken by radix sort(base 1 << 16)(2 rounds) -> 4749ms
    
    • »
      »
      »
      6 months ago, hide # ^ |
       
      Vote: I like it +32 Vote: I do not like it

      Wrong implementation, radix sort can be implemented without growing vectors. Also for both of your implementations, replace modulo/ division with appropriate bit AND/ bit shift.

      in each round: count up (x % base), do one prefix sum on the bin size, then directly write to the second array.

»
6 months ago, hide # |
Rev. 2  
Vote: I like it +17 Vote: I do not like it

Well, radix sort can be even faster.

There's a hard problem in Chinese: https://www.luogu.com.cn/problem/P4604.

Its first task is to sort $$$2\times 10^8$$$ unsigned int in 6s, and even need loop unrolling and other tricks to pass it...

You can try it if you want :)

»
6 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

This is amazing! One issue is that $$$O(n+\sqrt{max(a_i)})$$$ is very hackable: since many CF problems have multiple testcases, you can just spam [1e9] a bunch of times and cause TLE

»
6 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Jrke (previous revision, new revision, compare).

»
6 months ago, hide # |
 
Vote: I like it +16 Vote: I do not like it
  • »
    »
    6 months ago, hide # ^ |
     
    Vote: I like it +3 Vote: I do not like it

    I just copy pasted this code till line 72 didn't changed anything. I also optimized my code as said by arvindf232, regarding bit wise AND and bit shifts thanks!

    These were the time taken —

    Size 1000 Sorting....

    Time taken by sortit() -> 1ms
    Time taken by radix sort -> 0ms
    

    Size 10000 Sorting....

    Time taken by sortit() -> 2ms
    Time taken by radix sort -> 0ms
    

    Size 100000 Sorting....

    Time taken by sortit() -> 9ms
    Time taken by radix sort -> 3ms
    

    Size 1000000 Sorting....

    Time taken by sortit() -> 40ms
    Time taken by radix sort -> 40ms
    

    Size 10000000 Sorting....

    Time taken by sortit() -> 291ms
    Time taken by radix sort -> 403ms
    

    Size 100000000 Sorting....

    Time taken by sortit() -> 2870ms
    Time taken by radix sort -> 4234ms
    

    Size 1000000000 Sorting....

    Time taken by sortit() -> 46141ms
    Time taken by radix sort -> 54124ms
    
    • »
      »
      »
      6 months ago, hide # ^ |
      Rev. 2  
      Vote: I like it 0 Vote: I do not like it

      Weird, in my local tests neal's implementation is ~3x faster than yours with radix 256 on uniform random inputs, and can be optimized to ~4x by removing the min-max calculation that is not necessary in your case and overlapping the first and last inner loops of adjacent passes. On codeforces it's harder to benchmark precisely, but results are pretty similar.

      How do you generate inputs? How do you measure execution time?

      • »
        »
        »
        »
        6 months ago, hide # ^ |
         
        Vote: I like it 0 Vote: I do not like it
        • »
          »
          »
          »
          »
          6 months ago, hide # ^ |
          Rev. 7  
          Vote: I like it +10 Vote: I do not like it

          Maybe you didn't enable O2...But when O2 is disabled, radix sort just 100ms slower than sortit() in $$$10^8$$$ sorting...

          This is my result with O2:

          Spoiler

          Without O2:

          Spoiler
        • »
          »
          »
          »
          »
          6 months ago, hide # ^ |
           
          Vote: I like it +10 Vote: I do not like it
        • »
          »
          »
          »
          »
          6 months ago, hide # ^ |
           
          Vote: I like it 0 Vote: I do not like it

          I also optimized my code as said by arvindf232

          bruh, your code still has push_back's

      • »
        »
        »
        »
        6 months ago, hide # ^ |
         
        Vote: I like it 0 Vote: I do not like it

        Could you share your implementation for "can be optimized to ~4x by removing the min-max calculation that is not necessary in your case and overlapping the first and last inner loops of adjacent passes"?

        • »
          »
          »
          »
          »
          6 months ago, hide # ^ |
           
          Vote: I like it 0 Vote: I do not like it

          It's a bit ugly, sorry for that. I've hoped to further improve performance on large uniform arrays with the (MSB-LSB-)hybrid version because of cache locality, but it only pays off if the LSB version doesn't fuse scattering with the next histogram computation.

          Code
»
6 months ago, hide # |
 
Vote: I like it +3 Vote: I do not like it

You're right, but I'm not even sure if my program could take the whole input of 1000000000 numbers.

  • »
    »
    6 months ago, hide # ^ |
    Rev. 2  
    Vote: I like it 0 Vote: I do not like it

    He used this:

            // Increase vector size
            while (a.size() < i) {
                // rand() % (1 << 30) generates a positive 30-bit integer
                int x = rand() % (1 << 30);
                a.push_back(x);
                b.push_back(x);
            }
    

    However, rand() % (1 << 30) doesn't generates a positive 30-bit integer,it only generate [0,32767] Jrke

»
6 months ago, hide # |
 
Vote: I like it +3 Vote: I do not like it

based on my testing this is faster link (implementation)

»
4 months ago, hide # |
 
Vote: I like it +3 Vote: I do not like it

Isn't this at larger values of n just bucketing or values in frequency table and just shifting them into a vector then?