Getting TLE inserting elements in set

Pieck's blog

By Pieck, 3 years ago, in English

CLOSED

Can someone please explain why this piece of code takes around 25 seconds to run, even though the complexity is O(N^2 log(N)) with N <= 5000?

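#include <set>
using namespace std;

int main() {
    int n = 5000;
    for (int i = 0; i < n; i++) {
        set<int> s;  // a fresh set is built and destroyed on every outer iteration
        for (int j = n; j >= 1; j--) s.insert(j);
    }
}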

set, time complexity

Comments (13)

waltz47

3 years ago

N^2 log(N) is big. More than 10^8 operations.

-is-this-fft-

3 years ago

It's only 3e8; in some cases I'd submit that. It seems the constant factor is too large here, though.

Ritwin

3 years ago

It's slow because:

std::set and std::unordered_set are inherently slow, probably because they are node-based: every element is a separate heap allocation reached through pointers, which gives bad cache locality.
It dynamically allocates 25 million nodes over the whole run (100MB by my estimate). That's a lot of memory, so it takes a while to allocate and deallocate.

defnotmee's reply below this has a better benchmark than mine. Read that and upvote it!

Original post

Just claiming things isn't good though, so I benchmarked it:

Benchmarking Code

#include <iostream>
#include <chrono>
#include <set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    for (int i=0; i<n; i++) {
        auto st = now();
        std::set<int> s;
        for (int j=n; j>=1; j--) s.insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Output:

Total time:       21.7573 s
No deallocations: 17.8228 s

I can't really measure allocation time (I wish there was a set::reserve method but that wouldn't really make sense in a BST), but the deallocation time is about 4 seconds, which is very surprising.

Moving set<int> to outside the outer loop made it take 10.85 seconds, and by timing the program from the terminal and by putting it in a separate function so that the destructor could be called, I found no extra time wasted on deallocating that 100MB (?).

Also, I tested std::priority_queue<int> to see the difference, and it was pretty shocking: 7.5s with it in the loop, and 5.6s with it outside. unordered_set was also faster, taking 13s including deallocations, 9.76s without, and 2.67 (!) seconds when declaring it outside the loop.

Useful things from my original benchmark:

std Library Container	Including deallocations	Pure runtime
std::set	21.76s	17.8s
std::unordered_set	13s	9.76s
std::priority_queue	7.5s	7.5s

So an expert tip: use sorting (which actually tends to be faster than a priority_queue) or a priority_queue instead of an [unordered_][multi]set if you don't need the set's extra features, because a set can slow your program down significantly.

Pieck

3 years ago

I was looking for something exactly like this. Thanks for the help.

defnotmee

3 years ago

There are some problems with your benchmark, especially with "Moving X to outside the outer loop" if you want to test for destructions. With your original code (and not using -O2, because it sometimes removes part of the code when compiling), this was the output:

Total time:       13.5812 s
No deallocations: 12.0908 s

This is what happens when I take the set out of the loop:

Total time:       7.74518 s
No deallocations: 7.74462 s

This makes no sense as a measurement of destruction cost. The reason is that the code now really inserts the elements only once: in every later iteration each insert merely checks whether the element is already in the set, which it always is because they are all duplicates, so nothing new is inserted. In fact, here is the benchmark of doing exactly that:

Code

#include <iostream>
#include <chrono>
#include <set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    std::set<int> s;
    for (int j=n; j>=1; j--) s.insert(j);
    for (int i=0; i<n; i++) {
        auto st = now();
        for (int j=n; j>=1; j--) s.count(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       6.42716 s
No deallocations: 6.4223 s

Close enough.

Here is what I think is a better test: it does not have to destruct any sets, it actually does what OP wanted, and it does not build one bigger set, since we aren't testing for that:

Code

#include <iostream>
#include <chrono>
#include <set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    const int n = 5000;  // const makes s[n] a standard array rather than a GCC variable-length array
    std::set<int> s[n];
    for (int i=0; i<n; i++) {
        auto st = now();
        for (int j=n; j>=1; j--) s[i].insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       13.1702 s
No deallocations: 13.1694 s

Even if there is a bit of overhead from using the array of sets, I think it's pretty safe to say that destructions were a bigger deal than that overhead.

Here are benchmarks of the other data structures in the same three configurations: the original test, the container moved out of the loop, and an array of containers:

priority_queue

Here the order in which you insert makes a big difference. I'm not particularly familiar with how priority_queue works, but it probably has to do with how far through the heap a newly pushed element has to travel.

===============================

Original test, inserting biggest to smallest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    for (int i=0; i<n; i++) {
        auto st = now();
        std::priority_queue<int> s;
        for (int j=n; j>=1; j--) s.push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       1.76891 s
No deallocations: 1.76038 s

===============================

Original test, inserting smallest to biggest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    for (int i=0; i<n; i++) {
        auto st = now();
        std::priority_queue<int> s;
        for (int j=1; j<= n; j++) s.push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       8.59461 s
No deallocations: 8.58484 s

===============================

Just out of the loop, biggest to smallest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    std::priority_queue<int> s;

    for (int i=0; i<n; i++) {
        auto st = now();
        
        for (int j=n; j>=1; j--) s.push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       2.43477 s
No deallocations: 2.43448 s

Funnily enough, this actually slows the priority queue down: since this data structure allows duplicate elements, the single queue keeps growing across iterations.

===============================

Just out of the loop, smallest to biggest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    std::priority_queue<int> s;

    for (int i=0; i<n; i++) {
        auto st = now();
        
        for (int j=1; j<= n; j++) s.push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       3.5903 s
No deallocations: 3.58994 s

priority_queue is really quirky, huh. I guess the height of the heap doesn't actually grow that much if we repeat elements.

===============================

Making an array, biggest to smallest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    const int n = 5000;  // const makes s[n] a standard array rather than a GCC variable-length array

    std::priority_queue<int> s[n];

    for (int i=0; i<n; i++) {
        auto st = now();
        
        for (int j=n; j >= 1; j--) s[i].push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       1.87247 s
No deallocations: 1.87196 s

===============================

Making an array, smallest to biggest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    const int n = 5000;  // const makes s[n] a standard array rather than a GCC variable-length array

    std::priority_queue<int> s[n];

    for (int i=0; i<n; i++) {
        auto st = now();
        
        for (int j=1; j<= n; j++) s[i].push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       8.59998 s
No deallocations: 8.59925 s

unordered_set

Original test

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    for (int i=0; i<n; i++) {
        auto st = now();
        std::unordered_set<int> s;
        for (int j=n; j>=1; j--) s.insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       7.63321 s
No deallocations: 6.32708 s

===============================

Just out of the loop

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    std::unordered_set<int> s;
    for (int i=0; i<n; i++) {
        auto st = now();
        for (int j=n; j>=1; j--) s.insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       2.30835 s
No deallocations: 2.30802 s

===============================

Making an array

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    const int n = 5000;  // const makes s[n] a standard array rather than a GCC variable-length array
    std::unordered_set<int> s[n];
    for (int i=0; i<n; i++) {
        auto st = now();
        for (int j=n; j>=1; j--) s[i].insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       6.8326 s
No deallocations: 6.83171 s

All of those were run with this command: g++ help.cpp -std=c++17 -Wall -Wextra -Wfloat-equal -o help; .\help

Conclusion:

Destructing actually consumes some time, but not that much
Destructing is particularly negligible with priority_queue (i.e. a vector underneath)
Benchmarks are fun to make
Go priority queue!

Ritwin

3 years ago

Thank you for this. I guess I have a lot to learn with benchmarking! Would you mind if I edited my comment to say that your benchmark is better?

defnotmee

3 years ago

Of course I wouldn't mind. By the way, have a nice day :)

Pieck

3 years ago

OK, my bad: I estimated about 3e7 operations, but it actually comes out to about 3e8. With both loops running over the full range we are already past the 1e8 range, so the time was pushed to 25 seconds. I noticed it and thought something weird was going on, but had it happened in the 1e7 range it would not even have been noticeable. Understood.

AIireza

3 years ago

Complexity analysis can help you predict scaling with problem size, but won't predict absolute times.

Bitweisser

3 years ago

You have declared your set<int> s inside the for loop, which means the set from the previous iteration has to be deallocated each time (which takes O(N)). To check this, try declaring an unordered_set (whose insertions take O(1) on average) once outside the loop and once inside the loop with the same N, or simply try declaring the set outside the loop.

Pieck

3 years ago

The set has to be created inside the loop because, before the inner loop starts, I need a set containing all elements from 1 to n. I omitted the other details of the solution, so this may have been confusing. I know my solution to that problem is not efficient, but I want to know exactly why this happens. Another way would be to declare the set outside the loop, but then I would need to clear it after each inner loop, which wouldn't improve the results since I would still be clearing the set.

imtiyazrasool92

3 years ago

~~This is compilation time not run time~~

Pieck

3 years ago

Auto comment: topic has been updated by Pieck.
