Блог пользователя drdilyor

Автор drdilyor, история, 12 месяцев назад, По-английски

TL;DR: Simply use vector<basic_string<int>> adj(n); instead of vector<vector<int>> adj(n); for graphs. Don't use it anywhere else.

Hey Codeforces! I learned from our template lord nor about basic_string. Contrary to vector, it has memory optimization. When you use a vector, it will allocate on heap and the structure of vector is similar to this:

struct vector {
    size_t size;
    size_t capacity;
    T*     data;
}
/*
64 bits - size
64 bits - capacity
64 bits - data
*/

(Not to mention, the overhead of malloc)

Which means if you are using vector<int> with size 1, it will be using at least 28 bytes and accessing it requires indirection to a completely different part of memory, which is bad for cache. On 32-bit systems, memory usage is lower since they are 32-bits each, but the cache unfriendliness still holds. Also, as nor mentioned, this isn't exactly how the struct vector is implemented in libstdc++.

basic_string has Short string optimization, where it uses 1 bit flag for short or long mode. In long mode it behaves like vector, but in short mode it has different memory layout. For example, basic_string<int> in short form is similar to this (Note that capacity isn't needed as there is no allocation):

/*
 1 bit  - the flag
 7 bits - size
32 bits - int

32 bits - int
32 bits - int

32 bits - int
32 bits - int

24 bits - unused
*/

(I'm simplifying and not including memory alignment)

So in short form, basic_string stores elements directly inside the struct when the size is small! With int, it stores up to 5 ints without allocating on heap, which makes DFS much more cache friendly and uses less memory.

How much would you ask?

Cycle detection, 1905B

Problem 1 vector<edge>       121 ms  25.2 Mib
Problem 1 basic_string<edge>  83 ms  24.3 Mib
Problem 2 vector<int>         62 ms   4.5  MB
Problem 2 basic_string<int>   62 ms   2.9  MB

edge is a struct of 2 ints, so it's bigger and makes the optimization little less useful.

(Problem 2 doesn't require storing adjacency list but I guessed it's a good enough benchmark)

Bear in mind that basic_string requires the element types to be "simple" types, don't use it for vectors, sets and maps, only for ints, simple structs of ints or pairs of these (pair is a bit nuanced but it works).

More about it here

  • Проголосовать: нравится
  • +45
  • Проголосовать: не нравится

»
12 месяцев назад, # |
Rev. 2   Проголосовать: нравится +24 Проголосовать: не нравится

Nice blog! Wanted to point out a couple of things:

  • vector is implemented in multiple ways — the most common implementation seems to be using three pointers with the following semantics: begin, end and capacity_end, but the analysis in the blog still holds since both size and pointers are 64 bits wide (or 32 bits wide on 32 bit systems).
  • To automatically choose a basic_string instead of vector whenever feasible, you can use the following template:
template <typename Value>
using Container = std::conditional_t<(std::is_trivial<Value>::value && std::is_standard_layout<Value>::value && !std::is_array<Value>::value), std::basic_string<Value>, std::vector<Value>>;

Container<int> a; // uses basic_string
Container<std::string> b; // uses vector