### Redpo's blog

By Redpo, 3 months ago,

Hello, Codeforces!

I recently came across a data structure that I find quite interesting: the sparse set. It functions similarly to a bitset and supports all its operations but has some quirks and useful features of its own. Particularly, unlike almost every other data structure, it does not need to be initialised at all before it can be used! Naturally, that also means that its data can be reset in $O(1)$ time, as opposed to the $O(n)$ time expected of regular bitsets. It also has the added benefit that traversing its elements takes time proportional to the number of elements rather than the size of the data structure, making the operation more efficient for sparse bitsets. However, note that it is likely less efficient in terms of its memory usage (still $O(n)$ but ~64× larger constant) and constant time factor.

## Introduction

The sparse set consists simply of two integer arrays (which I'll call $d$ and $s$) and a size variable $n$, satisfying the following class invariants:

• Invariant 1: $d[0..(n-1)]$ (inclusive) stores the elements of the bitset;
• Invariant 2: If $e$ is an element of the bitset, then $s[e]$ stores its index in the $d$ array.

In most literature covering this data structure, the arrays $d$ and $s$ are called the 'dense' and the 'sparse' arrays, respectively.

For any array element not mentioned above, its value can be anything without affecting the function of the data structure.

A typical sparse set might look like this (asterisk denotes terms that can be anything):

The red lines indicate the connections between the elements in $d$ and $s$. It can be shown that, for all indices $i$ in the range $[0, n-1]$, the constraint $d[s[d[i]]] = d[i]$ is satisfied.

In C++ code, the data structure can be represented as follows:

template <int N>
struct sparse_set {
int n;
int d[N], s[N];
};


Note that, in this article, we would use $N$ to denote the size of the data structure itself (and thus, the set of elements allowed is ${0, 1, \ldots, N-1}$), and $n$ to denote the number of elements currently stored.

Important implementation detail: if you are using C, C++, or Rust, you still have to initialise this data structure in its constructor since accessing an uninitialised element does not return an unspecified value; it is undefined behaviour and can introduce bugs elsewhere in your program. You don't have to do this when resetting the data structure, however.

Example one-line constructor
• +78

By Redpo, 4 months ago,

Hello, Codeforces!

Here are some language tips and tricks I have discovered that could (hopefully) help you write more concise or efficient code. Note that all of the below tips are specific to the C++ programming language and the STL library.

## Creating a std::set or std::map from sorted data in $O(n)$ time

The sorted containers set and map are usually implemented as red–black trees, meaning that their insert operation runs in $O(\log n)$ time by default. Naturally, repeating the insert operation $n$ times would result in a time complexity of $O(n\log n)$.

vector<int> a = /* ... */;
set<int> s1;
for (int e : a) { s1.insert(e); } // O(n log n)


However, set and map also provide a constructor method, that directly constructs the data structure from a given pair of iterators. In most implementations of C++, if the data contained within the range is already sorted, the constructor instead runs in $O(n)$ time.

vector<int> a = /* some sorted data */;
set<int> s2(a.begin(), a.end()); // O(n)

int arr[n + 10] = /* more sorted data */;
set<int> s3(&arr[1], &arr[n] + 1); // ditto


Here is a simple benchmark code that I have written to illustrate the above performance difference. The code generates $n$ integers in the range $[0, n]$, sorts them, then inserts them to sets using the two above methods. It can be seen, that the $O(n)$ method generally runs in half the time of the $O(n\log n)$ method:

Benchmark code
Results (nanoseconds)

If directly constructing the set (or map) is not possible (e.g. due to prior operations needing to be done on the set, or due to the data being online, etc.), the C++ set container provides an overload of the .insert() method that allows you to also specify a 'hint' iterator, which points to the element just greater than the element to be inserted. In other words, if e is to be inserted into s, this 'hint' iterator should be equal to s.lower_bound(e).

If the given hint iterator happens to be correct, then the .insert() call runs in amortised $O(1)$ time. Otherwise, the call still runs in O(\log n) time but may be slower than the regular .insert() call.

s.insert(s.lower_bound(10), 10); // The 'insert' operation itself runs in amortised O(1) time


The above example is not very helpful, since the .lower_bound() call itself already takes $O(\log n)$ time, defeating the entire purpose of the optimisation. However, this can be useful if we already have extra information about the position of the to-be-added element. In particular,

• If the element e is greater than all elements currently in s, we can write s.insert(s.end(), e).
• Similarly, if the element e is less than all elements currently in s, we can write s.insert(s.begin(), e).

Similar operations can also be done on the map data structure, except that the second argument (element to be inserted) now needs to be a key-value std::pair.

## Reducing the number of memory allocations using .reserve()

In the vast majority of C++ STL implementations, the containers vector and string have an internal buffer that is used to store data. Whenever a new element needs to be inserted while this buffer is full, the data structure first allocates a new, larger buffer, copies all data over to it, and then deallocates the original buffer. Whenever this occurs with a .push_back() call, the time taken for that call would be $O(n)$. Since such memory reallocations can be very expensive, it would be ideal if we could decrease the number of reallocations.

The .reserve() method, available for both vectors and strings, can lessen the frequency of reallocations by making the data structure expand its buffer to a provided size. For instance, if v is a vector, v.reserve(10); expands v's internal buffer to have a size of at least 10.

.reserve() has the following properties:

• Iterators and references are invalidated if and only if the provided size exceeds the current size of the buffer.
• If the buffer is already big enough, nothing happens. .reserve() cannot be used to decrease the size of a container.
• If the .reserve() call did change the size of the buffer, the time taken would be $O(n)$, where $n$ is the size of the buffer.
• .reserve() only changes the internal size of the container, not the logical size. For instance, the following code segment is still illegal (undefined behaviour):
vector<int> v(5, 1); // v has size 5
v.reserve(100);      // v now has buffer size at least 100, but logically still size 5
v[10] = 20;          // undefined behaviour


It is important to note that, since .reserve() calls themselves cause reallocations to occur, they should be used sparingly, preferably only once just after the declaration of the container itself. For instance, the following code is highly inefficient and, theoretically, would increases the time taken from $O(n)$ to $O(n^2)$:

int n;
cin >> n;
vector<int> a;
for (int i = 0; i < n; i++) {
a.reserve(i + 1);
int e;
cin >> e;
a.push_back(e);
}

Correct version ($O(n)$)

Additionally, for strings, directly inputting it from std::cin would also cause it to expand its buffer size dynamically. On Codeforces, many problems would give you the size of the string first before the string itself. In these scenarios, the following code can be written to read the string more effectively:

int n;
string s;
cin >> n;
s.reserve(n);
cin >> s;


Finally, note that calling .reserve() with an argument exceeding the actual size of the container can cause memory to be wasted. If memory is critical but the exact size of the container is not known, we can use the .shrink_to_fit() method to reduce the buffer size to what is strictly necessary to carry the data:

string s;
s.reserve(100);
cin >> s;
// s is likely to be less than 100 characters, therefore shrink to save memory
s.shrink_to_fit();


Note that, unlike .reserve(), .shrink_to_fit() is not guaranteed to have any effect on the container, though for most implementations, it does reduce the memory usage.

## The macro ONLINE_JUDGE

When a piece of C++ code is run on Codeforces, the platform automatically defines a macro ONLINE_JUDGE. This can be useful in a variety of scenarios when you want to differentiate whether the code is being tested on your device or being judged online.

For instance, a common optimisation for improving I/O efficiency is to untie the standard streams cin and cout using the following line of code: cin.tie(0). By default, C++ flushes the cout stream whenever an input operation is performed, so that the user can see the output before being requested to input. While this would slow down the program and increase the running time, it could still be a useful feature to have when testing the code. Therefore, we can conditionally enable this optimisation using the following code segment:

#ifdef ONLINE_JUDGE
cin.tie(nullptr);
#endif


This way, the output is not affected when the code is being run on your local device. Be careful not to misspell ONLINE_JUDGE! Otherwise the optimisation will not be applied anywhere and may cause great performance penalties.

Another common usage of ONLINE_JUDGE is writing debug output functions. You can write a debugging function that only outputs to cout when not officially submitted and judged using the following code template:

void debug(/* any parameters */) {
#ifndef ONLINE_JUDGE
// code that produces debug output
#endif
}


The debug function can now be used anywhere inside your code to produce output only when testing. When judging, the function is a no-op. Be careful not to mix up the preprocessor directives — here it is ifndef instead of ifdef, which causes the code that follows to be compiled only when the macro is not defined.

## Explicit types for min and max

By default, the functions min and max from <algorithm> cause compile errors when the arguments are not of the same type. For instance, the below statements all cause compilation failures:

min(3, 2.5);
max(4, 7LL);
min('a', 'd' + 1);
max(10, a.size()); // a is a container


In these cases, you can explicitly specify a type for both parameters to be converted to in order to compile. Beware of losses to range/precision, however.

min<double>(3, 2.5);
max<long long>(4, 7LL);
min<char>('a', 'd' + 1); // narrows the second argument, which is an int
max<int>(10, a.size());  // technically narrows a.size() but usually won't matter


## Lambda comparator for priority_queue

You can create a priority_queue whose comparator function is a lambda without having to create a separate class. For instance, to create a priority_queue for Dijkstra's algorithm where entries are stored as node–distance pairs, one can write:

using ll = long long;
using entry_t = pair<int, ll>;

auto cmp = [](entry_t a, entry_t b) { return a.second > b.second; };
priority_queue<entry_t, vector<entry_t>, decltype(cmp)> q(cmp);


In my opinion, this is more natural than using (negative distance)–node pairs, for which the default sorting would have worked, but I find it more difficult to work with and possibly more bug-prone.

## Miscellaneous helpful STL functions

The following are some lesser-known STL functions that I have found useful in a variety of scenarios. I find that they can make code more concise by a considerable degree.

• div and lldiv (<cstdlib>): takes two integer arguments (int for div, long long for lldiv), returns a structure that contains the quotient and the remainder but only takes one operation. Can be unpacked using structured bindings: auto [quot, rem] = div(a, b);
• reduce (<numeric>): given a range (an iterator pair), returns the sum of all elements in it. Beware of arithmetic overflows, however.
• minmax_element (<algorithm>): given a range, returns a pair of iterators (pointers for arrays) to the minimum and maximum element inside that range. This is usually better than two separate calls to min_element and max_element since the range is traversed only once. Can be unpacked using structured bindings, too.
• inclusive_scan and exclusive_scan (<numeric>): generates the prefix sum for a range. The nth output of inclusive_scan includes the nth element in the sum, while that of exclusive_scan does not.
• swap (<utility>): swaps two elements. Though we can implement swapping ourselves in merely 3 lines, the frequency of this operation makes this function still useful. Additionally, swaps for container classes such as vectors or strings are usually more optimised than swapping via a temporary variable since most implementations of STL allow swaps to access the innards of these structures.

## Conclusion

So, the above is some C++ tips that I would like to share with you all. If you have any feedback regarding the content or the format, please kindly leave them in the comments below. Thank you very much for reading this far!

• +75