nor's blog

By nor, 3 months ago, In English

Note: for those who don't like using the post-increment operator (i++), hopefully this blog convinces you that it is not just a convention that C programmers coaxed the world into following out of tradition. Also, all of the following discusses the increment operators in C++, and not C, where the semantics are slightly different.

Disclaimer: I use ++i much more often in code. But i++ has its own place, and I use it — and the generalization I mention — quite frequently wherever it is a sane choice.

Is ++i better than i++?

A lot of people are taught that since i++ also returns the old value of i before the increment, it needs a temporary, and hence it must be slower. Good news: almost all compilers today optimize this part away whenever this behaviour is not needed, and the difference between i++ and ++i doesn't matter.

So what's the difference? In C++, turns out that apart from this old/new value difference, there is another difference.

Any expression in C++ that looks like a op= b where op is a binary operator in fact "returns" a reference to the variable a after completing the operation. As a result of this, you can do things like this:

// replaces x by a * x % p
void mod_mul(int64_t& x, int a, int p) {
  (x *= a) %= p;
}

Turns out that since you only need the value after the increment for the pre-increment operator, it is possible to have similar behaviour there too, and that is precisely what C++ chose to do:

// replaces x by (x + 1) % p
void mod_inc(int64_t& x, int p) {
  ++x %= p;
}

However, the post-increment doesn't enjoy any of these properties. Perhaps this is why the Google C++ Style Guide (which I don't really like all that much, but will just quote here) recommends

Use the prefix form (++i) of the increment and decrement operators unless you need postfix semantics.

Another part of the language where you can see this distinction in practice is when you overload pre-increment and post-increment operators for a class:

struct Int {
  int value{};
  Int& operator++() {
    std::cout << "pre-incrementing\n";
    ++value; return *this;
  }
  Int operator++(int) {
    std::cout << "post-incrementing\n";
    Int temp = *this;
    ++value;
    return temp;
  }
};

If you notice closely, the return type of the pre-increment overload is a reference, which is not the case with the post-increment overload. Using a post-increment also makes a copy, so if you're using a post-increment on a struct/class object that defines both, it is likely that there is a potential performance improvement lying right there.

So, is there any legitimate use-case for post-increment?

Indeed, there is. Notice how in the post-increment code in the previous example, we had to store a temporary. This gives us a hint as to where exactly post-increment is useful: when you need to avoid temporary variables! And one of the places where temporary variables are not needed is when an operation (in this case, increment) must be done after any use of a value.

An example to illustrate this point comes when you want to implement copying of a C-style, null-terminated string (without copying the null terminator so that you're able to, let's say, concatenate strings into a buffer). Let's say we didn't have post-increment in the language at all. The algorithm looks like this:

void copy_c_string(const char* input, char* output) {
  while (*input) {
    *output = *input;
    ++input;
    ++output;
  }
}

The first thing one would say is that there are too many lines of code — you need to check input, need to separately handle input and output pointers, and if someone is refactoring this code and accidentally deletes one of the increments, they will introduce a bug that is hard to find in a large codebase.

Contrast that with the following:

void copy_c_string(const char* input, char* output) {
  while (*input) {
    *output++ = *input++;
  }
}

The intent of moving the pointer forward immediately after every use (and using it exactly once, too) is perfectly conveyed, we don't have too many lines of code, and all is well. Of course, it takes some time getting used to writing/reading this kind of code, but thinking of it in terms of "immediate operation after use" shows that it works analogously to what RAII is to a manual destructor call.

In fact, the usage of post-increment and pre-decrement (not pre-increment) was (and is) so popular (partly due to widespread use of half-open ranges), that certain architectures in the past had special functionality for them (for example, this). Thus, in some rare cases, it is imperative to use post-increment instead of pre-increment for performance.

Do we have an op= style (compound assignment) generalization for post-increment?

Yes, we do. It is called std::exchange.

For the C++ experts

What it does is this — std::exchange(a, b) returns the old value of a after setting it to b. So i++ is the same as std::exchange(i, i + 1) in terms of semantics.

Similarly, a = std::exchange(b, a) swaps the values of a and b, which shows that it is more general than std::swap, though maybe at a small performance hit for types more complicated than a simple integer. Adding a bunch of std::moves should fix that to some extent, though.

Why the strange name? How do I remember it?

I consider this to be a naming disaster, and believe that if there was a better succinct name that does NOT imply bidirectional flow of data, the C++ committee would probably have tried to do it.

The way I remember it is a = std::exchange(b, f(a, b)) is (morally) equivalent to std::tie(a, b) = std::make_pair(b, f(a, b)) — that is, the assignments are done at the same time. So somewhat like storing temporaries for everything that is to be assigned, and then performing the assignments at the same time.

A more general example is a = g(a, std::exchange(b, f(a, b))) which does std::tie(a, b) = std::make_pair(g(a, b), f(a, b)) — in a mathematical sense, it allows you to do two parallel assignments in a linear chain-like syntax. I believe this is what brings about the exchange naming (if not the CMPXCHG instruction).

A more intuitive way to read A = std::exchange(B, C) is that A becomes B and B becomes C at the same time.

Some competitive programming relevant examples

Fibonacci/linear recurrences

Writing a = std::exchange(b, a + b) is much easier and less error-prone than auto tmp = a; a = b; b = tmp + b; (in fact, someone pointed out that the initial version of this code was wrong). The same goes for any other linear recurrence in 2 variables.

Iterative GCD and extended GCD

Since the Euclidean GCD algorithm is also an affine transformation (especially one where you swap after update), the implementation goes from the less readable/intuitive

std::array<T, 3> extgcd(T a, T b) {
    std::array x{T{1}, T{0}};
    while (b) {
        std::swap(x[0] -= a / b * x[1], x[1]);   // x[0] = x[1], x[1] = x[0] - a/b * x[1]
        std::swap(a %= b, b);                    // a = b, b = a % b
    }
    return {x[0], x[1], a};
}

to

std::array<T, 3> extgcd(T a, T b) {
    std::array x{T{1}, T{0}};
    while (b) {
        x[0] = std::exchange(x[1], x[0] - a / b * x[1]);
        a = std::exchange(b, a % b);
    }
    return {x[0], x[1], a};
}
Lazy updates in buffers (use-and-clear)

It is usually error-prone to clear a buffer of, let's say, updates that you want to process, in the very end of the processing function, as follows:

void flush() {
  for (auto x : updates) {
    // do something
  }
  updates.clear(); // easy to forget
}

Sometimes this might lead to a MLE as well, if this flush function is in fact a DFS that is called in a loop after processing the updates, where the graph is built during the function itself, and the memory limit is low or the structure you store inside the buffer has quite a lot of information.

To be able to deal with the first of these issues, people used something of the following sort:

void flush() {
  std::vector<Update> updates_tmp;
  using std::swap; swap(updates_tmp, updates);
  for (auto x : updates_tmp) {
    // do something
  }
}

This faces multiple issues

  • You need to create and assign a temporary, and also need to remember the type of the updates variable (can you see where we are going with this?)
  • The temporary exists till the end of the function, so this doesn't fix the second issue.
  • Initializing/declaring and then modifying is a very non-functional (anti-)pattern, and making it a habit in code will almost invariably lead to state-mutation-related bugs.

Consider the following solution using std::exchange:

void flush() {
  for (auto x : std::exchange(updates, {})) {
    // do something
  }
}

Which of these two implementations mirrors the intent of the original function? And which one would you choose to write yourself?

Some software engineering relevant examples

Implementing move constructors and assignment operators

While defining move semantics for your class, it is usually a good idea to keep the moved-from object in defined state. Sometimes this might even be necessary, for instance, when you want to implement something with semantics like std::unique_ptr.

Avoiding redundancy.

It is possible to avoid having to implement post-increment when pre-increment is already available by just doing return std::exchange(*this, ++Type{*this})

Acrobatics with locks

The "immediately do after use" idea also carries over to when you're using locks — it is clearly optimal to use locks for as short a time as possible, when comparing across the same kinds of software architecture. One way that std::exchange can be used to this effect is to use a scoped lock inside a lambda, that returns std::exchange(resource, resetted_resource).

In fact, there is a common primitive in lock-free programming that uses the exchange idiom (compare-and-swap, test-and-set, std::atomic_exchange).

Generic programming with iterators

Sometimes the only way to increment an iterator is std::next. Using std::exchange, we can implement our own version of post-increment to be able to write generic code using iterators. Since iterators are very general, you might find yourself using this trick in very unexpected places too.


Please let me know what you think about this C++ feature and if there are any mistakes in the above blog. Comments are welcome!

  • Vote: I like it
  • +235
  • Vote: I do not like it

»
3 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

Great blog! Thank you so much. I wish I could upvote twice.

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +9 Vote: I do not like it

    Glad that you liked the blog. I'm planning to write some more C++ blogs too.

»
3 months ago, # |
  Vote: I like it +32 Vote: I do not like it

I use it in non-recursive get in dsu

while(i != p[i]) i = exchange(p[i], v);

  • »
    »
    3 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Just curious, do you really need sz to be mutable? Seems like it is not necessary

  • »
    »
    3 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Nice example!

»
3 months ago, # |
  Vote: I like it +11 Vote: I do not like it

Just want to put the function signature of std::exchange here. I think I understand better after I know its signature.

template< class T, class U = T >
constexpr T exchange( T& obj, U&& new_value );
  • »
    »
    3 months ago, # ^ |
    Rev. 2   Vote: I like it +8 Vote: I do not like it

    Yeah, in retrospect it seems that I should have put it in the blog for people who know about templates.

»
3 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I think, copy_c_string should be implemented as while (*output++ = *input++); in order to terminate the copied string with '\0' in the output.

  • »
    »
    3 months ago, # ^ |
    Rev. 2   Vote: I like it +8 Vote: I do not like it

    Yeah, my bad, somewhere I had switched from copying arrays to C strings. The example holds for both, and this one liner shows the true power of null-terminated strings (not that I would use null-terminated strings anywhere given that std::string exists). Updated the blog, since now the algorithm can be used to concatenate multiple strings.

»
3 months ago, # |
Rev. 2   Vote: I like it +11 Vote: I do not like it

We can also use std::tie(a, b) = std::make_pair(b, a + b) Similar to python, very readable.

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +13 Vote: I do not like it

    The only problem with this piece of code is too many copies, std::exchange is faster on larger structs. Definitely more readable though, to the extent that I had used this as an explanation for the functionality of std::exchange.

»
3 months ago, # |
  Vote: I like it +16 Vote: I do not like it

From this blog post, I learned that extended gcd can be implemented non-recursively (or at least with no backtracking).

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +28 Vote: I do not like it

    I found it a bit surprising that most people implement extended GCD recursively, given that recursion is not very obvious (at least to me), and each step of extended GCD is just manipulating coefficients so that the equation is true before and after the operations that replace arguments with their new values.

    • »
      »
      »
      3 months ago, # ^ |
        Vote: I like it +39 Vote: I do not like it

      I think it's just because it was done recursively on emaxx, and from then on we do it exactly that way.

»
3 months ago, # |
  Vote: I like it +17 Vote: I do not like it

An important thing to keep in mind is that most things in your code aren't bottlenecking the performance, and this gets more true the more complex your code gets, so how exactly you write them (as opposed to what logic you implement) is almost irrelevant. You shouldn't overoptimise or worry about maximum performance ahead of time. Just write something that's not slow (good algorithm and follow basic patterns like processing inner array in inner loop), check if it's good enough and move on. Your writing time is more valuable than the resulting program's time.

It's useful to know these things just in case you find out that your code isn't good enough and you need hardcore optimisations, or if you're writing a performance-critical library that's going to be reused.

  • »
    »
    3 months ago, # ^ |
    Rev. 3   Vote: I like it +9 Vote: I do not like it

    I'm not sure where performance comes into discussion here. That is unless it is an argument for using i++ instead of ++i, targeted towards people who use ++i because of performance claims, which have been debunked in the blog too, except for the cases where the operators are defined for a class and can have potentially different behaviour.

    Also, I believe that when writing code, readability becomes quite important as the code becomes more and more complex. So expressing intent through code properly can prevent you from having bugs in the first place as well as help you catch bugs.

    If you're referring to the topic of the blog and didn't bother reading the content, it was more about getting rid of temporaries because they make code uglier, rather than performance considerations, since the compiler will generate the same code most of the time.

    • »
      »
      »
      3 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Yeah, I was referring to those performance claims you addressed at the start — even if they were true, it almost doesn't matter.

      Regarding readability of code, it's not all that much about exact implementation, since the definition void copy_c_string(const char* input, char* output) offers much more info about what that code is supposed to do than the function body itself. When the code is structured like this, implementation of individual building blocks will need to be read less, and the most readable code is one where you don't need to look into extra details at all.

      • »
        »
        »
        »
        3 months ago, # ^ |
        Rev. 3   Vote: I like it +3 Vote: I do not like it

        Ah, that makes more sense.

        Regarding whether it is important to implement it properly — well, if it is write-once read-never code, then it makes sense to only have good names (and short functions like this) and not care about quality of implementation so much as long as it just works. Maybe this was not a very good example given how short it was, but in both longer functions (complicated functionality) as well as code that needs to be maintained (for example, during refactoring), having a harder-to-modify-incorrectly implementation is more preferable in general — in fact this is why functional code is seen as cleaner.

        This might only be partially true for competitive programming, where we optimize for writing once and not for maintaining, but is something that should be kept in mind in general.

»
3 months ago, # |
  Vote: I like it +8 Vote: I do not like it

Some other examples taken from code I wrote in the past

// union-find path compression
int find(int i) {
  int j = i;
  while (j != rep[j]) j = rep[j];
  while (i != j) i = exchange(rep[i], j);
  return j;
}

// delete a linked list
void del(list* p) {
  while (p) {
    delete exchange(p, p->next);
  }
}

// print space-separated ints
void print(vector<int> v) {
  auto sep = "";
  for (int x : v) {
    cout << exchange(sep, " ") << x;
  }