Xellos's blog

By Xellos, history, 41 hour(s) ago, In English

C++ is a very permissive language. It'll let you write basically anything on the assumption that you know what you're doing... or let you shoot yourself in the foot if you don't. For example:

class C {
    int a;

  public:
    C(int a) : a{a} {}
};

It's a fairly common pattern in simple class design that's perfectly fine until it's not. The class member and constructor argument (c.arg.) have the same name. This is known as shadowing and compilers will warn you about it, though you need to use a flag -Wshadow that isn't included among -Wall or -Wextra. Another example of shadowing is

for(int i = 0; i < t; i++)
    for(int i = 0; i < n; i++)

which is fine if the first i isn't used for anything except making a loop (such as over test cases) but otherwise it's a solid source of bugs, especially if you "fix" it into for(int j = 0; j < n; i++).

How does shadowing work?

First off, when can one variable shadow another? The standard has a lot of horrifying lawyerish explaining it, but in short, two variables with different scopes can have the same name. Scope is where that variable can be used in code. In the examples above:

  • the scope of a class member is the whole class
  • the scope of a function argument is that function
  • the scope of a variable declared inside for() is that for-loop

The next key question is how the compiler decides which variable is used if there are multiple options with the same name. Simple: the one with tighter scope is used. It's called the shadow variable while any variables with larger scopes are called shadowed by it.

That's why using a in the constructor always refers to the c.arg., while trying to modify the outer loop's i fails if it's shadowed by i in the inner loop. Note that in the member initializer list a{a}, the second a is the shadowing c.arg., but the first a is unambiguous — it says the class member a should be initialized using something.

Sometimes, it's possible to refer to a variable more clearly, not just by its name. A typical example is its namespace, that's why using namespace is bad practice as it removes the ability to disambiguate (it's fine in small self-contained sources that don't use other namespaces) or class name for static variables. With regular class members, there's something called "implicit this", where a can be understood to mean this->a i.e. the member, unless there's something like a shadow variable. Shadowing is one danger of relying on implicit this. If the constructor of C was something complex, a programmer that likes pain can use a for the argument and this->a for the member, so there's no ambiguity (but it's still shadowing).

Of course, the best practice is different names for different variables with overlapping scopes. The Google C++ Style Guide recommends naming member variables with a trailing underscore; I prefer naming them normally and using trailing underscore to disambiguate c.args like in the example up top when there's no clearer name to give. When there's no intentional shadowing, the -Wshadow flag is useful for preventing bugs.

Fixing testlib

Compiling a testlib-using program locally reveals quite a lot of warnings about shadow variables. Those aren't actual errors but it's still inconvenient. Let's deal with them!

  • In the class pattern, there's a member s (line 733) that's shadowed by a c.arg. (line 1531). However, the constructor doesn't just use copy s to pattern::s, it also parses it into a more structured form that the new instance stores (note that s is passed by mutable value). Then, s is only used at one point — the function pattern::src returns it. Therefore, let's rename it to source, making the code self-documenting that it's the pattern's source string.
  • The member function FileInputStreamReader::getc (line 1777) takes a FILE* as an argument, which shadows its member file (line 1763). This function is private and only used within that class a few times with the member file passed as that argument everytime. We might as well remove that argument file, it's unnecessary and confusing.
  • Also in the class FileInputStreamReader, there's a member name (line 1764) shadowed in a constructor (line 1809). The same happens in BufferedFileInputStreamReader (lines 1882, 1923). I'm not sure what those variables are supposed to be for and it highlights why it's good to deal with shadowing — when a constructor takes a file and a name, is it supposed to be the name of the file? Or a more general name given to the stream, which would probably include the filename but isn't limited to it? There are only few usages that don't shed much light on the situation, so I'm going to err on the side of generality and rename the c.arg. to streamName in both cases to indicate that it's not forced to be just the filename.
  • Then there's a few cases with no better name available. The two stream readers above have a member file that's initialized from identically named c.args, but that's always just [the file we're constructing a reader class for], as described by the code. The class ValidatorBoundsHit (line 2391) is a wrapper over 2 bools with a member function merge, so the fact that its members minHit and maxHit are shadowed by c.args doesn't have a resolution through clearer names, it's clear already. I'm going to leave these as is. It's a code style issue and I'm dealing with semantics here. A suitable choice of style like my trailing _ for shadowing c.args would handle it, but code style can be arbitrary and I don't want to force my own.
  • The class InStream has a member mode (line 2018) that's shadowed by a c.arg. repeatedly again (lines ~2030, ~3310). What mode? Mode in testlib can be file opening mode, strict/non-strict, input/output/answer, it's quite a confusing word... and the way it's named would suggest some file mode or stream mode, but it's actually input/output/answer (as the type shows, but digging through types shouldn't be action #1). That has nothing to do with the stream or the file. I want to rename each c.arg. to contentMode and am opening an issue that all these modes should be named clearer.
  • Function setAppesModeEncoding (line 4645) is setting the global variable appesModeEncoding (line 2382) from a shadow argument. Putting aside that global variables are evil and should at least be put in a global state struct within a namespace, I tend to prefix such arguments with new, so newAppesModeEncoding. Sometimes the current value matters and it's better to distinguish between current (not yet old, the change can fail!) and new value by name.

The resulting pull request is here. Not all warnings about shadowing are gone yet but it's a step in the right direction.

Suppressing warnings

What if you want to deal all the pesky warnings in your code but testlib is still spamming your terminal with its own? Suppression pragmas around #include are a (compiler-specific) solution.

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wshadow"
#include "testlib.h"
#pragma GCC diagnostic pop

This is for GCC, clang has an equivalent, if you're on MSVC or something then it's a you problem but there might be a setting to disable a specific warning on a chunk of code too.

  • Vote: I like it
  • +9
  • Vote: I do not like it

»
80 minutes ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Xellos (previous revision, new revision, compare).

»
52 minutes ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Xellos (previous revision, new revision, compare).