Prescript
This is a beginner concept that I think many people know implicitly, but I have never seen it written down explicitly anywhere, even though people keep asking me "where can I learn more about bitmasks and bitwise operations?" and I have had nothing to refer them to. Hence, I'm writing about it myself. The techniques in this blog are intuitive and simple, to the extent that you may wonder "why are we even talking about this?" But I often find it useful to state something explicitly (and formally), because intuition can be misleading if it is not held accountable and made explicit.
Throughout the blog, I try to keep the concepts as easy to read as possible without eliminating formality. So, I keep the more formal writing enclosed in spoiler blocks, which you can skip if you are uncomfortable with formality and still benefit from the blog. However, reading them is highly recommended, as they can deepen your understanding of the concepts.
Prerequisites: naive/basic set theory
Introduction
If you have taken a discrete math course, you probably saw this argument (noting that $$$\mathcal{P}(S)$$$ is the power set of $$$S$$$):
If $$$S$$$ is a finite set with $$$|S| = n$$$ for some non-negative integer $$$n$$$, then $$$|\mathcal{P}(S)| = 2^n$$$. To see this, write $$$S = \{ s_1, s_2, \dots, s_n \}$$$ and associate to every subset $$$H \subseteq S$$$ a binary string $$$x_1 x_2 \dots x_n$$$ where $$$x_i = 1$$$ if $$$s_i \in H$$$ and $$$x_i = 0$$$ otherwise. Since there are $$$2^n$$$ such strings (each bit has two options, $$$0$$$ and $$$1$$$), we have our conclusion.
This "association" is insightful.
It is also very intuitive: you can tell everything about a subset of a set by identifying which elements are in the subset and which elements are not.
The reverse is also true: you can tell everything about a bitmask by identifying which bits are on and which bits are off!
To that end, let's associate with every non-negative integer $$$x$$$ a set $$$\varphi(x)$$$ containing all positions that are 1 in its binary representation. For example,
$$$\varphi(0) = \emptyset$$$
$$$\varphi(1) = \{ 0 \}$$$
$$$\varphi(2) = \{ 1 \}$$$
$$$\varphi(3) = \{ 0, 1 \}$$$
$$$\varphi(4) = \{ 2 \}$$$
$$$\varphi(5) = \{ 0, 2 \}$$$
and so on. Moreover, observe that such a map $$$\varphi$$$ is invertible, with $$$\displaystyle \varphi^{-1}(S) = \sum_{i \in S} 2^i $$$.
Now, why is this association insightful? There are two applications where I think it makes bitmasks much easier to understand:
Properties of bitwise operations
Submasks and supermasks and brute force over subsets
Disclaimer: it's not useful to think about ALL bitwise problems in this way. For example, the techniques in this blog are irrelevant to a problem like this.
Properties of bitwise operations
A very nice thing about the map $$$\varphi$$$ is that it behaves nicely with bitwise operations. In particular, for any two non-negative integers $$$x, y$$$, we have
$$$\displaystyle \varphi(x \, \& \, y) = \varphi(x) \cap \varphi(y)$$$ (bitwise AND is just the intersection of the sets)
$$$\varphi(x \, |\, y) = \varphi(x) \cup \varphi(y)$$$ (bitwise OR is just the union of the sets)
$$$\varphi(x \oplus y) = \varphi(x) \Delta \varphi(y)$$$ (the bitwise XOR is the symmetric difference between the sets)
You may also have noticed before that there is a back-and-forth between the word and and the intersection of two sets, and between the word or and the union of two sets. Even in the definitions of union and intersection, you see that
$$$A \cup B = \{ x : x \in A \textbf{ or } x \in B \}$$$
$$$A \cap B = \{ x : x \in A \textbf{ and } x \in B \}$$$
We see the same in logic: there is an intuitive similarity between $$$\lor$$$ and $$$\cup$$$ and between $$$\land$$$ and $$$\cap$$$. For instance, $$$p \lor (q \land r) = (p \lor q) \land (p \lor r)$$$ in logic, while in sets $$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$$$, and even De Morgan's laws transfer from logic to sets.
Since we are working with finite sets throughout, the interchange between bitmasks and sets above is essentially how we make this intuitive similarity between $$$\lor$$$ and $$$\cup$$$ and between $$$\land$$$ and $$$\cap$$$ precise.
In other words, this says that bitmasks and bitwise operations behave exactly like sets with $$$\cap, \cup, \Delta$$$.
Now, this is particularly useful since showing properties about bitwise operations is equivalent to showing properties about sets, and dealing with sets can be easier.
Example 1. (Warmup) For any non-negative integers $$$x, y, z$$$, $$$x \& (y \oplus z) = (x \& y) \oplus (x \& z)$$$
In this example, every bit behaves independently of one another, and so it's easy to try to verify it for just $$$x, y, z \in \{ 0, 1 \}$$$ and generalize.
However, using sets, showing that for any finite sets $$$X, Y, Z$$$ we have $$$X \cap (Y \Delta Z) = (X \cap Y) \Delta (X \cap Z)$$$ is sufficient.
Now, while proving the set-theoretic identity rigorously is a slightly more complicated version of the $$$x, y, z \in \{0, 1\}$$$ argument, visualizing it on a Venn diagram makes it much easier:
[Venn diagram of three sets $$$X$$$ (red), $$$Y$$$ (yellow), $$$Z$$$ (blue), with the seven regions numbered $$$1$$$ through $$$7$$$]
If $$$X$$$ is red, $$$Y$$$ is yellow, and $$$Z$$$ is blue, then $$$Y \Delta Z$$$ makes regions $$$2, 3, 4, 5$$$, and so $$$X \cap (Y \Delta Z)$$$ makes regions $$$4, 5$$$. On the other hand, $$$X \cap Y$$$ makes $$$4, 7$$$ and $$$X \cap Z$$$ is $$$5, 7$$$, and so $$$(X \cap Y) \Delta (X \cap Z)$$$ makes $$$4, 5$$$ as well.
Two take-aways:
The idea of using a Venn diagram to verify an identity about bitmasks is very useful, and it can be used to verify many other such identities, as the examples below will show.
To show an identity in bitmasks, it is sufficient to show the analog identity in sets. (Formally, you can apply $$$\varphi$$$ on the identity and convert the proper operations to obtain the analog identity in sets).
Example 2: for any non-negative integers $$$x, y$$$, if $$$x \& y = 0$$$, then $$$x | y = x + y$$$
Now, this is tricky, since we don't know how $$$\varphi$$$ behaves with addition yet. But, we can still do something:
Let $$$X = \varphi(x), Y = \varphi(y)$$$, and name $$$\gamma = \varphi^{-1}$$$. Then, the identity above says that if $$$X \cap Y = \emptyset$$$ (that is, $$$X$$$ and $$$Y$$$ are disjoint, having no elements in common), then
$$$\gamma(X \cup Y) = \gamma(X) + \gamma(Y)$$$
This basically says that $$$\displaystyle \sum_{i \in (X \cup Y)} 2^i = \sum_{i \in X} 2^i + \sum_{i \in Y} 2^i$$$, which is clear when $$$X$$$ and $$$Y$$$ are disjoint. (If we want to be super-formal, we can use induction.)
Exercise: show that when $$$(x \& y) = x$$$, then $$$y = x + (y \oplus x)$$$. (Hint: show that $$$x | (y \oplus x) = y$$$ and use what we've just shown)
Example 3: for any non-negative integers $$$x, y$$$, $$$(x | y) = x + y - (x \& y)$$$
If we apply the same technique as above, we get
$$$\gamma(X \cup Y) = \gamma(X) + \gamma(Y) - \gamma(X \cap Y)$$$
Does this look familiar? Yes, this is nothing but the inclusion/exclusion principle. Intuitively, it is clear from the Venn diagram above: when adding two of the sets, say the yellow and blue ones, their intersection (regions $$$6$$$ and $$$7$$$) is counted twice, so we subtract the overcounted part once.
Exercise: show that for any non-negative integers $$$x, y, z$$$, $$$(x | y | z) = x + y + z - (x \& y) - (x \& z) - (y \& z) + (x \& y \& z)$$$.
Example 4: for any non-negative integers $$$x, y$$$, $$$x \oplus y = x + y - 2(x \& y)$$$
This has been labelled a weird property in some blogs I have seen, and it hasn't been straightforward for many of the people I have spoken with. However, it becomes clear now, since the Venn diagram immediately shows that $$$(x \oplus y) + (x \& y) = (x | y)$$$
So, adding $$$x \& y$$$ to both sides of the claimed identity turns it into $$$x | y = x + y - (x \& y)$$$, which is exactly the inclusion/exclusion shown in the previous example.
You can even observe the identity directly on a Venn diagram (without any manipulation), which goes to show how useful this technique can be.
Exercise: show that $$$x + y + z = (x \oplus y \oplus z) + 2(x \& y) + 2(x \& z) + 2(y \& z) - 4(x \& y \& z)$$$
Additional Exercises: show all the bitwise formulas in this blog
Submasks and supermasks and subset search
The definitions of submasks/supermasks vary; $$$x$$$ is a submask of $$$y$$$ (equivalently, $$$y$$$ is a supermask of $$$x$$$) if any of the following equivalent conditions holds:
- $$$(x \& y) = x$$$
- $$$(x | y) = y$$$
- All the bits lit in $$$x$$$ are also lit in $$$y$$$.
In sets, it's simple: if $$$X = \varphi(x)$$$ and $$$Y = \varphi(y)$$$, then $$$x$$$ is a submask of $$$y$$$ if and only if $$$X \subseteq Y$$$. Convenient, right?
Of course, this is the same as $$$X \cap Y = X$$$ and $$$X \cup Y = Y$$$.
Now, this means that if $$$(x \& y) = z$$$, then $$$z$$$ is a submask of both $$$x$$$ and $$$y$$$, and if $$$(x | y) = z$$$, then both $$$x$$$ and $$$y$$$ are submasks of $$$z$$$.
Note: observe that the number of submasks of $$$x$$$ is $$$2^{|\varphi(x)|}$$$, which is analogous to the number of elements in the power set of a set. $$$|\varphi(x)|$$$ can be computed using __builtin_popcount.
An example problem where thinking in terms of submasks/subsets helps is this.
Solution: observe that if $$$a$$$ is a submask of $$$b$$$, then we're done, since $$$A \cup B = B$$$. Moreover, once we apply the third operation, $$$b$$$ becomes a submask of $$$a$$$ (so $$$b \le a$$$), and from that point on only the second operation is worth applying, since the other operations can only increase $$$a$$$, not decrease it.
Additionally, if we apply the third operation to get $$$x = (a | b)$$$ and then increase $$$b$$$ to $$$x$$$, this is the same as first increasing $$$b$$$ until it becomes $$$x$$$ and then applying $$$a = (a | b)$$$. So, it's optimal to apply the third operation at the end, since before that we may reach a supermask $$$b'$$$ of $$$a$$$ that is smaller than $$$x$$$. So, what we do is apply the first or second operation and then, at the end, apply the third if necessary.
Then, we're trying to find $$$x, y$$$ such that $$$(a + x)$$$ is a submask of $$$(b + y)$$$ and $$$x + y$$$ is minimal. It is sufficient to loop over $$$y$$$ and find the smallest submask $$$a'$$$ of $$$b + y$$$ with $$$a' \ge a$$$, then set $$$x = a' - a$$$. This is easy to do by iterating over the bits of $$$b + y$$$ from largest to smallest.
Subset search: We previously said that every subset of a finite set can be mapped to a bitmask as well.
This means that if we have an array of $$$n \leqslant 20$$$ elements, we can brute-force over all $$$2^n$$$ subsets of the index set $$$\{ 0, 1, 2, \dots, n - 1\}$$$ by simply iterating over all bitmasks from $$$0$$$ to $$$2^n - 1$$$, which is a well-known technique.
If we have a subset $$$X$$$, then checking that $$$j \in X$$$ is equivalent to checking that $$$\{ j \} \cap X = \{ j \}$$$, i.e. $$$\gamma(\{ j \}) \, \& \, \gamma(X) = \gamma(\{ j \})$$$, which is the same as $$$2^j \, \& \, x = 2^j$$$ for $$$x = \gamma(X)$$$. That's why we often see the condition if ((1 << j) & x), which is really shorthand for if (((1 << j) & x) == (1 << j)).
Update: check comments for additional content :)
A note for those who are more into math and would like to know some relevant definitions from measure theory (you don't need to understand measure theory to follow the definitions below):
If $$$X$$$ is a set, a nonempty collection of subsets of $$$X$$$, $$$\mathcal{R} \subseteq \mathcal{P}(X)$$$, is called a ring of sets (this is different from rings in algebra) if and only if for any $$$A, B \in \mathcal{R}$$$ we have $$$A \cup B \in \mathcal{R}$$$ and $$$A \setminus B \in \mathcal{R}$$$.
Observe that this implies that $$$\mathcal{R}$$$ is also closed under intersections, since $$$A \cap B = A \setminus (A \setminus B)$$$, and moreover $$$\emptyset = A \setminus A \in \mathcal{R}$$$.
Now, why is this relevant? Because the family of finite subsets of natural numbers, $$$\mathcal{P}_\omega(\mathbb{N})$$$, is a ring of sets.
Moreover, we can define something called a content on any ring of sets $$$\mathcal{R}$$$, which is a function $$$\alpha : \mathcal{R} \to \mathbb{R}_{\ge 0} \cup \{ \infty \}$$$ (ignore the $$$\infty$$$ for now) such that $$$\alpha(\emptyset) = 0$$$ and $$$\alpha(A \cup B) = \alpha(A) + \alpha(B)$$$ whenever $$$A, B \in \mathcal{R}$$$ are disjoint.
Now, as we've seen in Example 2, $$$\gamma = \varphi^{-1}$$$ is a content on $$$\mathcal{P}_\omega(\mathbb{N})$$$. Why is this useful? Because you can show that any content on a ring of sets satisfies inclusion/exclusion:
$$$\displaystyle \alpha\left(\bigcup_{i = 1}^{n} A_i\right) = \sum_{\emptyset \neq I \subseteq \{1, \dots, n\}} (-1)^{|I| + 1} \, \alpha\left(\bigcap_{i \in I} A_i\right)$$$
And this is always true. This is useful in case the reader is curious about a generalization of the inclusion/exclusion principle (i.e., about which kinds of functions satisfy it).
If anyone was curious ...:
Observe that we can conclude
$$$\alpha(A \cup B) = \alpha(A) + \alpha(B) - \alpha(A \cap B) \quad (1)$$$
from the additivity of $$$\alpha$$$, since
$$$\alpha(A \cup B) = \alpha(A) + \alpha(B \setminus (A \cap B)) = \alpha(A) + \alpha(B) - \alpha(A \cap B)$$$
The last step uses $$$\alpha(A \cap B) + \alpha(B \setminus (A \cap B)) = \alpha(B)$$$, which is why it follows by additivity.
Now, we can use $$$(1)$$$ to induct on the given claim, with itself as the base case. So, let $$$\displaystyle S_k = \bigcup_{i = 1}^k A_i$$$ and note that
$$$\alpha(S_n) = \alpha(S_{n - 1}) + \alpha(A_n) - \alpha(A_n \cap S_{n - 1})$$$
from $$$(1)$$$. Now,
$$$\displaystyle A_n \cap S_{n - 1} = \bigcup_{i = 1}^{n - 1} (A_n \cap A_i)$$$
So, we can use the inductive hypothesis on both $$$\alpha(S_{n - 1})$$$ and $$$\alpha(A_n \cap S_{n - 1})$$$: expanding both, every nonempty $$$I \subseteq \{1, \dots, n\}$$$ appears with coefficient $$$(-1)^{|I| + 1}$$$ in front of $$$\alpha\left(\bigcap_{i \in I} A_i\right)$$$, giving
$$$\displaystyle \alpha(S_n) = \sum_{\emptyset \neq I \subseteq \{1, \dots, n\}} (-1)^{|I| + 1} \, \alpha\left(\bigcap_{i \in I} A_i\right)$$$
as desired.
Additional exercises:
Show that $$$|\cdot| : \mathcal{P}_\omega(\mathbb{N}) \to \mathbb{N}$$$ where $$$|X|$$$ denotes the number of elements of $$$X$$$ is a content over $$$\mathcal{P}_\omega(\mathbb{N})$$$. Provide an analogue of the inclusion/exclusion principle over the non-negative integers using $$$\&, |$$$ and $$$\text{popcount}$$$.
Show that for $$$a_1, a_2, \dots, a_n \in \mathbb{N}$$$
$$$\displaystyle a_1 \, | \, a_2 \, | \, \dots \, | \, a_n = \sum_{\emptyset \neq I \subseteq \{1, \dots, n\}} (-1)^{|I| + 1} \left( \mathop{\&}_{i \in I} a_i \right)$$$
this was truly our bitset