Hello everyone!
In this blogpost, I want to talk about the probabilistic method in mathematics, how it can be used to obtain good randomized approximation algorithms for hard problems, and how we can use some derandomization techniques to turn these into deterministic algorithms that maintain the exact same guarantees.
Thanks to peltorator for proofreading this article and giving very useful feedback.
The Probabilistic Method
In short, this method (pioneered by Paul Erdős) states: in order to prove that a structure with certain desired properties exists, one can define a probability space of all such structures and mathematically show that the desired properties hold in this space with a strictly positive probability. If the probability is $$$\gt 0$$$, the structure must exist.
Motivating Problem: Max Cut
The Problem: Given an undirected graph $$$G = (V, E)$$$, a 2-coloring of its vertices is an assignment of a color (Red or Blue) to each vertex. Given any 2-coloring, we define a cut as the set of edges whose endpoints have different colors. We are interested in finding a 2-coloring that maximizes the cut size. Let's denote that maximum possible cut size as $$$M$$$.
This problem is actually NP-hard for general graphs, so instead of trying to design an algorithm that computes the maximum cut, let's design one that computes an approximation (in particular, the algorithm will produce a cut that is at least half of the max cut). Since we just talked about the probabilistic method, let's try applying that.
The Randomized Baseline: Assume a completely random 2-coloring: independently, each vertex is colored Red with probability $$$1/2$$$ and Blue with probability $$$1/2$$$. What is the expected size of our cut?
Let $$$c_v \in \lbrace 0, 1\rbrace$$$ be the color of vertex $$$v$$$. For any edge $$$e = (u, v) \in E$$$, since both endpoints are colored independently at random, the edge belongs to the cut ($$$c_u \neq c_v$$$) in exactly 2 of the 4 equally likely color pairs, i.e., with probability $$$1/2$$$. Then, by linearity of expectation, the expected size of the cut is $$$\frac{|E|}{2}$$$.
So, the expected cut size is exactly half the total number of edges. Furthermore, because the absolute maximum cut $$$M$$$ obviously cannot exceed the total number of edges ($$$M \le |E|$$$), we know that $$$|E|/2 \ge M/2$$$.
This gives us our first algorithm: simply color everything randomly, and on average, we obtain a randomized 0.5-approximation.
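This expectation is easy to verify empirically. Below is a minimal simulation sketch (the function and variable names are my own, purely illustrative); averaging the cut size of many random colorings of a triangle should land near $$$|E|/2 = 1.5$$$:

```python
import random

def random_cut_size(n, edges, rng):
    # Color each vertex 0 or 1 uniformly at random, then count cut edges.
    color = [rng.randrange(2) for _ in range(n)]
    return sum(1 for u, v in edges if color[u] != color[v])

rng = random.Random(0)                      # fixed seed for reproducibility
edges = [(0, 1), (1, 2), (0, 2)]            # a triangle: |E| = 3
trials = 20000
avg = sum(random_cut_size(3, edges, rng) for _ in range(trials)) / trials
# avg should be close to |E|/2 = 1.5
```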
But an "average" is not a strict guarantee, and we want algorithms that never fail. This is where we apply the Averaging Principle: since the expected score across all possible random colorings is $$$|E|/2$$$, it mathematically follows that there must exist at least one specific, deterministic coloring that achieves a cut of $$$\ge |E|/2$$$.
The question is: how do we find it efficiently? Let's derandomize the above algorithm.
Derandomization via Conditional Expectations
To remove the randomness, we will label our vertices $$$1, 2, \dots, n$$$ and assign them colors iteratively in that exact order.
Suppose we are currently at step $$$k$$$, meaning we have already permanently assigned colors to the first $$$k-1$$$ vertices. Let's call this fixed history of choices $$$H_{k-1} = (c_1, c_2, \dots, c_{k-1})$$$. We now need to decide the color $$$c_k \in \lbrace 0, 1 \rbrace$$$ for vertex $$$k$$$.
How do we make this choice? We want to ensure that our expected cut size never drops below our initial guarantee of $$$|E|/2$$$.
By applying Marginalization (The Law of Total Expectation) over the two possible colors we could assign to $$$k$$$, we can express our current expected cut size as the exact average of the two future branches:

$$$\mathbb{E}[\text{Cut Size} \mid H_{k-1}] = \frac{1}{2}\,\mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k = 0] + \frac{1}{2}\,\mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k = 1]$$$
By the Averaging Principle, it is impossible for both future branches to have an expected value strictly less than the current average. At least one of the choices ($$$c_k = 0$$$ or $$$c_k = 1$$$) must yield an expected cut size $$$\ge \mathbb{E}[\text{Cut Size} \mid H_{k-1}]$$$.
This gives us a deterministic strategy: At each step $$$k$$$, compute the expected cut size for both possible colors of vertex $$$k$$$, and greedily pick the color that gives the larger expectation. Inductively, if the expectation never decreases, our final deterministic coloring will have a cut size $$$\ge |E|/2$$$.
Evaluating the Conditional Probabilities
Computing the expected size of a cut over all possible future random combinations sounds like it would take exponential time. Let's expand the expectation mathematically to see how it simplifies.
By linearity of expectation, the expected cut size is the sum of the probabilities of each individual edge being cut:

$$$\mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k] = \sum_{(u,v) \in E} \mathbb{P}(c_u \neq c_v \mid H_{k-1}, c_k)$$$

To understand how choosing $$$c_k$$$ affects this sum, let's explicitly split the sum into two parts: edges that do not include vertex $$$k$$$, and edges that do include vertex $$$k$$$:

$$$\sum_{\substack{(u,v) \in E \\ u \neq k,\, v \neq k}} \mathbb{P}(c_u \neq c_v \mid H_{k-1}, c_k) \;+\; \sum_{(u,k) \in E} \mathbb{P}(c_u \neq c_k \mid H_{k-1}, c_k)$$$
1. Edges not incident to $$$k$$$: Look at the first sum. The endpoints of these edges are either in our history (their colors are already fixed) or in the future (they will be assigned randomly later). Crucially, the probability of these edges being cut does not depend on $$$c_k$$$. Therefore, this entire first sum evaluates to the exact same constant $$$W$$$ regardless of whether we choose $$$c_k = 0$$$ or $$$c_k = 1$$$.
2. Edges incident to $$$k$$$: Now look at the second sum. These are the direct neighbors of vertex $$$k$$$. We can split these neighbors into two groups:
Backward Edges ($$$u \lt k$$$): Vertex $$$u$$$ has already been assigned a deterministic color $$$c_u$$$ in our history.
- If we set $$$c_k \neq c_u$$$, the edge is deterministically cut (probability $$$1$$$).
- If we set $$$c_k = c_u$$$, the edge is deterministically not cut (probability $$$0$$$).
Forward Edges ($$$u \gt k$$$): Vertex $$$u$$$ has not been colored yet. It will be assigned a random color later ($$$1/2$$$ probability for $$$0$$$, $$$1/2$$$ probability for $$$1$$$).
- If we set $$$c_k = 0$$$, the edge is cut if the future coin flip for $$$u$$$ lands on $$$1$$$. So, probability is $$$1/2$$$.
- If we set $$$c_k = 1$$$, the edge is cut if the future coin flip for $$$u$$$ lands on $$$0$$$. So, probability is $$$1/2$$$.
- Notice the beautiful collapse here: the probability a forward edge is cut is exactly $$$1/2$$$, completely regardless of which color we choose for vertex $$$k$$$.
The Algorithm
Let's put it all together. When we compare the expected score of choosing $$$c_k = 0$$$ versus $$$c_k = 1$$$:
The non-incident edges contribute the exact same constant $$$W$$$ to both sides.
The forward edges contribute exactly $$$1/2$$$ to both sides.
They perfectly cancel each other out in the inequality! The future literally marginalizes itself away. The condition:

$$$\mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k = 0] \ge \mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k = 1]$$$

completely reduces to: "Does setting $$$c_k = 0$$$ cut more backward edges than setting $$$c_k = 1$$$?"
Now we have a simple $$$\mathcal{O}(|V| + |E|)$$$ deterministic algorithm:
Iterate through the vertices $$$1, 2, \dots, n$$$.
For the current vertex $$$k$$$, count how many of its already-colored neighbors have color 0, and how many have color 1.
Assign $$$k$$$ the color opposite to the majority of its already-colored neighbors.
That's it. A purely greedy choice, guaranteeing a 0.5-approximation.
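The whole algorithm fits in a few lines. Here is a minimal Python sketch (the names `greedy_max_cut` and `cut_size` are my own; vertices are $$$0, \dots, n-1$$$ and the graph is given as an edge list):

```python
from collections import defaultdict

def greedy_max_cut(n, edges):
    """Derandomized 0.5-approximation for Max Cut.

    Vertices are 0..n-1; edges is a list of pairs. Returns a list of
    colors (0 or 1) whose cut contains at least |E|/2 edges.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    color = [None] * n
    for k in range(n):
        # Count already-colored neighbors of each color (backward edges).
        cnt = [0, 0]
        for u in adj[k]:
            if color[u] is not None:
                cnt[color[u]] += 1
        # Choosing color c cuts cnt[1 - c] backward edges, so pick the
        # color opposite to the majority of colored neighbors.
        color[k] = 0 if cnt[1] >= cnt[0] else 1
    return color

def cut_size(edges, color):
    return sum(1 for u, v in edges if color[u] != color[v])
```

By the argument above, the returned coloring always cuts at least $$$|E|/2$$$ edges, with no randomness involved.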
A Second Example: MAX Exact 3-SAT
Before we generalize this method, let's look at a different problem to see how versatile this technique really is.
The Problem: You are given a boolean formula consisting of $$$M$$$ clauses over $$$N$$$ variables ($$$x_1, x_2, \dots, x_N$$$). Each clause is an OR of exactly 3 distinct literals (a literal is a variable $$$x_i$$$ or its negation $$$\neg x_i$$$). For example, a clause might look like $$$(x_1 \lor \neg x_4 \lor x_7)$$$. Find an assignment of True/False to the $$$N$$$ variables that maximizes the number of satisfied clauses. Let's denote the maximum possible number of satisfied clauses as $$$\text{OPT}$$$.
This optimization problem is also NP-hard. This is because ordinary 3-SAT (deciding if all clauses can be simultaneously satisfied) is a special case: it's just asking whether $$$\text{OPT}$$$ equals $$$M$$$. Therefore, we will look for an approximation algorithm. Specifically, we want an algorithm that guarantees we satisfy at least $$$\frac{7}{8}\text{OPT}$$$ clauses.
The Randomized Baseline: Let's assign each variable to True or False independently, with a $$$1/2$$$ probability for each.
When does a clause fail? A clause is an OR statement, so it only evaluates to False if all three of its literals evaluate to False. Since the variables are chosen independently and the 3 literals are distinct, the probability of a clause failing is exactly $$$\left(\frac{1}{2}\right)^3 = \frac{1}{8}$$$.
Therefore, the probability that any given clause is satisfied is $$$1 - \frac{1}{8} = \frac{7}{8}$$$. Once again, by linearity of expectation, the expected number of satisfied clauses is: $$$\mathbb{E}[\text{Satisfied}] = \frac{7}{8}M$$$.
Because the absolute optimal answer obviously cannot exceed the total number of clauses ($$$\text{OPT} \le M$$$), our expected score gives us a strict lower bound on the approximation ratio:

$$$\mathbb{E}[\text{Satisfied}] = \frac{7}{8}M \ge \frac{7}{8}\text{OPT}$$$
By the Averaging Principle, since the expected number of satisfied clauses is $$$\frac{7}{8}M$$$, there must exist at least one deterministic assignment that satisfies at least $$$\frac{7}{8}M$$$ clauses. Let's find it.
Derandomization: We will assign truth values to the variables sequentially: $$$x_1, x_2, \dots, x_N$$$. At step $$$k$$$, we have a history $$$H_{k-1}$$$ of fixed variables. We need to decide whether to set $$$x_k = \text{True}$$$ or $$$x_k = \text{False}$$$.
Just like in Max Cut, we want to maximize our conditional expectation. We need to compare the expected number of satisfied clauses under our two possible choices:

$$$\mathbb{E}[\text{Satisfied} \mid H_{k-1}, x_k = \text{True}] \quad \text{vs.} \quad \mathbb{E}[\text{Satisfied} \mid H_{k-1}, x_k = \text{False}]$$$
Let's look at the state of any specific clause $$$C_j$$$ under our current partial assignment:
Already Satisfied: If at least one literal in $$$C_j$$$ evaluates to True under our already-fixed variables, the clause is permanently satisfied. Its conditional probability of being satisfied is exactly $$$1$$$.
Still Alive (but every fixed literal evaluated to False): Suppose $$$C_j$$$ has $$$u$$$ unassigned literals remaining (where $$$u \in \lbrace 1, 2, 3\rbrace$$$). The only way $$$C_j$$$ will eventually fail is if all $$$u$$$ of those future random coin flips land on the exact wrong boolean value, which happens with probability $$$2^{-u}$$$. The conditional probability that $$$C_j$$$ is eventually satisfied is therefore $$$1 - 2^{-u}$$$. (Conveniently, the same formula with $$$u = 0$$$ gives probability $$$0$$$ for a clause that has already failed.)
The Algorithm: Notice that, unlike Max Cut, the unassigned variables don't just "cancel out" completely, because their probabilities change depending on how many literals are left! But the greedy algorithm remains incredibly simple and totally deterministic.
At step $$$k$$$:
Pretend we set $$$x_k = \text{True}$$$. Iterate through all $$$M$$$ clauses. If a clause is already satisfied, add $$$1$$$ to our score. If it is not satisfied and has $$$u$$$ unassigned literals left, add $$$1 - 2^{-u}$$$ to our score. Let this total sum be $$$E_{\text{True}}$$$.
Pretend we set $$$x_k = \text{False}$$$. Do the exact same summation over all clauses to get $$$E_{\text{False}}$$$.
Greedily choose the boolean value that gave the larger score. Update the state of the clauses and move to $$$x_{k+1}$$$.
This runs in $$$\Theta(N \cdot M)$$$ time, but it can be optimized into $$$\mathcal{O}(N + M)$$$ time by only considering the clauses involving the variable $$$x_k$$$ at each step (each clause contains exactly 3 literals, so it is examined at most 3 times in total). As a result, this algorithm deterministically evaluates the exact conditional expectations at every step, guaranteeing we satisfy at least $$$\frac{7}{8}M$$$ clauses (which is a $$$\frac{7}{8}$$$-approximation of the optimal solution).
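For concreteness, here is a sketch of the simple $$$\Theta(N \cdot M)$$$ version in Python (the function names and the DIMACS-style literal encoding, where $$$+i$$$ means $$$x_i$$$ and $$$-i$$$ means $$$\neg x_i$$$, are my own choices):

```python
def derandomized_max_e3sat(n_vars, clauses):
    """Derandomized 7/8-approximation for MAX Exact 3-SAT.

    Literals are DIMACS-style: +i means x_i, -i means NOT x_i, with
    variables numbered 1..n_vars. Returns a dict {var: bool} whose
    assignment satisfies at least 7/8 of the clauses.
    """
    assign = {}

    def clause_weight(clause):
        # Conditional probability that this clause ends up satisfied,
        # given the variables fixed so far.
        unassigned = 0
        for lit in clause:
            var = abs(lit)
            if var in assign:
                if assign[var] == (lit > 0):
                    return 1.0                 # already satisfied
            else:
                unassigned += 1
        return 1.0 - 0.5 ** unassigned         # fails iff all u coins go wrong

    for k in range(1, n_vars + 1):
        scores = {}
        for value in (True, False):
            assign[k] = value                  # tentatively fix x_k
            scores[value] = sum(clause_weight(c) for c in clauses)
        assign[k] = scores[True] >= scores[False]
    return assign

def count_satisfied(clauses, assign):
    return sum(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses)
```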
Note: As a nice connection, Max Cut is essentially equivalent to Max 2-SAT, so MAX 3-SAT can be seen as a natural generalization of the previous problem.
The Generalization: The Formal Framework
Now that we have seen two concrete examples, let's generalize the idea. The following is almost a copy-paste from a section of the 16th chapter of Alon and Spencer's textbook The Probabilistic Method (3rd edition).
Suppose we have a probability space $$$\Omega$$$ consisting of $$$2^l$$$ points. We can represent every point in this space as a binary vector of length $$$l$$$:

$$$\Omega = \lbrace (x_1, x_2, \dots, x_l) : x_j \in \lbrace 0, 1 \rbrace \rbrace$$$
For simplicity, we assume this space is symmetric, meaning we generate a random point by choosing each bit $$$x_j$$$ independently with $$$\mathbb{P}(x_j = 0) = \mathbb{P}(x_j = 1) = \frac{1}{2}$$$.
Let there be a collection of $$$s$$$ events: $$$A_1, A_2, \dots, A_s$$$. By linearity of expectation, the expected number of events that hold true for a random vector is exactly the sum of their individual probabilities. Let's call this expected value $$$T$$$:

$$$T = \sum_{i=1}^{s} \mathbb{P}(A_i)$$$
Because the expected number of valid events is exactly $$$T$$$, the Averaging Principle guarantees that there must exist:
At least one specific binary vector where $$$\ge T$$$ events hold.
At least one specific binary vector where $$$\le T$$$ events hold.
We want to find these vectors deterministically. Let's focus on constructing the first one (the $$$\ge T$$$ case), as the logic for the second one is completely symmetric.
The Sequential Construction: We will fix the bits of our vector sequentially: $$$x_1, x_2, \dots, x_l$$$.
Suppose we are currently at step $$$j$$$, and we have already fixed the first $$$j-1$$$ bits. Let's denote this history as $$$H_{j-1} = (x_1, x_2, \dots, x_{j-1})$$$. We now need to decide whether to set $$$x_j = 0$$$ or $$$x_j = 1$$$.
First, let's look at a single event $$$A_i$$$. By marginalizing over our two uniformly random choices for $$$x_j$$$, the current conditional probability of $$$A_i$$$ holding is the exact average of its conditional probabilities in the two future branches:

$$$\mathbb{P}(A_i \mid H_{j-1}) = \frac{1}{2}\,\mathbb{P}(A_i \mid H_{j-1}, x_j = 0) + \frac{1}{2}\,\mathbb{P}(A_i \mid H_{j-1}, x_j = 1)$$$
Summing this across all $$$s$$$ events gives us our global expected value. Because the $$$1/2$$$ is a constant, we can factor it completely out of the sum, decomposing our expected value into the average of two distinct sums:

$$$\sum_{i=1}^{s} \mathbb{P}(A_i \mid H_{j-1}) = \frac{E_0 + E_1}{2}, \quad \text{where } E_b = \sum_{i=1}^{s} \mathbb{P}(A_i \mid H_{j-1}, x_j = b)$$$
Mathematically, an average can never be strictly greater than both of its components. It must always be less than or equal to the maximum of the two. Using the $$$E_0$$$ and $$$E_1$$$ defined above:

$$$\frac{E_0 + E_1}{2} \le \max(E_0, E_1)$$$
Therefore, if we greedily choose the bit that maximizes the sum (picking $$$x_j = 0$$$ if $$$E_0 \ge E_1$$$, and $$$x_j = 1$$$ otherwise), our expected number of satisfied events will never decrease:

$$$\sum_{i=1}^{s} \mathbb{P}(A_i \mid H_j) = \max(E_0, E_1) \ge \sum_{i=1}^{s} \mathbb{P}(A_i \mid H_{j-1})$$$
At step 0, before making any choices, our expected score was exactly $$$T$$$. Because our greedy choice guarantees the expectation never decreases, at the final step $$$l$$$, our expected score must still be $$$\ge T$$$.
At step $$$l$$$, however, every single bit $$$x_1, \dots, x_l$$$ is completely fixed. There is no randomness left. The conditional expectation is simply the actual, deterministic number of events that hold for our constructed vector.
Thus, we have deterministically constructed a vector $$$(x_1, \dots, x_l)$$$ that satisfies at least $$$T$$$ events, doing so efficiently by simply evaluating and maximizing $$$\sum_{i=1}^s \mathbb{P}(A_i \mid H_{j-1}, x_j)$$$ at each step.
A Note on Asymmetric Spaces: What if our probability space is not symmetric? In some problems, we might want to set $$$x_j = 1$$$ with some arbitrary probability $$$p$$$, and $$$x_j = 0$$$ with probability $$$1-p$$$.
But the algorithm doesn't change at all. The marginalization step simply becomes a weighted average:

$$$\sum_{i=1}^{s} \mathbb{P}(A_i \mid H_{j-1}) = (1-p)\,E_0 + p\,E_1$$$
This weighted average still satisfies the same key inequality:

$$$(1-p)\,E_0 + p\,E_1 \le \max(E_0, E_1)$$$
Therefore, even in highly asymmetric probability spaces, the greedy choice is exactly the same: evaluate $$$E_0$$$ and $$$E_1$$$, and pick the branch that yields the larger expectation.
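The entire framework, symmetric or not, collapses into one generic greedy loop. Here's a Python sketch; `cond_prob_sum` is a problem-specific oracle, and the toy oracle below is purely illustrative:

```python
def derandomize(l, cond_prob_sum, maximize=True):
    """Method of conditional expectations over the hypercube {0,1}^l.

    cond_prob_sum(prefix) must return sum_i P(A_i | the first len(prefix)
    bits are fixed to prefix, the rest are random). The greedy choice
    below is unchanged for asymmetric (weighted) spaces.
    """
    prefix = []
    for _ in range(l):
        e0 = cond_prob_sum(prefix + [0])
        e1 = cond_prob_sum(prefix + [1])
        pick_one = (e1 > e0) if maximize else (e1 < e0)
        prefix.append(1 if pick_one else 0)
    return prefix

# Toy oracle for the events A_i = "x_i equals 1": a fixed bit contributes
# its value, each still-random bit contributes 1/2 (so T = l/2 initially).
def toy_oracle(prefix, l=4):
    return sum(prefix) + 0.5 * (l - len(prefix))
```

Here `derandomize(4, toy_oracle)` constructs a vector satisfying at least $$$T = 2$$$ of the four toy events; plugging in the Max Cut or MAX 3-SAT oracles from the earlier sections recovers those algorithms.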
Another Example: Monochromatic $$$K_4$$$ Subgraphs
Let $$$K_n$$$ denote the complete graph on $$$n$$$ vertices; for example: $$$K_4$$$, the complete graph on $$$4$$$ vertices, has $$$6$$$ edges. We are looking at 2-colorings of the edges of $$$K_n$$$, where every edge is colored either Red or Blue. We say that a $$$K_4$$$ subgraph is monochromatic if all 6 of its edges share the exact same color.
Theorem: For every integer $$$n$$$, there exists a 2-coloring of the edges of $$$K_n$$$ such that the total number of monochromatic copies of $$$K_4$$$ is at most $$$\binom{n}{4} \cdot 2^{-5}$$$.
Proof: Once again, let's consider a completely random 2-coloring of $$$K_n$$$. Each edge independently has a $$$1/2$$$ probability of being Red, and a $$$1/2$$$ probability of being Blue.
Let $$$S$$$ be a specific subset of 4 vertices. There are exactly $$$\binom{n}{4}$$$ such subsets in $$$K_n$$$. What is the probability that the $$$K_4$$$ formed by the vertices in $$$S$$$ is monochromatic? The $$$K_4$$$ has exactly $$$\binom{4}{2} = 6$$$ edges. It is monochromatic if all 6 edges are Red, OR if all 6 edges are Blue, meaning that it is monochromatic with probability $$$\frac{1}{2^6} + \frac{1}{2^6} = \frac{1}{2^5} = 2^{-5}$$$. By linearity of expectation, the expected total number of monochromatic copies of $$$K_4$$$ is the sum of the expected values of all these individual indicator variables:

$$$\mathbb{E}[\#\text{monochromatic } K_4] = \binom{n}{4} \cdot 2^{-5}$$$
By the Averaging Principle, since the average number of monochromatic copies of $$$K_4$$$ over all possible colorings is exactly $$$\binom{n}{4} 2^{-5}$$$, there must exist at least one specific deterministic 2-coloring where the number of monochromatic copies of $$$K_4$$$ is at most $$$\binom{n}{4} 2^{-5}$$$.
Derandomization: Since we have already established the general framework of Conditional Expectations, all we have to do is provide the implementation details. We will color the edges one by one. Because our goal is to bound the number of cliques from above, our greedy choice will simply pick the color that minimizes (rather than maximizes) the conditional expectation at each step.
To do this efficiently, we need to analyze how a single copy of $$$K_4$$$ contributes to the conditional average under a partial coloring.
For any specific $$$K_4$$$, its conditional probability of eventually becoming monochromatic falls into one of three cases:
Contribution = $$$0$$$: If the $$$K_4$$$ currently contains at least one Red edge AND at least one Blue edge, it is impossible for it to ever become monochromatic. Its probability is $$$0$$$.
Contribution = $$$2^{-5}$$$: If all 6 edges of the $$$K_4$$$ are currently uncolored, its probability of eventually becoming monochromatic is exactly $$$2^{-5}$$$, as we proved above.
The Partially Colored Case: Suppose the $$$K_4$$$ has exactly $$$t \ge 1$$$ colored edges, all of the same color, and the remaining $$$6-t$$$ edges are uncolored. Since it already has $$$t$$$ edges of a specific color, it can only become monochromatic in that exact color, which means all $$$6-t$$$ remaining edges must receive that color. This happens with probability $$$\left(\frac{1}{2}\right)^{6-t} = 2^{t-6}$$$.
The total conditional expectation is simply the sum of these contributions over all $$$\binom{n}{4}$$$ copies of $$$K_4$$$!
When it is time to assign a color to the next edge $$$e_j$$$:
Pretend we color $$$e_j = \text{Red}$$$. We only need to look at the copies of $$$K_4$$$ that actually contain $$$e_j$$$. We calculate their new expected contributions using the three rules above, and sum them up to get $$$E_{\text{Red}}$$$.
Pretend we color $$$e_j = \text{Blue}$$$. We do the exact same calculation for the affected $$$K_4$$$s to get $$$E_{\text{Blue}}$$$.
Greedily assign $$$e_j$$$ the color that yields the smaller expected value: $$$\min(E_{\text{Red}}, E_{\text{Blue}})$$$.
Because the expectation mathematically can never increase when we take the minimum of the two branches, we are strictly guaranteed to finish with a deterministic coloring that contains $$$\le \binom{n}{4} 2^{-5}$$$ monochromatic copies of $$$K_4$$$.
We can implement this algorithm quite simply in $$$\mathcal{O}(n^4)$$$ time. We simply iterate through each of the $$$\binom{n}{2} = \mathcal{O}(n^2)$$$ edges, and for each one, we evaluate only the $$$K_4$$$ cliques that actually contain it. Since the current edge fixes two vertices, we just have to try all pairs of the remaining $$$n-2$$$ vertices to form a $$$K_4$$$. There are exactly $$$\binom{n-2}{2} = \mathcal{O}(n^2)$$$ such cliques per edge, and updating the expected value for each takes $$$\mathcal{O}(1)$$$ time. This gives us $$$\mathcal{O}(n^2)$$$ work per edge, resulting in a deterministic time complexity of $$$\mathcal{O}(n^4)$$$.
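A direct $$$\mathcal{O}(n^4)$$$ implementation might look like this (the names are my own; for clarity it recomputes each affected $$$K_4$$$'s conditional probability from scratch rather than maintaining incremental counts):

```python
from itertools import combinations

def min_mono_k4_coloring(n):
    """Deterministic edge 2-coloring of K_n with at most C(n,4) / 32
    monochromatic K_4's, via the method of conditional expectations."""
    color = {}  # frozenset({u, v}) -> 0 (Red) or 1 (Blue)

    def k4_prob(quad):
        # Conditional probability that this K_4 becomes monochromatic.
        counts = [0, 0]
        uncolored = 0
        for e in combinations(quad, 2):
            c = color.get(frozenset(e))
            if c is None:
                uncolored += 1
            else:
                counts[c] += 1
        if counts[0] and counts[1]:
            return 0.0                 # both colors present: impossible
        if uncolored == 6:
            return 2 * 0.5 ** 6        # fully uncolored: 2^{-5}
        return 0.5 ** uncolored        # must finish in the forced color

    for u, v in combinations(range(n), 2):
        e = frozenset((u, v))
        others = [w for w in range(n) if w != u and w != v]
        quads = [(u, v) + pair for pair in combinations(others, 2)]
        scores = []
        for c in (0, 1):
            color[e] = c               # tentatively color the edge
            scores.append(sum(k4_prob(q) for q in quads))
        color[e] = 0 if scores[0] <= scores[1] else 1
    return color

def count_mono_k4(n, color):
    total = 0
    for quad in combinations(range(n), 4):
        cols = {color[frozenset(e)] for e in combinations(quad, 2)}
        total += (len(cols) == 1)
    return total
```

As a nice consequence of the guarantee: for $$$n = 6$$$ the bound $$$\binom{6}{4} \cdot 2^{-5} = 15/32 \lt 1$$$, so the returned coloring contains no monochromatic $$$K_4$$$ at all.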
Conclusion
In this post, we introduced the probabilistic method, used it to design randomized approximation algorithms, and then made them completely deterministic by greedily navigating the expectation tree.
There are some other methods for derandomization, which might be covered in a future blogpost.
If you know of any Codeforces problems where this technique (or something similar) can be applied, I'd love to hear about them in the comments!
This is all for now. I'd appreciate any kind of comments or feedback.
Practice problems
Here are some problems to practice the ideas discussed in this blogpost. Feel free to discuss them in the comments.
1. Max Directed Cut
Given a directed graph $$$G=(V,E)$$$, partition the vertices into two sets $$$A$$$ and $$$B$$$ to maximize the number of edges going from $$$A$$$ to $$$B$$$.
This one is very similar to the undirected version. Again, we look for an approximation.
2. Balancing Vectors
Given $$$n$$$ vectors $$$v_1, v_2, \dots, v_n \in \mathbb{R}^d$$$ each of unit length $$$(|v_i| = 1)$$$, show that there exist signs $$$\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n \in \lbrace -1, +1 \rbrace$$$ such that $$$\left|\sum \varepsilon_i v_i\right|^2 \ge n$$$, and find them efficiently.
3. Independent Set (Caro-Wei Bound)
Given a graph $$$G = (V, E)$$$, an independent set is a subset of $$$V$$$ such that no two vertices are adjacent. Find an independent set of size $$$\ge \sum_{v \in V} \frac{1}{d(v) + 1}$$$ where $$$d(v)$$$ is the degree of vertex $$$v$$$.
4. Unbalancing Lights
Given an $$$n \times n$$$ matrix $$$A$$$ with entries in $$$\lbrace -1, +1 \rbrace$$$, find a vector $$$x \in \lbrace -1, +1 \rbrace^n$$$ such that $$$\max_{i} |(Ax)_i| \ge \sqrt{n}$$$.
Try using the sum $$$\sum_i (Ax)_i^2 = |Ax|^2$$$ as a proxy objective instead, and show that this is enough.
References
[1] Noga Alon and Joel H. Spencer. The Probabilistic Method (3rd edition). Wiley-Interscience Series in Discrete Mathematics and Optimization, 2008. Chapter 16: Derandomization.