Hello everyone!
In this blogpost, I want to talk about the probabilistic method in mathematics, how it can be used to obtain good randomized approximation algorithms for hard problems, and how we can use some derandomization techniques to turn these into deterministic algorithms that maintain the exact same guarantees.
Thanks to peltorator for proofreading this article and giving very useful feedback.
The Probabilistic Method
In short, this method (pioneered by Paul Erdős) states: in order to prove that a structure with certain desired properties exists, one can define a probability space of all such structures and mathematically show that the desired properties hold in this space with a strictly positive probability. If the probability is $$$\gt 0$$$, the structure must exist.
Motivating Problem: Max Cut
The Problem: Given an undirected graph $$$G = (V, E)$$$, a 2-coloring of its vertices is an assignment of a color (Red or Blue) to each vertex. Given any 2-coloring, we define a cut as the set of edges whose endpoints have different colors. We are interested in finding a 2-coloring that maximizes the cut size. Let's denote that maximum possible cut size as $$$M$$$.
This problem is actually NP-hard for general graphs, so instead of trying to design an algorithm that computes the maximum cut, let's design one that computes an approximation (in particular, the algorithm will produce a cut that is at least half of the max cut). Since we just talked about the probabilistic method, let's try applying that.
The Randomized Baseline: Assume a completely random 2-coloring: independently, each vertex is colored Red with probability $$$1/2$$$ and Blue with probability $$$1/2$$$. What is the expected size of our cut?
Let $$$c_v \in \lbrace 0, 1\rbrace$$$ be the color of vertex $$$v$$$. For any edge $$$e = (u, v) \in E$$$, since both endpoints are colored independently at random, the edge belongs to the cut ($$$c_u \neq c_v$$$) in exactly 2 of the 4 equally likely color pairs, i.e., with probability $$$1/2$$$. Then, by linearity of expectation, the expected size of the cut is $$$\frac{|E|}{2}$$$.
So, the expected cut size is exactly half the total number of edges. Furthermore, because the absolute maximum cut $$$M$$$ obviously cannot exceed the total number of edges ($$$M \le |E|$$$), we know that $$$|E|/2 \ge M/2$$$.
This gives us our first algorithm: simply color everything randomly, and on average, we obtain a randomized 0.5-approximation.
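This expectation is easy to verify empirically. Below is a minimal simulation sketch (the function and variable names are my own, purely illustrative); averaging the cut size of many random colorings of a triangle should land near $$$|E|/2 = 1.5$$$:

```python
import random

def random_cut_size(n, edges, rng):
    # Color each vertex 0 or 1 uniformly at random, then count cut edges.
    color = [rng.randrange(2) for _ in range(n)]
    return sum(1 for u, v in edges if color[u] != color[v])

rng = random.Random(0)                      # fixed seed for reproducibility
edges = [(0, 1), (1, 2), (0, 2)]            # a triangle: |E| = 3
trials = 20000
avg = sum(random_cut_size(3, edges, rng) for _ in range(trials)) / trials
# avg should be close to |E|/2 = 1.5
```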
But an "average" is not a strict guarantee, and we want algorithms that never fail. This is where we apply the Averaging Principle: since the expected score across all possible random colorings is $$$|E|/2$$$, it mathematically follows that there must exist at least one specific, deterministic coloring that achieves a cut of $$$\ge |E|/2$$$.
The question is: how do we find it efficiently? Let's derandomize the above algorithm.
Derandomization via Conditional Expectations
To remove the randomness, we will label our vertices $$$1, 2, \dots, n$$$ and assign them colors iteratively in that exact order.
Suppose we are currently at step $$$k$$$, meaning we have already permanently assigned colors to the first $$$k-1$$$ vertices. Let's call this fixed history of choices $$$H_{k-1} = (c_1, c_2, \dots, c_{k-1})$$$. We now need to decide the color $$$c_k \in \lbrace 0, 1 \rbrace$$$ for vertex $$$k$$$.
How do we make this choice? We want to ensure that our expected cut size never drops below our initial guarantee of $$$|E|/2$$$.
By applying Marginalization (The Law of Total Expectation) over the two possible colors we could assign to $$$k$$$, we can express our current expected cut size as the exact average of the two future branches:

$$$\mathbb{E}[\text{Cut Size} \mid H_{k-1}] = \frac{1}{2}\,\mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k = 0] + \frac{1}{2}\,\mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k = 1]$$$
By the Averaging Principle, it is impossible for both future branches to have an expected value strictly less than the current average. At least one of the choices ($$$c_k = 0$$$ or $$$c_k = 1$$$) must yield an expected cut size $$$\ge \mathbb{E}[\text{Cut Size} \mid H_{k-1}]$$$.
This gives us a deterministic strategy: At each step $$$k$$$, compute the expected cut size for both possible colors of vertex $$$k$$$, and greedily pick the color that gives the larger expectation. Inductively, if the expectation never decreases, our final deterministic coloring will have a cut size $$$\ge |E|/2$$$.
Evaluating the Conditional Probabilities
Computing the expected size of a cut over all possible future random combinations sounds like it would take exponential time. Let's expand the expectation mathematically to see how it simplifies.
By linearity of expectation, the expected cut size is the sum of the probabilities of each individual edge being cut:

$$$\mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k] = \sum_{(u,v) \in E} \mathbb{P}(c_u \neq c_v \mid H_{k-1}, c_k)$$$

To understand how choosing $$$c_k$$$ affects this sum, let's explicitly split the sum into two parts: edges that do not include vertex $$$k$$$, and edges that do include vertex $$$k$$$:

$$$\sum_{\substack{(u,v) \in E \\ u \neq k,\, v \neq k}} \mathbb{P}(c_u \neq c_v \mid H_{k-1}, c_k) \;+\; \sum_{(u,k) \in E} \mathbb{P}(c_u \neq c_k \mid H_{k-1}, c_k)$$$
1. Edges not incident to $$$k$$$: Look at the first sum. The endpoints of these edges are either in our history (their colors are already fixed) or in the future (they will be assigned randomly later). Crucially, the probability of these edges being cut does not depend on $$$c_k$$$. Therefore, this entire first sum evaluates to the exact same constant $$$W$$$ regardless of whether we choose $$$c_k = 0$$$ or $$$c_k = 1$$$.
2. Edges incident to $$$k$$$: Now look at the second sum. These are the direct neighbors of vertex $$$k$$$. We can split these neighbors into two groups:
Backward Edges ($$$u \lt k$$$): Vertex $$$u$$$ has already been assigned a deterministic color $$$c_u$$$ in our history.
- If we set $$$c_k \neq c_u$$$, the edge is deterministically cut (probability $$$1$$$).
- If we set $$$c_k = c_u$$$, the edge is deterministically not cut (probability $$$0$$$).
Forward Edges ($$$u \gt k$$$): Vertex $$$u$$$ has not been colored yet. It will be assigned a random color later ($$$1/2$$$ probability for $$$0$$$, $$$1/2$$$ probability for $$$1$$$).
- If we set $$$c_k = 0$$$, the edge is cut if the future coin flip for $$$u$$$ lands on $$$1$$$. So, probability is $$$1/2$$$.
- If we set $$$c_k = 1$$$, the edge is cut if the future coin flip for $$$u$$$ lands on $$$0$$$. So, probability is $$$1/2$$$.
- Notice the beautiful collapse here: the probability a forward edge is cut is exactly $$$1/2$$$, completely regardless of which color we choose for vertex $$$k$$$.
The Algorithm
Let's put it all together. When we compare the expected score of choosing $$$c_k = 0$$$ versus $$$c_k = 1$$$:
The non-incident edges contribute the exact same constant $$$W$$$ to both sides.
The forward edges contribute exactly $$$1/2$$$ to both sides.
They perfectly cancel each other out in the inequality! The future literally marginalizes itself away. The condition:

$$$\mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k = 0] \ge \mathbb{E}[\text{Cut Size} \mid H_{k-1}, c_k = 1]$$$

completely reduces to: "Does setting $$$c_k = 0$$$ cut more backward edges than setting $$$c_k = 1$$$?"
Now we have a simple $$$\mathcal{O}(|V| + |E|)$$$ deterministic algorithm:
Iterate through the vertices $$$1, 2, \dots, n$$$.
For the current vertex $$$k$$$, count how many of its already-colored neighbors have color 0, and how many have color 1.
Assign $$$k$$$ the color opposite to the majority of its already-colored neighbors.
That's it. A purely greedy choice, guaranteeing a 0.5-approximation.
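The whole algorithm fits in a few lines. Here is a minimal Python sketch (the names `greedy_max_cut` and `cut_size` are my own; vertices are $$$0, \dots, n-1$$$ and the graph is given as an edge list):

```python
from collections import defaultdict

def greedy_max_cut(n, edges):
    """Derandomized 0.5-approximation for Max Cut.

    Vertices are 0..n-1; edges is a list of pairs. Returns a list of
    colors (0 or 1) whose cut contains at least |E|/2 edges.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    color = [None] * n
    for k in range(n):
        # Count already-colored neighbors of each color (backward edges).
        cnt = [0, 0]
        for u in adj[k]:
            if color[u] is not None:
                cnt[color[u]] += 1
        # Choosing color c cuts cnt[1 - c] backward edges, so pick the
        # color opposite to the majority of colored neighbors.
        color[k] = 0 if cnt[1] >= cnt[0] else 1
    return color

def cut_size(edges, color):
    return sum(1 for u, v in edges if color[u] != color[v])
```

By the argument above, the returned coloring always cuts at least $$$|E|/2$$$ edges, with no randomness involved.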
A Second Example: MAX Exact 3-SAT
Before we generalize this method, let's look at a different problem to see how versatile this technique really is.
The Problem: You are given a boolean formula consisting of $$$M$$$ clauses over $$$N$$$ variables ($$$x_1, x_2, \dots, x_N$$$). Each clause is an OR of exactly 3 distinct literals (a literal is a variable $$$x_i$$$ or its negation $$$\neg x_i$$$). For example, a clause might look like $$$(x_1 \lor \neg x_4 \lor x_7)$$$. Find an assignment of True/False to the $$$N$$$ variables that maximizes the number of satisfied clauses. Let's denote the maximum possible number of satisfied clauses as $$$\text{OPT}$$$.
This optimization problem is also NP-hard. This is because ordinary 3-SAT (deciding if all clauses can be simultaneously satisfied) is a special case: it's just asking whether $$$\text{OPT}$$$ equals $$$M$$$. Therefore, we will look for an approximation algorithm. Specifically, we want an algorithm that guarantees we satisfy at least $$$\frac{7}{8}\text{OPT}$$$ clauses.
The Randomized Baseline: Let's assign each variable to True or False independently, with a $$$1/2$$$ probability for each.
When does a clause fail? A clause is an OR statement, so it only evaluates to False if all three of its literals evaluate to False. Since the variables are chosen independently and the 3 literals are distinct, the probability of a clause failing is exactly $$$\left(\frac{1}{2}\right)^3 = \frac{1}{8}$$$.
Therefore, the probability that any given clause is satisfied is $$$1 - \frac{1}{8} = \frac{7}{8}$$$. Once again, by linearity of expectation, the expected number of satisfied clauses is: $$$\mathbb{E}[\text{Satisfied}] = \frac{7}{8}M$$$.
Because the absolute optimal answer obviously cannot exceed the total number of clauses ($$$\text{OPT} \le M$$$), our expected score gives us a strict lower bound on the approximation ratio:

$$$\mathbb{E}[\text{Satisfied}] = \frac{7}{8}M \ge \frac{7}{8}\text{OPT}$$$
By the Averaging Principle, since the expected number of satisfied clauses is $$$\frac{7}{8}M$$$, there must exist at least one deterministic assignment that satisfies at least $$$\frac{7}{8}M$$$ clauses. Let's find it.
Derandomization: We will assign truth values to the variables sequentially: $$$x_1, x_2, \dots, x_N$$$. At step $$$k$$$, we have a history $$$H_{k-1}$$$ of fixed variables. We need to decide whether to set $$$x_k = \text{True}$$$ or $$$x_k = \text{False}$$$.
Just like in Max Cut, we want to maximize our conditional expectation. We need to compare the expected number of satisfied clauses under our two possible choices:

$$$\mathbb{E}[\text{Satisfied} \mid H_{k-1}, x_k = \text{True}] \quad \text{vs.} \quad \mathbb{E}[\text{Satisfied} \mid H_{k-1}, x_k = \text{False}]$$$
Let's look at the state of any specific clause $$$C_j$$$ under our current partial assignment:
Already Satisfied: If at least one literal in $$$C_j$$$ evaluates to True under our already-fixed variables, the clause is permanently satisfied. Its conditional probability of being satisfied is exactly $$$1$$$.
Still Alive (but every fixed literal evaluated to False): Suppose $$$C_j$$$ has $$$u$$$ unassigned literals remaining (where $$$u \in \lbrace 1, 2, 3\rbrace$$$). The only way $$$C_j$$$ will eventually fail is if all $$$u$$$ of those future random coin flips land on the exact wrong boolean value, which happens with probability $$$2^{-u}$$$. The conditional probability that $$$C_j$$$ is eventually satisfied is therefore $$$1 - 2^{-u}$$$. (Conveniently, the same formula with $$$u = 0$$$ gives probability $$$0$$$ for a clause that has already failed.)
The Algorithm: Notice that, unlike Max Cut, the unassigned variables don't just "cancel out" completely, because their probabilities change depending on how many literals are left! But the greedy algorithm remains incredibly simple and totally deterministic.
At step $$$k$$$:
Pretend we set $$$x_k = \text{True}$$$. Iterate through all $$$M$$$ clauses. If a clause is already satisfied, add $$$1$$$ to our score. If it is not satisfied and has $$$u$$$ unassigned literals left, add $$$1 - 2^{-u}$$$ to our score. Let this total sum be $$$E_{\text{True}}$$$.
Pretend we set $$$x_k = \text{False}$$$. Do the exact same summation over all clauses to get $$$E_{\text{False}}$$$.
Greedily choose the boolean value that gave the larger score. Update the state of the clauses and move to $$$x_{k+1}$$$.
This runs in $$$\Theta(N \cdot M)$$$ time, but it can be optimized into $$$\mathcal{O}(N + M)$$$ time by only considering the clauses involving the variable $$$x_k$$$ at each step (each clause contains exactly 3 literals, so it is examined at most 3 times in total). As a result, this algorithm deterministically evaluates the exact conditional expectations at every step, guaranteeing we satisfy at least $$$\frac{7}{8}M$$$ clauses (which is a $$$\frac{7}{8}$$$-approximation of the optimal solution).
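For concreteness, here is a sketch of the simple $$$\Theta(N \cdot M)$$$ version in Python (the function names and the DIMACS-style literal encoding, where $$$+i$$$ means $$$x_i$$$ and $$$-i$$$ means $$$\neg x_i$$$, are my own choices):

```python
def derandomized_max_e3sat(n_vars, clauses):
    """Derandomized 7/8-approximation for MAX Exact 3-SAT.

    Literals are DIMACS-style: +i means x_i, -i means NOT x_i, with
    variables numbered 1..n_vars. Returns a dict {var: bool} whose
    assignment satisfies at least 7/8 of the clauses.
    """
    assign = {}

    def clause_weight(clause):
        # Conditional probability that this clause ends up satisfied,
        # given the variables fixed so far.
        unassigned = 0
        for lit in clause:
            var = abs(lit)
            if var in assign:
                if assign[var] == (lit > 0):
                    return 1.0                 # already satisfied
            else:
                unassigned += 1
        return 1.0 - 0.5 ** unassigned         # fails iff all u coins go wrong

    for k in range(1, n_vars + 1):
        scores = {}
        for value in (True, False):
            assign[k] = value                  # tentatively fix x_k
            scores[value] = sum(clause_weight(c) for c in clauses)
        assign[k] = scores[True] >= scores[False]
    return assign

def count_satisfied(clauses, assign):
    return sum(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses)
```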
Note: As a nice connection, Max Cut is essentially equivalent to Max 2-SAT, so MAX 3-SAT can be seen as a natural generalization of the previous problem.
The Generalization: The Formal Framework
Now that we have seen two concrete examples, let's generalize the idea. The following is almost a copy-paste from a section of the 16th chapter of Alon and Spencer's textbook The Probabilistic Method (3rd edition).
Suppose we have a probability space $$$\Omega$$$ consisting of $$$2^l$$$ points. We can represent every point in this space as a binary vector of length $$$l$$$:

$$$\Omega = \lbrace (x_1, x_2, \dots, x_l) : x_j \in \lbrace 0, 1 \rbrace \rbrace$$$
For simplicity, we assume this space is symmetric, meaning we generate a random point by choosing each bit $$$x_j$$$ independently with $$$\mathbb{P}(x_j = 0) = \mathbb{P}(x_j = 1) = \frac{1}{2}$$$.
Let there be a collection of $$$s$$$ events: $$$A_1, A_2, \dots, A_s$$$. By linearity of expectation, the expected number of events that hold true for a random vector is exactly the sum of their individual probabilities. Let's call this expected value $$$T$$$:

$$$T = \sum_{i=1}^{s} \mathbb{P}(A_i)$$$
Because the expected number of valid events is exactly $$$T$$$, the Averaging Principle guarantees that there must exist:
At least one specific binary vector where $$$\ge T$$$ events hold.
At least one specific binary vector where $$$\le T$$$ events hold.
We want to find these vectors deterministically. Let's focus on constructing the first one (the $$$\ge T$$$ case), as the logic for the second one is completely symmetric.
The Sequential Construction: We will fix the bits of our vector sequentially: $$$x_1, x_2, \dots, x_l$$$.
Suppose we are currently at step $$$j$$$, and we have already fixed the first $$$j-1$$$ bits. Let's denote this history as $$$H_{j-1} = (x_1, x_2, \dots, x_{j-1})$$$. We now need to decide whether to set $$$x_j = 0$$$ or $$$x_j = 1$$$.
First, let's look at a single event $$$A_i$$$. By marginalizing over our two uniformly random choices for $$$x_j$$$, the current conditional probability of $$$A_i$$$ holding is the exact average of its conditional probabilities in the two future branches:

$$$\mathbb{P}(A_i \mid H_{j-1}) = \frac{1}{2}\,\mathbb{P}(A_i \mid H_{j-1}, x_j = 0) + \frac{1}{2}\,\mathbb{P}(A_i \mid H_{j-1}, x_j = 1)$$$
Summing this across all $$$s$$$ events gives us our global expected value. Because the $$$1/2$$$ is a constant, we can factor it completely out of the sum, decomposing our expected value into the average of two distinct sums:

$$$\sum_{i=1}^{s} \mathbb{P}(A_i \mid H_{j-1}) = \frac{E_0 + E_1}{2}, \quad \text{where } E_b = \sum_{i=1}^{s} \mathbb{P}(A_i \mid H_{j-1}, x_j = b)$$$
Mathematically, an average can never be strictly greater than both of its components. It must always be less than or equal to the maximum of the two. Using the $$$E_0$$$ and $$$E_1$$$ defined above:

$$$\frac{E_0 + E_1}{2} \le \max(E_0, E_1)$$$
Therefore, if we greedily choose the bit that maximizes the sum (picking $$$x_j = 0$$$ if $$$E_0 \ge E_1$$$, and $$$x_j = 1$$$ otherwise), our expected number of satisfied events will never decrease:

$$$\sum_{i=1}^{s} \mathbb{P}(A_i \mid H_j) = \max(E_0, E_1) \ge \sum_{i=1}^{s} \mathbb{P}(A_i \mid H_{j-1})$$$
At step 0, before making any choices, our expected score was exactly $$$T$$$. Because our greedy choice guarantees the expectation never decreases, at the final step $$$l$$$, our expected score must still be $$$\ge T$$$.
At step $$$l$$$, however, every single bit $$$x_1, \dots, x_l$$$ is completely fixed. There is no randomness left. The conditional expectation is simply the actual, deterministic number of events that hold for our constructed vector.
Thus, we have deterministically constructed a vector $$$(x_1, \dots, x_l)$$$ that satisfies at least $$$T$$$ events, doing so efficiently by simply evaluating and maximizing $$$\sum_{i=1}^s \mathbb{P}(A_i \mid H_{j-1}, x_j)$$$ at each step.
A Note on Asymmetric Spaces: What if our probability space is not symmetric? In some problems, we might want to set $$$x_j = 1$$$ with some arbitrary probability $$$p$$$, and $$$x_j = 0$$$ with probability $$$1-p$$$.
But the algorithm doesn't change at all. The marginalization step simply becomes a weighted average:

$$$\sum_{i=1}^{s} \mathbb{P}(A_i \mid H_{j-1}) = (1-p)\,E_0 + p\,E_1$$$
This weighted average still satisfies the same key inequality:

$$$(1-p)\,E_0 + p\,E_1 \le \max(E_0, E_1)$$$
Therefore, even in highly asymmetric probability spaces, the greedy choice is exactly the same: evaluate $$$E_0$$$ and $$$E_1$$$, and pick the branch that yields the larger expectation.
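The entire framework, symmetric or not, collapses into one generic greedy loop. Here's a Python sketch; `cond_prob_sum` is a problem-specific oracle, and the toy oracle below is purely illustrative:

```python
def derandomize(l, cond_prob_sum, maximize=True):
    """Method of conditional expectations over the hypercube {0,1}^l.

    cond_prob_sum(prefix) must return sum_i P(A_i | the first len(prefix)
    bits are fixed to prefix, the rest are random). The greedy choice
    below is unchanged for asymmetric (weighted) spaces.
    """
    prefix = []
    for _ in range(l):
        e0 = cond_prob_sum(prefix + [0])
        e1 = cond_prob_sum(prefix + [1])
        pick_one = (e1 > e0) if maximize else (e1 < e0)
        prefix.append(1 if pick_one else 0)
    return prefix

# Toy oracle for the events A_i = "x_i equals 1": a fixed bit contributes
# its value, each still-random bit contributes 1/2 (so T = l/2 initially).
def toy_oracle(prefix, l=4):
    return sum(prefix) + 0.5 * (l - len(prefix))
```

Here `derandomize(4, toy_oracle)` constructs a vector satisfying at least $$$T = 2$$$ of the four toy events; plugging in the Max Cut or MAX 3-SAT oracles from the earlier sections recovers those algorithms.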
Another Example: Monochromatic $$$K_4$$$ Subgraphs
Let $$$K_n$$$ denote the complete graph on $$$n$$$ vertices; for example: $$$K_4$$$, the complete graph on $$$4$$$ vertices, has $$$6$$$ edges. We are looking at 2-colorings of the edges of $$$K_n$$$, where every edge is colored either Red or Blue. We say that a $$$K_4$$$ subgraph is monochromatic if all 6 of its edges share the exact same color.
Theorem: For every integer $$$n$$$, there exists a 2-coloring of the edges of $$$K_n$$$ such that the total number of monochromatic copies of $$$K_4$$$ is at most $$$\binom{n}{4} \cdot 2^{-5}$$$.
Proof: Once again, let's consider a completely random 2-coloring of $$$K_n$$$. Each edge independently has a $$$1/2$$$ probability of being Red, and a $$$1/2$$$ probability of being Blue.
Let $$$S$$$ be a specific subset of 4 vertices. There are exactly $$$\binom{n}{4}$$$ such subsets in $$$K_n$$$. What is the probability that the $$$K_4$$$ formed by the vertices in $$$S$$$ is monochromatic? The $$$K_4$$$ has exactly $$$\binom{4}{2} = 6$$$ edges. It is monochromatic if all 6 edges are Red, OR if all 6 edges are Blue, meaning that it is monochromatic with probability $$$\frac{1}{2^6} + \frac{1}{2^6} = \frac{1}{2^5} = 2^{-5}$$$. By linearity of expectation, the expected total number of monochromatic copies of $$$K_4$$$ is the sum of the expected values of all these individual indicator variables:

$$$\mathbb{E}[\#\text{monochromatic } K_4] = \binom{n}{4} \cdot 2^{-5}$$$
By the Averaging Principle, since the average number of monochromatic copies of $$$K_4$$$ over all possible colorings is exactly $$$\binom{n}{4} 2^{-5}$$$, there must exist at least one specific deterministic 2-coloring where the number of monochromatic copies of $$$K_4$$$ is at most $$$\binom{n}{4} 2^{-5}$$$.
Derandomization: Since we have already established the general framework of Conditional Expectations, all we have to do is provide the implementation details. We will color the edges one by one. Because our goal is to bound the number of cliques from above, our greedy choice will simply pick the color that minimizes (rather than maximizes) the conditional expectation at each step.
To do this efficiently, we need to analyze how a single copy of $$$K_4$$$ contributes to the conditional average under a partial coloring.
For any specific $$$K_4$$$, its conditional probability of eventually becoming monochromatic falls into one of three cases:
Contribution = $$$0$$$: If the $$$K_4$$$ currently contains at least one Red edge AND at least one Blue edge, it is impossible for it to ever become monochromatic. Its probability is $$$0$$$.
Contribution = $$$2^{-5}$$$: If all 6 edges of the $$$K_4$$$ are currently uncolored, its probability of eventually becoming monochromatic is exactly $$$2^{-5}$$$, as we proved above.
The Partially Colored Case: Suppose the $$$K_4$$$ has exactly $$$t \ge 1$$$ colored edges, all of the same color, and the remaining $$$6-t$$$ edges are uncolored. Since it already has $$$t$$$ edges of a specific color, it can only become monochromatic in that exact color, which means all $$$6-t$$$ remaining edges must receive that color. This happens with probability $$$\left(\frac{1}{2}\right)^{6-t} = 2^{t-6}$$$.
The total conditional expectation is simply the sum of these contributions over all $$$\binom{n}{4}$$$ copies of $$$K_4$$$!
When it is time to assign a color to the next edge $$$e_j$$$:
Pretend we color $$$e_j = \text{Red}$$$. We only need to look at the copies of $$$K_4$$$ that actually contain $$$e_j$$$. We calculate their new expected contributions using the three rules above, and sum them up to get $$$E_{\text{Red}}$$$.
Pretend we color $$$e_j = \text{Blue}$$$. We do the exact same calculation for the affected $$$K_4$$$s to get $$$E_{\text{Blue}}$$$.
Greedily assign $$$e_j$$$ the color that yields the smaller expected value: $$$\min(E_{\text{Red}}, E_{\text{Blue}})$$$.
Because the expectation mathematically can never increase when we take the minimum of the two branches, we are strictly guaranteed to finish with a deterministic coloring that contains $$$\le \binom{n}{4} 2^{-5}$$$ monochromatic copies of $$$K_4$$$.
We can implement this algorithm quite simply in $$$\mathcal{O}(n^4)$$$ time. We simply iterate through each of the $$$\binom{n}{2} = \mathcal{O}(n^2)$$$ edges, and for each one, we evaluate only the $$$K_4$$$ cliques that actually contain it. Since the current edge fixes two vertices, we just have to try all pairs of the remaining $$$n-2$$$ vertices to form a $$$K_4$$$. There are exactly $$$\binom{n-2}{2} = \mathcal{O}(n^2)$$$ such cliques per edge, and updating the expected value for each takes $$$\mathcal{O}(1)$$$ time. This gives us $$$\mathcal{O}(n^2)$$$ work per edge, resulting in a deterministic time complexity of $$$\mathcal{O}(n^4)$$$.
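A direct $$$\mathcal{O}(n^4)$$$ implementation might look like this (the names are my own; for clarity it recomputes each affected $$$K_4$$$'s conditional probability from scratch rather than maintaining incremental counts):

```python
from itertools import combinations

def min_mono_k4_coloring(n):
    """Deterministic edge 2-coloring of K_n with at most C(n,4) / 32
    monochromatic K_4's, via the method of conditional expectations."""
    color = {}  # frozenset({u, v}) -> 0 (Red) or 1 (Blue)

    def k4_prob(quad):
        # Conditional probability that this K_4 becomes monochromatic.
        counts = [0, 0]
        uncolored = 0
        for e in combinations(quad, 2):
            c = color.get(frozenset(e))
            if c is None:
                uncolored += 1
            else:
                counts[c] += 1
        if counts[0] and counts[1]:
            return 0.0                 # both colors present: impossible
        if uncolored == 6:
            return 2 * 0.5 ** 6        # fully uncolored: 2^{-5}
        return 0.5 ** uncolored        # must finish in the forced color

    for u, v in combinations(range(n), 2):
        e = frozenset((u, v))
        others = [w for w in range(n) if w != u and w != v]
        quads = [(u, v) + pair for pair in combinations(others, 2)]
        scores = []
        for c in (0, 1):
            color[e] = c               # tentatively color the edge
            scores.append(sum(k4_prob(q) for q in quads))
        color[e] = 0 if scores[0] <= scores[1] else 1
    return color

def count_mono_k4(n, color):
    total = 0
    for quad in combinations(range(n), 4):
        cols = {color[frozenset(e)] for e in combinations(quad, 2)}
        total += (len(cols) == 1)
    return total
```

As a nice consequence of the guarantee: for $$$n = 6$$$ the bound $$$\binom{6}{4} \cdot 2^{-5} = 15/32 \lt 1$$$, so the returned coloring contains no monochromatic $$$K_4$$$ at all.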
Conclusion
In this post, we introduced the probabilistic method, used it to design randomized approximation algorithms, and then made them completely deterministic by greedily navigating the expectation tree.
There are some other methods for derandomization, which might be covered in a future blogpost.
If you know of any Codeforces problems where this technique (or something similar) can be applied, I'd love to hear about them in the comments!
This is all for now. I'd appreciate any kind of comments or feedback.
Practice problems
Here are some problems to practice the ideas discussed in this blogpost. Feel free to discuss them in the comments.
1. Max Directed Cut
Given a directed graph $$$G=(V,E)$$$, partition the vertices into two sets $$$A$$$ and $$$B$$$ to maximize the number of edges going from $$$A$$$ to $$$B$$$.
This one is very similar to the undirected version. Again, we look for an approximation.
2. Balancing Vectors
Given $$$n$$$ vectors $$$v_1, v_2, \dots, v_n \in \mathbb{R}^d$$$ each of unit length $$$(|v_i| = 1)$$$, show that there exist signs $$$\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n \in \lbrace -1, +1 \rbrace$$$ such that $$$\left|\sum \varepsilon_i v_i\right|^2 \ge n$$$, and find them efficiently.
3. Independent Set (Caro-Wei Bound)
Given a graph $$$G = (V, E)$$$, an independent set is a subset of $$$V$$$ such that no two vertices are adjacent. Find an independent set of size $$$\ge \sum_{v \in V} \frac{1}{d(v) + 1}$$$ where $$$d(v)$$$ is the degree of vertex $$$v$$$.
4. Unbalancing Lights
Given an $$$n \times n$$$ matrix $$$A$$$ with entries in $$$\lbrace -1, +1 \rbrace$$$, find a vector $$$x \in \lbrace -1, +1 \rbrace^n$$$ such that $$$\max_{i} |(Ax)_i| \ge \sqrt{n}$$$.
Try using the sum $$$\sum_i (Ax)_i^2 = |Ax|^2$$$ as a proxy objective instead, and show that this is enough.
References
[1] Noga Alon and Joel H. Spencer. The Probabilistic Method (3rd edition). Wiley-Interscience Series in Discrete Mathematics and Optimization, 2008. Chapter 16: Derandomization.