
High Expectations from Expectations

In this blog I aim to demonstrate how expectations can be used effectively to solve problems whose solutions otherwise look very clumsy.


Theory

Let’s start with the basics.

1. Linearity of Expectation

For any finite set of random variables $$$X_1, X_2, \dots, X_N$$$, which need not be independent, we have

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i \right] = \sum_{i=1}^{N} \mathbb{E}[X_i] $$$
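For instance, even if the variables are completely dependent, say $$$X_2 = X_1$$$ for a single roll of a fair die, linearity still applies: $$$\mathbb{E}[X_1 + X_2] = \mathbb{E}[X_1] + \mathbb{E}[X_2] = \frac{7}{2} + \frac{7}{2} = 7$$$.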

That’s it. Simple, cute, and effective. This one line is usually enough to solve most probability problems.

Convinced it’s easy, right?

Example 1: Game on Tree

Given a rooted tree with $$$n$$$ nodes, a game is played by Momiji where in each step, a node is chosen uniformly at random from the set of remaining nodes. The chosen node and its entire subtree are then removed from the tree. The process repeats until all nodes have been deleted. The goal is to calculate the expected number of operations required to finish the game.


In this problem, the number of moves is exactly the number of nodes that Momiji picks directly; every other node is removed as part of some chosen ancestor's subtree.

Define $$$I_i$$$ as an indicator variable for whether the $$$i$$$-th node was removed directly by Momiji (and not as part of some ancestor's subtree).

We are interested in finding:

$$$ \mathbb{E}\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} \mathbb{E}[I_i] = \sum_{i=1}^{n} P(I_i = 1) $$$

(The expectation of an indicator random variable equals the probability that it equals 1: $$$\mathbb{E}[I] = 1 \cdot P(I = 1) + 0 \cdot P(I = 0) = P(I = 1)$$$.)

A node is picked by Momiji only if none of its ancestors has been picked earlier; otherwise it would already be deleted.
This means that if you take the node and all of its ancestors, put them in a bag, and draw them out one at a time uniformly at random, the node must come out first for it to be picked by Momiji.
Hence the probability is:

$$$ P(I_i = 1) = \frac{1}{1 + \text{number of ancestors of node } i} = \frac{1}{\text{depth}_i} $$$

(Here the root has depth $$$1$$$.)

Therefore, the expected number of moves is:

$$$ \sum_{i=1}^{n} \frac{1}{\text{depth}_i} $$$
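For concreteness, here is a minimal sketch of this solution (I assume the usual input format of $$$n$$$ followed by $$$n-1$$$ edges, with node $$$1$$$ as the root):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Expected number of moves = sum over all nodes of 1/depth(node),
// where the root (node 1) has depth 1. One BFS computes all depths.
int main() {
    int n;
    scanf("%d", &n);
    vector<vector<int>> adj(n + 1);
    for (int i = 0; i < n - 1; i++) {
        int u, v;
        scanf("%d %d", &u, &v);
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    vector<int> depth(n + 1, 0);
    queue<int> q;
    depth[1] = 1;
    q.push(1);
    double ans = 0;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        ans += 1.0 / depth[u];
        for (int v : adj[u])
            if (depth[v] == 0) { depth[v] = depth[u] + 1; q.push(v); }
    }
    printf("%.10f\n", ans);
    return 0;
}
```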

Example 2: Random Knockout Tournament

A tournament consists of $$$n$$$ equally skilled players. In each round, two players are randomly drawn to play a match; the loser is eliminated, and the winner proceeds. This continues until only one player remains. What is the probability that two specific players, $$$i$$$ and $$$j$$$, will face each other at some point during the tournament?


Due to the symmetry of the tournament (all players are equally skilled and pairings are random), the probability that any two players $$$i$$$ and $$$j$$$ face each other must be the same for every possible pair.

Let this constant probability be $$$P$$$.

Let $$$X_{i,j}$$$ be an indicator random variable for the pair $$$\{i, j\}$$$:

$$$X_{i,j} = 1$$$ if player $$$i$$$ and player $$$j$$$ play a match.

$$$X_{i,j} = 0$$$ otherwise.

The expected value of this indicator is:

$$$E[X_{i,j}] = P(X_{i,j} = 1) = P$$$

In any knockout tournament with $$$n$$$ players, exactly $$$n-1$$$ players must be eliminated to produce one winner. Since each match eliminates exactly one player, there are exactly $$$n-1$$$ matches in total.

The total number of matches $$$M$$$ can also be expressed as the sum of all indicator variables for all possible pairs:

$$$M = \sum_{1 \le i \lt j \le n} X_{i,j}$$$

Taking the expectation of the total number of matches:

$$$E[M] = E\left[ \sum_{1 \le i \lt j \le n} X_{i,j} \right]$$$

Since $$$M$$$ is always $$$n-1$$$, $$$E[M] = n-1$$$. By linearity of expectation:

$$$n-1 = \sum_{1 \le i \lt j \le n} E[X_{i,j}]$$$

There are $$$\binom{n}{2}$$$ pairs in the summation. Substituting $$$E[X_{i,j}] = P$$$:

$$$n-1 = \binom{n}{2} \cdot P$$$
$$$P = \frac{2}{n}$$$
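If you want to sanity-check this, here is a quick Monte Carlo sketch (the values of $$$n$$$, the seed, and the trial count are arbitrary choices of mine); it estimates the probability that two fixed players ever meet and compares it with $$$2/n$$$:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Estimate the probability that two fixed players (0 and 1 here)
// ever play each other, and compare it with the derived 2/n.
int main() {
    const int n = 8;
    const int TRIALS = 200000;
    mt19937 rng(12345);
    int met = 0;
    for (int t = 0; t < TRIALS; t++) {
        vector<int> alive(n);
        iota(alive.begin(), alive.end(), 0);
        bool meet = false;
        while (alive.size() > 1) {
            // draw two distinct players uniformly at random
            int i = rng() % alive.size();
            int j = rng() % (alive.size() - 1);
            if (j >= i) j++;
            if ((alive[i] == 0 && alive[j] == 1) ||
                (alive[i] == 1 && alive[j] == 0)) meet = true;
            // all players are equally skilled: the loser is uniform
            alive.erase(alive.begin() + ((rng() % 2) ? i : j));
        }
        if (meet) met++;
    }
    printf("estimate = %.4f, theory = %.4f\n",
           (double)met / TRIALS, 2.0 / n);
    return 0;
}
```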

Linearity of Expectation in Combinatorics

In counting problems over a complete set, instead of trying to sum up every individual case, we can treat the set as a probability space: calculate the expected value for one random configuration, then multiply by the total number of configurations to get the final sum. This transformation lets us bypass messy combinatorics entirely.

Problem 1: Count the Blocks

Consider all integers from $$$0$$$ to $$$10^n - 1$$$ written as exactly $$$n$$$ digits with leading zeros ($$$1 \le n \le 2 \cdot 10^5$$$).

A block is a maximal consecutive sequence of the same digit.
For each length $$$i$$$ from 1 to $$$n$$$, count the total number of blocks of length exactly $$$i$$$ that appear across all these $$$n$$$-digit strings.


While this problem is often approached with direct counting, it becomes much cleaner using Linearity of Expectation.

Instead of counting blocks across all $$$10^n$$$ strings, we find the expected number of blocks of length $$$k$$$ in a single, uniformly random string of length $$$n$$$, then multiply by $$$10^n$$$.

Let $$$X$$$ be the number of blocks of length exactly $$$k$$$ in such a random string. We want to find:

$$$10^n \times \mathbb{E}[X]$$$

Define $$$I_i$$$ as an indicator random variable where $$$I_i = 1$$$ if a block of length $$$k$$$ starts at position $$$i$$$, and $$$I_i = 0$$$ otherwise.

For a block of length $$$k$$$ to start at position $$$i$$$, two conditions must be met:

  1. Internal Consistency: All $$$k$$$ digits from $$$i$$$ to $$$i+k-1$$$ must be the same. (Probability: $$$10 \times (1/10)^k = 1/10^{k-1}$$$)

  2. Maximality: Any digit immediately adjacent to this sequence must be different to ensure the block doesn't "stretch" longer than $$$k$$$.

Case 1: The block is at an end ($$$i = 1$$$ or $$$i = n-k+1$$$)

There is exactly one neighbor. To ensure the block is exactly length $$$k$$$, that neighbor must be one of the 9 other digits.

$$$P(I_{\text{end}} = 1) = \left( \frac{1}{10^{k-1}} \right) \times \frac{9}{10} = \frac{9}{10^k}$$$

(Note: If $$$k=n$$$, there are no neighbors, so the probability is simply $$$1/10^{n-1}$$$.)

Case 2: The block is in the middle ($$$1 \lt i \lt n-k+1$$$)

There are two neighbors (left and right). Both must differ from the block's digit.

$$$P(I_{\text{mid}} = 1) = \left( \frac{1}{10^{k-1}} \right) \times \left( \frac{9}{10} \right)^2 = \frac{81}{10^{k+1}}$$$

Now, to get the total count, we multiply the expected value by $$$10^n$$$:

  1. If $$$k = n$$$: Only 1 possible position.
$$$\text{Count} = 10^n \times \frac{1}{10^{n-1}} = 10$$$
  2. If $$$k \lt n$$$:
  • Ends: 2 positions ($$$i=1$$$ and $$$i=n-k+1$$$).
  • Middle: $$$(n-k+1) - 2 = n-k-1$$$ positions.
$$$\text{Total Count} = 10^n \times \left( 2 \cdot \frac{9}{10^k} + \max(0, n-k-1) \cdot \frac{81}{10^{k+1}} \right)$$$
$$$ \implies \text{Total Count} = 10^{n-k-1} \times \left( 180 + \max(0, n-k-1) \cdot 81 \right)$$$
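A sketch of the resulting solution. The statement quoted above doesn't mention a modulus; if I recall correctly, the original problem asks for the counts modulo $$$998244353$$$, so treat MOD below as an assumption and check it against the actual statement:

```cpp
#include <bits/stdc++.h>
using namespace std;

// For each k, the total number of blocks of length exactly k is
//   k = n : 10
//   k < n : 10^(n-k-1) * (180 + 81 * (n-k-1))
// computed modulo MOD (assumed modulus; verify against the statement).
const long long MOD = 998244353;

int main() {
    int n;
    scanf("%d", &n);
    vector<long long> pw(n + 1, 1);           // pw[i] = 10^i mod MOD
    for (int i = 1; i <= n; i++) pw[i] = pw[i - 1] * 10 % MOD;
    for (int k = 1; k <= n; k++) {
        long long ans;
        if (k == n) ans = 10 % MOD;
        else ans = pw[n - k - 1] * ((180 + 81LL * (n - k - 1)) % MOD) % MOD;
        printf("%lld ", ans);
    }
    printf("\n");
    return 0;
}
```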

Check my submission here.


Problem 2: Max History

Given an array $$$a$$$ of $$$n$$$ integers, let $$$f(a)$$$ be the sum of the elements that are strictly greater than all preceding elements. Calculate the total sum of $$$f(a)$$$ over all $$$n!$$$ permutations of the array, modulo $$$10^9 + 7$$$.


This is a beautiful problem, but the editorial's solution looks terrifying. Let us try to solve it using probability theory.

We are asked to find the sum of $$$f(a)$$$ over all permutations of the array. This is equivalent to finding

$$$ n! \times \mathbb{E}[f(a)] $$$

Because the expected value of $$$f(a)$$$ over a uniformly random permutation is exactly the average of all those values.


Now, let’s analyze $$$f(a)$$$.
Define $$$I_i$$$ as an indicator random variable for element $$$a_i$$$:

  • $$$I_i = 1$$$ if $$$a_i$$$ contributes to $$$f(a)$$$ (i.e., it is greater than all elements before it),
  • $$$I_i = 0$$$ otherwise.

Then

$$$ f(a) = \sum_{i=1}^{n} I_i \cdot a_i $$$

Taking expectation on both sides and using linearity:

$$$ \mathbb{E}[f(a)] = \sum_{i=1}^{n} a_i \cdot \mathbb{E}[I_i] = \sum_{i=1}^{n} a_i \cdot P(I_i = 1) $$$

Now, what is $$$P(I_i = 1)$$$?
It is simply the probability that $$$a_i$$$ is greater than all the elements before it in the permutation. In other words, it is the probability that $$$a_i$$$ occurs before all the elements greater than or equal to it.

Let $$$g_i$$$ = number of elements greater than or equal to $$$a_i$$$, including $$$a_i$$$

Then $$$a_i$$$ must occur first among these $$$g_i$$$ elements, which happens with probability $$$\frac{1}{g_i}$$$.

So

$$$ P(I_i = 1) = \frac{1}{g_i} $$$

Therefore

$$$ \mathbb{E}[f(a)] = \sum_{i=1}^{n} a_i \cdot \frac{1}{g_i} $$$

and the final answer (sum over all permutations) is

$$$ n! \times \sum_{i=1}^{n} \frac{a_i}{g_i} $$$
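Here is a minimal sketch of this formula in code, with $$$g_i$$$ found by sorting and the division handled via modular inverses (this is my sketch, not necessarily the linked submission; the edge case mentioned below is ignored here too):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Answer = n! * sum_i a_i / g_i (mod p), where g_i = #{j : a_j >= a_i}.
// Division is handled with modular inverses, since p is prime.
const long long MOD = 1e9 + 7;

long long power(long long b, long long e) {
    long long r = 1;
    for (b %= MOD; e; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

int main() {
    int n;
    scanf("%d", &n);
    vector<long long> a(n);
    for (auto &x : a) scanf("%lld", &x);
    vector<long long> s = a;
    sort(s.begin(), s.end());
    long long fact = 1, ans = 0;
    for (int i = 1; i <= n; i++) fact = fact * i % MOD;
    for (int i = 0; i < n; i++) {
        // g = number of elements >= a[i], counting a[i] itself
        long long g = n - (lower_bound(s.begin(), s.end(), a[i]) - s.begin());
        ans = (ans + a[i] % MOD * power(g, MOD - 2)) % MOD;
    }
    printf("%lld\n", fact * ans % MOD);
    return 0;
}
```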

(I am ignoring an edge case here, as it is irrelevant to this blog.) Check my submission here.




Wald's Equation

The aim of this blog is to demonstrate the use of Wald's Equation in competitive programming. I am aware that questions on Wald's Equation rarely appear, but it is still a powerful technique for mathematics enthusiasts to learn.


Theory

If $$$X_1, X_2, \dots$$$ are IID random variables with $$$\mathbb{E}[|X_1|] \lt \infty$$$, and $$$N$$$ is a stopping time with $$$\mathbb{E}[N] \lt \infty$$$ (that is, whether we stop at step $$$n$$$ depends only on $$$X_1, \dots, X_n$$$ and not on the future), then

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] $$$

Not clear yet?

Example: You keep rolling a fair die until you get a 6. What’s the expected sum of all numbers rolled?

Let $$$X_i$$$ = the number on the $$$i$$$-th roll,
and $$$N$$$ = the number of rolls until the first 6 appears. $$$N$$$ is clearly a stopping time.

We know
- $$$X_i$$$ are IID with $$$\mathbb{E}[X_i] = \frac{7}{2}$$$
- $$$N$$$ follows a geometric distribution with $$$p = \frac{1}{6}$$$, hence $$$\mathbb{E}[N] = \frac{1}{p} = 6$$$

By Wald’s Equation:

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_i] = 6 \times \frac{7}{2} = 21 $$$
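A quick simulation sketch to convince yourself (seed and trial count are my arbitrary choices):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Roll a fair die until the first 6; average the total of all rolls
// (including the final 6) and compare with Wald's prediction of 21.
int main() {
    mt19937 rng(42);
    uniform_int_distribution<int> die(1, 6);
    const int TRIALS = 1000000;
    double total = 0;
    for (int t = 0; t < TRIALS; t++) {
        int sum = 0, roll;
        do {
            roll = die(rng);
            sum += roll;
        } while (roll != 6);
        total += sum;
    }
    printf("average = %.3f (theory: 21)\n", total / TRIALS);
    return 0;
}
```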

Example: Expected Cost of Coupon Collecting

The standard Coupon Collector's Problem goes like this: given $$$n$$$ coupons, how many coupons do you expect you need to draw with replacement before having drawn each coupon at least once?

The answer to that is $$$nH_n$$$, where $$$H_n$$$ is the $$$n$$$-th Harmonic number. This can be proven easily using Linearity of Expectation; you may find the proof in the linked Wikipedia article as well.

Now, let's add a twist. Let the cost of the $$$i$$$-th coupon be $$$i$$$ dollars. What will be the expected cost by the time you have collected all $$$n$$$ coupons?

Let $$$X_i$$$ be the cost of the coupon drawn on the $$$i$$$-th draw, and let $$$N$$$ be the number of draws needed to collect all coupons. $$$N$$$ is clearly a stopping time, since the values $$$X_1, \dots, X_n$$$ determine whether we have stopped by draw $$$n$$$ (coupon $$$i$$$ is the only one costing $$$i$$$ dollars). We need to find $$$\mathbb{E}[X_1 + \dots + X_N]$$$.

The $$$X_i$$$ are IID random variables: in any single draw, we get one of the $$$n$$$ coupon types with equal probability. Thus, the expected cost of a single draw is the average of all possible costs:

$$$ \mathbb{E}[X_i] = \frac{1+2+\dots+n}{n} = \frac{n(n+1)/2}{n} = \frac{n+1}{2} $$$

$$$N$$$ is the standard stopping time for the Coupon Collector's problem, so $$$\mathbb{E}[N] = nH_n$$$

By Wald's Equation:

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_i] = n H_n \cdot \frac{n+1}{2} $$$
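Again, a small sketch that computes this closed form and checks it against a simulation (the value of $$$n$$$ is my arbitrary choice):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Expected total cost of collecting all n coupons, coupon i costing
// i dollars: n * H_n * (n + 1) / 2, checked against a simulation.
int main() {
    const int n = 10;                 // example size, my choice
    double Hn = 0;
    for (int i = 1; i <= n; i++) Hn += 1.0 / i;
    double theory = n * Hn * (n + 1) / 2.0;

    mt19937 rng(7);
    const int TRIALS = 100000;
    double total = 0;
    for (int t = 0; t < TRIALS; t++) {
        vector<bool> seen(n + 1, false);
        int distinct = 0;
        long long cost = 0;
        while (distinct < n) {
            int c = rng() % n + 1;    // coupon c is drawn; it costs c
            cost += c;
            if (!seen[c]) { seen[c] = true; distinct++; }
        }
        total += cost;
    }
    printf("theory = %.3f, simulated = %.3f\n", theory, total / TRIALS);
    return 0;
}
```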

A General Form

Sometimes, the random variables we are summing, $$$X_i$$$, don't contain enough information on their own to define the stopping time $$$N$$$. For example, $$$N$$$ might depend on the entire outcome of an experiment (like a full permutation), while $$$X_i$$$ is just one part of that outcome (like its first element). This more general form handles that situation perfectly.

Let $$$(Y_i)_{i \ge 1}$$$ be IID real random variables, and let $$$f$$$ be a measurable function. Define $$$X_i = f(Y_i)$$$.

Let $$$N$$$ be a stopping time with respect to the filtration $$$\mathbb{F}_n = \sigma(Y_1, \dots, Y_n)$$$. Assume that

$$$ \mathbb{E}[|f(Y_1)|] \lt \infty, \qquad \mathbb{E}[N] \lt \infty $$$

Then,

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] $$$

Note: $$$N$$$ need not be a stopping time for $$$(X_i)$$$.

Proof: Set $$$\mu = \mathbb{E}[X_1] = \mathbb{E}[f(Y_1)]$$$, and define

$$$ S_n = \sum_{i=1}^n X_i = \sum_{i=1}^n f(Y_i), \quad S_0 = 0 $$$

Let $$$\mathbb{F}_n = \sigma(Y_1, \dots, Y_n)$$$, and define

$$$ M_n = S_n - n\mu, \quad n \ge 0 $$$

We claim that $$$(M_n, \mathbb{F}_n)$$$ is a martingale. Indeed, since the $$$Y_i$$$ (hence $$$X_i$$$) are IID,

$$$ \begin{array}{rcl} \mathbb{E}[M_{n+1} \mid \mathbb{F}_n] & = & \mathbb{E}[S_n + X_{n+1} - (n+1)\mu \mid \mathbb{F}_n] \\ & = & S_n - n\mu + \mathbb{E}[X_{n+1} \mid \mathbb{F}_n] - \mu \end{array} $$$

But $$$X_{n+1}$$$ is independent of $$$\mathbb{F}_n$$$, and $$$\mathbb{E}[X_{n+1}] = \mu$$$, so

$$$ \mathbb{E}[M_{n+1} \mid \mathbb{F}_n] = M_n $$$

Now, $$$N$$$ is a stopping time with respect to $$$(\mathbb{F}_n)$$$. By the optional stopping theorem (using $$$\mathbb{E}[|X_1|] \lt \infty$$$ and $$$\mathbb{E}[N] \lt \infty$$$), we have

$$$ \mathbb{E}[M_N] = \mathbb{E}[M_0] = 0 $$$

Expanding $$$M_N$$$,

$$$ \begin{array}{rcl} 0 & = & \mathbb{E}[S_N - N\mu] \\ & = & \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] - \mu \, \mathbb{E}[N] \end{array} $$$

Hence,

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] $$$

Remarks:

  • The trick is to build the martingale $$$M_n = S_n - n\mu$$$ using the $$$\sigma$$$-algebra generated by the $$$Y_i$$$. Since $$$N$$$ is a stopping time for this filtration, the optional stopping theorem applies even though $$$N$$$ may not be a stopping time for the coarser filtration generated by $$$X_i = f(Y_i)$$$.
  • The assumptions $$$\mathbb{E}[|f(Y_1)|] \lt \infty$$$ and $$$\mathbb{E}[N] \lt \infty$$$ are standard.

Not clear yet?

Example: Expected Sum of First Element Until Derangement

Suppose you have an array $$$a$$$ of length $$$n$$$, $$$a = \{1, 2, \dots, n\}$$$. In each iteration, the array $$$a$$$ is randomly shuffled. Let $$$X_i$$$ be the first element of the array after the $$$i$$$-th shuffle. You maintain a sum $$$S = \sum X_i$$$, starting at $$$S = 0$$$. The process stops at time $$$N$$$, which is the first iteration where the resulting permutation is a derangement of the initial array $$$a$$$. What is the expected value of the final sum $$$\mathbb{E}[S]$$$?

Let $$$Y_i$$$ be the $$$i$$$-th permutation. It is easy to verify that $$$N$$$ is a stopping time for the sequence of permutations $$$Y_i$$$.
The terms of the sum are $$$X_i = f(Y_i)$$$, where $$$f$$$ extracts the first element of the permutation, i.e., $$$X_i = Y_i[0]$$$.

We can thus apply Wald's Equation:

$$$ \mathbb{E}\left[S\right] = \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] $$$

Expected Value of a Single Term $$$\mathbb{E}[X_1]$$$

In any iteration, the array is randomly shuffled.
Thus, the first element $$$X_i$$$ is equally likely to be any of the $$$n$$$ elements of $$$a$$$.

The expected value of the first element is:

$$$ \mathbb{E}[X_1] = \frac{1}{n} \sum_{i=1}^{n} a_i = \frac{n(n+1)/2}{n} = \frac{n+1}{2} $$$

Expected Stopping Time $$$\mathbb{E}[N]$$$

The stopping time $$$N$$$ is the first iteration at which the permutation is a derangement.
The trials are independent, and the probability of success (a derangement) is constant.
Hence, $$$N$$$ follows a Geometric Distribution.

The total number of permutations of $$$n$$$ elements is $$$n!$$$.
The number of derangements of $$$n$$$ elements is $$$D_n$$$.
The probability of a single shuffle resulting in a derangement is:

$$$ p = P(\text{derangement}) = \frac{D_n}{n!} $$$

Since $$$N \sim \text{Geometric}(p)$$$, its expected value is:

$$$ \mathbb{E}[N] = \frac{1}{p} = \frac{n!}{D_n} $$$

Expected Sum $$$\mathbb{E}[S]$$$

Substituting the expected values back into Wald's equation:

$$$ \mathbb{E}[S] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] = \left(\frac{n!}{D_n}\right) \cdot \left(\frac{n+1}{2}\right) $$$
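A small sketch that checks this against a direct simulation, computing $$$D_n$$$ via the standard recurrence $$$D_k = (k-1)(D_{k-1} + D_{k-2})$$$ (the value of $$$n$$$ and the trial count are my arbitrary choices):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Check E[S] = (n! / D_n) * (n + 1) / 2 against a direct simulation.
int main() {
    const int n = 6;                  // example size, my choice
    // Derangement numbers: D_k = (k - 1) * (D_{k-1} + D_{k-2}).
    vector<double> D(n + 1);
    D[0] = 1; D[1] = 0;
    for (int k = 2; k <= n; k++) D[k] = (k - 1) * (D[k - 1] + D[k - 2]);
    double fact = 1;
    for (int k = 2; k <= n; k++) fact *= k;
    double theory = fact / D[n] * (n + 1) / 2.0;

    mt19937 rng(123);
    const int TRIALS = 200000;
    double total = 0;
    for (int t = 0; t < TRIALS; t++) {
        vector<int> p(n);
        iota(p.begin(), p.end(), 1);  // a = {1, 2, ..., n}
        long long S = 0;
        bool done = false;
        while (!done) {
            shuffle(p.begin(), p.end(), rng);
            S += p[0];                // X_i = first element after shuffle
            done = true;              // derangement check vs. {1, ..., n}
            for (int j = 0; j < n; j++)
                if (p[j] == j + 1) { done = false; break; }
        }
        total += S;
    }
    printf("theory = %.3f, simulated = %.3f\n", theory, total / TRIALS);
    return 0;
}
```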

Problem: ZS Shuffles Cards

This problem has a complex solution in the official editorial, full of combinatorics and summations. And while that is perfectly valid, we can absolutely arrive at an elegant solution using Wald's Equation.

Some parts of this approach appear in the comments, but the following is my independently derived version:


Setup

Let each numbered card be denoted by C, and each joker by J.
A possible sequence of draws might look like this:

C C C C J
C C J
J
C C C C J

Each line (ending with a joker) represents one iteration.
Let the length of the $$$i$$$-th iteration be denoted by $$$X_i$$$, and let $$$N$$$ be the total number of iterations.

So in this example, $$$X_1 = 5$$$, $$$X_2 = 3$$$, $$$X_3 = 1$$$, $$$X_4 = 5$$$ and $$$N = 4$$$.

Our goal is to find $$$\mathbb{E}[X_1 + X_2 + \dots + X_N]$$$.

Note that if we define $$$Y_i$$$ as the set of cards drawn in the $$$i$$$-th iteration, then $$$N$$$ is clearly a stopping time with respect to the filtration $$$\mathbb{F}_n = \sigma(Y_1, \dots, Y_n)$$$. Now let $$$X_i$$$ be the cardinality of the set in the $$$i$$$-th iteration, $$$X_i = |Y_i|$$$. As discussed above, we can apply Wald's equation here.

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[X_i] \, \mathbb{E}[N]. $$$

Let us calculate both of them, one by one.


Step 1: Calculating $$$\mathbb{E}[X_i]$$$

We need the expected length of an iteration, i.e., the number of cards drawn up to and including the joker that ends it.

Define an indicator random variable $$$I_j$$$ for each numbered card, where
$$$I_j = 1$$$ if the $$$j$$$-th numbered card is drawn in the iteration, and $$$I_j = 0$$$ otherwise.

Then
$$$X_i = I_1 + I_2 + \dots + I_n + 1$$$ (The extra 1 is because of the joker card)

By linearity of expectation:

$$$ \mathbb{E}[X_i] = \mathbb{E}\left[\sum_{j=1}^{n} I_j + 1\right] = \sum_{j=1}^{n} \mathbb{E}[I_j] + 1 = \sum_{j=1}^{n} P(I_j = 1) + 1 $$$

Now, what is $$$P(I_j = 1)$$$?

For the $$$j$$$-th numbered card to appear in the iteration, it must appear before all the jokers in the shuffled deck.
This means that if you take the $$$j$$$-th numbered card and all $$$m$$$ jokers, put them in a bag, and draw them out one at a time uniformly at random, the numbered card must come out first. Each card in the bag is equally likely to come out first, so the probability is $$$\frac{1}{\text{number of cards in the bag}}$$$.

$$$\implies$$$ The probability that $$$I_j = 1$$$ is $$$\frac{1}{m + 1}$$$.

Therefore:

$$$ \mathbb{E}[X_i] = \frac{n}{m + 1} + 1 $$$

Step 2: Calculating $$$\mathbb{E}[N]$$$

Now for the expected number of iterations.

Let’s define $$$a_k$$$ as the expected number of iterations remaining when $$$k$$$ distinct numbered cards have already been drawn and are in the set $$$S$$$.
Let $$$C$$$ be the total number of cards remaining in the bag at this point.

There are three possible cases when drawing the next card:

  1. We draw a joker: probability $$$\frac{m}{C}$$$. We stay in the same state, but one more iteration is counted.
  2. We draw a new numbered card (not in $$$S$$$): probability $$$\frac{n - k}{C}$$$. We move to state $$$k + 1$$$.
  3. We draw a numbered card already in $$$S$$$: probability $$$1 - \frac{m}{C} - \frac{n - k}{C}$$$. We stay in the same state and in the same iteration.

Observe that Case 3 doesn't affect the expected number of iterations; it only delays when we reach Case 1 or Case 2. Since we must eventually draw either a joker or a new numbered card (this happens with probability 1), we can condition on the next meaningful draw being either Case 1 or Case 2. To account for the reduced sample space, the conditional probabilities become:

  • Joker: $$$\frac{m}{m + n - k}$$$
  • New numbered card: $$$\frac{n - k}{m + n - k}$$$

Now we can write a recurrence for $$$a_k$$$:

$$$ a_k = \frac{m}{m + n - k}(1 + a_k) + \frac{n - k}{m + n - k}a_{k + 1} $$$

Simplifying (multiply both sides by $$$m + n - k$$$ and collect the $$$a_k$$$ terms, which leaves $$$(n - k)\,a_k = m + (n - k)\,a_{k + 1}$$$):

$$$ a_k = a_{k + 1} + \frac{m}{n - k} $$$

Base cases:

  • The quantity we want is $$$a_0$$$ (no numbered card drawn yet).
  • When all numbered cards have been drawn: $$$a_n = 1$$$ (we just need one final joker to end the game).

Hence:

$$$ a_0 = m\left(1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}\right) + 1 = mH_n + 1 $$$

where $$$H_n$$$ is the $$$n$$$-th Harmonic number.

Therefore:

$$$ \mathbb{E}[N] = mH_n + 1 $$$

Step 3: Final Answer

Combining the two:

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_i] = (mH_n + 1) \cdot \left(\frac{n}{m + 1} + 1\right) $$$
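A minimal sketch of this final computation, in plain floating point; the actual problem may want the answer in another form (e.g. modulo a prime), in which case the harmonic sum and the division need modular inverses:

```cpp
#include <bits/stdc++.h>
using namespace std;

// E[total cards drawn] = (m * H_n + 1) * (n / (m + 1) + 1),
// evaluated here in plain floating point.
int main() {
    long long n, m;
    scanf("%lld %lld", &n, &m);
    long double Hn = 0;
    for (long long i = 1; i <= n; i++) Hn += 1.0L / i;
    long double EN = m * Hn + 1;                     // expected iterations
    long double EX = (long double)n / (m + 1) + 1;   // expected length
    printf("%.10Lf\n", EN * EX);
    return 0;
}
```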

That’s it: a problem that required several combinatorial identities dissolved neatly into an elegant solution using probability theory. Check my submission here.


PS: I have tried to keep the math jargon to a minimum so that the post is approachable for the usual CP crowd, although in a few places the correctness of the blog demanded the use of precise mathematical terms.

Having majored in mathematics, I find the role of theory in competitive programming fascinating. We may not need formal proofs mid-contest, but knowing powerful general results (like this one) can be very handy. My goal with this post was to share one of those "general results" that I find elegant and useful. I hope it helps, and I would be interested to hear what others think!
