__AB__'s blog

By __AB__, 6 months ago, In English

Wald's Equation

The aim of this blog is to demonstrate the use of Wald's Equation in competitive programming. I am aware that questions on Wald's Equation rarely appear, but still it is a powerful technique to learn for mathematics enthusiasts.


Theory

If $$$X_1, X_2, \dots$$$ are IID random variables with $$$\mathbb{E}[|X_1|] \lt \infty$$$, and $$$N$$$ is a stopping time with $$$\mathbb{E}[N] \lt \infty$$$ (that is, whether we stop at time $$$n$$$ depends only on $$$X_1, \dots, X_n$$$ and not on the future), then

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] $$$

Not clear yet?

Example: You keep rolling a fair die until you get a 6. What’s the expected sum of all numbers rolled?

Let $$$X_i$$$ be the number on the $$$i$$$-th roll,
and $$$N$$$ the number of rolls until the first 6 appears. $$$N$$$ is clearly a stopping time.

We know
- $$$X_i$$$ are IID with $$$\mathbb{E}[X_i] = \frac{7}{2}$$$
- $$$N$$$ follows a geometric distribution with $$$p = \frac{1}{6}$$$, hence $$$\mathbb{E}[N] = \frac{1}{p} = 6$$$

By Wald’s Equation:

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_i] = 6 \times \frac{7}{2} = 21 $$$
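This is easy to sanity-check with a quick Monte Carlo run (a minimal sketch; the function name and seed are my own):

```python
import random

def sum_until_six(rng):
    # Roll a fair die, accumulating the pips, until a 6 appears.
    total = 0
    while True:
        roll = rng.randint(1, 6)
        total += roll
        if roll == 6:
            return total

rng = random.Random(0)
trials = 200_000
avg = sum(sum_until_six(rng) for _ in range(trials)) / trials
print(avg)  # hovers around 21, as Wald's Equation predicts
```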

Example: Expected Cost of Coupon Collecting

The standard Coupon Collector's Problem goes like this: given $$$n$$$ coupon types, how many coupons do you expect to draw, with replacement, before having drawn each type at least once?

The answer is $$$nH_n$$$, where $$$H_n$$$ is the $$$n$$$-th Harmonic number. This can be proven easily using linearity of expectation; you may find the proof in the Wikipedia article on the problem as well.

Now, let's add a twist: let the cost of the $$$i$$$-th coupon be $$$i$$$ dollars. What is the expected cost until you have collected all $$$n$$$ coupons?

Let $$$X_i$$$ be the cost of the coupon drawn on the $$$i$$$-th draw, and let $$$N$$$ be the number of draws needed to collect all coupons. Since a coupon's cost identifies it, the sequence $$$X_1, \dots, X_n$$$ determines whether we stop at draw $$$n$$$, so $$$N$$$ is a stopping time. We need to find $$$\mathbb{E}[X_1 + \dots + X_N]$$$.

The $$$X_i$$$ are IID random variables. In any single draw, we get one of the $$$n$$$ coupon types with equal probability. Thus, the expected cost of a single draw is the average of all possible costs:

$$$ \mathbb{E}[X_i] = \frac{1+2+\dots+n}{n} = \frac{n(n+1)/2}{n} = \frac{n+1}{2} $$$

$$$N$$$ is the standard stopping time for the Coupon Collector's problem, so $$$\mathbb{E}[N] = nH_n$$$

By using Wald's Equation

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_i] = n H_n \cdot \frac{n+1}{2} $$$
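The formula can again be sanity-checked numerically (a sketch; for $$$n = 5$$$ the predicted cost is $$$5 H_5 \cdot 3 = 34.25$$$):

```python
import random

def total_cost(n, rng):
    # Draw coupons uniformly with replacement until all n types are seen;
    # coupon i costs i dollars.
    seen = set()
    cost = 0
    while len(seen) < n:
        c = rng.randint(1, n)
        cost += c
        seen.add(c)
    return cost

n = 5
H_n = sum(1 / k for k in range(1, n + 1))
predicted = n * H_n * (n + 1) / 2          # Wald: E[N] * E[X_i] = 34.25
rng = random.Random(1)
trials = 200_000
avg = sum(total_cost(n, rng) for _ in range(trials)) / trials
print(predicted, avg)  # the two values nearly coincide
```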

A General Form

Sometimes, the random variables we are summing, $$$X_i$$$, don't contain enough information on their own to define the stopping time $$$N$$$. For example, $$$N$$$ might depend on the entire outcome of an experiment (like a full permutation), while $$$X_i$$$ is just one part of that outcome. This more general form handles that situation perfectly.

Let $$$(Y_i)_{i \ge 1}$$$ be IID real random variables, and let $$$f$$$ be a measurable function. Define $$$X_i = f(Y_i)$$$.

Let $$$N$$$ be a stopping time with respect to the filtration $$$\mathbb{F}_n = \sigma(Y_1, \dots, Y_n)$$$. Assume that

$$$ \mathbb{E}[|f(Y_1)|] \lt \infty, \qquad \mathbb{E}[N] \lt \infty $$$

Then,

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] $$$

Note: $$$N$$$ need not be a stopping time for $$$(X_i)$$$.

Proof: Set $$$\mu = \mathbb{E}[X_1] = \mathbb{E}[f(Y_1)]$$$, and define

$$$ S_n = \sum_{i=1}^n X_i = \sum_{i=1}^n f(Y_i), \quad S_0 = 0 $$$

Let $$$\mathbb{F}_n = \sigma(Y_1, \dots, Y_n)$$$, and define

$$$ M_n = S_n - n\mu, \quad n \ge 0 $$$

We claim that $$$(M_n, \mathbb{F}_n)$$$ is a martingale. Indeed, since the $$$Y_i$$$ (hence $$$X_i$$$) are IID,

$$$ \begin{array}{rcl} \mathbb{E}[M_{n+1} \mid \mathbb{F}_n] & = & \mathbb{E}[S_n + X_{n+1} - (n+1)\mu \mid \mathbb{F}_n] \\ & = & S_n - n\mu + \mathbb{E}[X_{n+1} \mid \mathbb{F}_n] - \mu \end{array} $$$

But $$$X_{n+1}$$$ is independent of $$$\mathbb{F}_n$$$, and $$$\mathbb{E}[X_{n+1}] = \mu$$$, so

$$$ \mathbb{E}[M_{n+1} \mid \mathbb{F}_n] = M_n $$$

Now, $$$N$$$ is a stopping time with respect to $$$(\mathbb{F}_n)$$$. By the optional stopping theorem (using $$$\mathbb{E}[|X_1|] \lt \infty$$$ and $$$\mathbb{E}[N] \lt \infty$$$), we have

$$$ \mathbb{E}[M_N] = \mathbb{E}[M_0] = 0 $$$

Expanding $$$M_N$$$,

$$$ \begin{array}{rcl} 0 & = & \mathbb{E}[S_N - N\mu] \\ & = & \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] - \mu \, \mathbb{E}[N] \end{array} $$$

Hence,

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] $$$

Remarks:

  • The trick is to build the martingale $$$M_n = S_n - n\mu$$$ using the $$$\sigma$$$-algebra generated by the $$$Y_i$$$. Since $$$N$$$ is a stopping time for this filtration, the optional stopping theorem applies even though $$$N$$$ may not be a stopping time for the coarser filtration generated by $$$X_i = f(Y_i)$$$.
  • The assumptions $$$\mathbb{E}[|f(Y_1)|] \lt \infty$$$ and $$$\mathbb{E}[N] \lt \infty$$$ are standard.

Not clear yet?

Example: Expected Sum of First Element Until Derangement

Suppose you have an array $$$a$$$ of length $$$n$$$, initially $$$a = (1, 2, \dots, n)$$$. In each iteration, the array $$$a$$$ is randomly shuffled. Let $$$X_i$$$ be the first element of the array after the $$$i$$$-th shuffle. You maintain a sum $$$S = \sum X_i$$$, starting at $$$S = 0$$$. The process stops at time $$$N$$$, the first iteration where the resulting permutation is a derangement of the initial array. What is the expected value of the final sum, $$$\mathbb{E}[S]$$$?

Let $$$Y_i$$$ be the $$$i$$$-th permutation. It is easy to verify that $$$N$$$ is a stopping time for the sequence of permutations $$$Y_i$$$.
The terms of the sum are $$$X_i = f(Y_i)$$$, where $$$f$$$ extracts the first element of the permutation, i.e., $$$X_i = Y_i[0]$$$.

We can thus apply Wald's Equation:

$$$ \mathbb{E}\left[S\right] = \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] $$$

Expected Value of a Single Term $$$\mathbb{E}[X_1]$$$

In any iteration, the array is randomly shuffled.
Thus, the first element $$$X_i$$$ is equally likely to be any of the $$$n$$$ elements in the set $$$a$$$.

The expected value of the first element is:

$$$ \mathbb{E}[X_1] = \frac{1}{n} \sum_{i=1}^{n} a_i = \frac{n(n+1)/2}{n} = \frac{n+1}{2} $$$

Expected Stopping Time $$$\mathbb{E}[N]$$$

The stopping time $$$N$$$ is the first iteration at which the permutation is a derangement.
The trials are independent, and the probability of success (a derangement) is constant.
Hence, $$$N$$$ follows a Geometric Distribution.

The total number of permutations of $$$n$$$ elements is $$$n!$$$.
The number of derangements of $$$n$$$ elements is $$$D_n$$$.
The probability of a single shuffle resulting in a derangement is:

$$$ p = P(\text{derangement}) = \frac{D_n}{n!} $$$

Since $$$N \sim \text{Geometric}(p)$$$, its expected value is:

$$$ \mathbb{E}[N] = \frac{1}{p} = \frac{n!}{D_n} $$$

Expected Sum $$$\mathbb{E}[S]$$$

Substituting the expected values back into Wald's equation:

$$$ \mathbb{E}[S] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] = \left(\frac{n!}{D_n}\right) \cdot \left(\frac{n+1}{2}\right) $$$
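A small simulation of the process agrees with this closed form (a sketch; $$$D_n$$$ is computed via the standard recurrence $$$D_k = (k-1)(D_{k-1} + D_{k-2})$$$, and for $$$n = 4$$$ the prediction is $$$\frac{24}{9} \cdot \frac{5}{2} = \frac{20}{3}$$$):

```python
import math
import random

def is_derangement(p):
    # p is a permutation of 1..n stored in a 0-indexed list.
    return all(p[i] != i + 1 for i in range(len(p)))

def run_once(n, rng):
    # Shuffle, add the first element to the sum, stop at the first derangement.
    a = list(range(1, n + 1))
    s = 0
    while True:
        rng.shuffle(a)
        s += a[0]
        if is_derangement(a):
            return s

n = 4
D = [1, 0]  # derangement numbers D_0, D_1, ...
for k in range(2, n + 1):
    D.append((k - 1) * (D[-1] + D[-2]))
predicted = (math.factorial(n) / D[n]) * (n + 1) / 2   # 20/3 for n = 4
rng = random.Random(2)
trials = 100_000
avg = sum(run_once(n, rng) for _ in range(trials)) / trials
print(predicted, avg)
```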

Problem: ZS Shuffles Cards

This problem has a fairly involved solution in the official editorial, full of combinatorics and summations. While that is perfectly valid, we can arrive at an elegant solution using Wald's Equation.

Some parts of this approach appear in the comments, but the following is my independently derived version:


Setup

Let there be $$$n$$$ numbered cards and $$$m$$$ jokers. Denote each numbered card by C and each joker by J.
A possible sequence of draws might look like this:

C C C C J
C C J
J
C C C C J

Each line (ending with a joker) represents one iteration.
Let the length of the $$$i$$$-th iteration be denoted by $$$X_i$$$, and let $$$N$$$ be the total number of iterations.

So in this example, $$$X_1 = 5$$$, $$$X_2 = 3$$$, $$$X_3 = 1$$$, $$$X_4 = 5$$$ and $$$N = 4$$$.

Our goal is to find $$$\mathbb{E}[X_1 + X_2 + \dots + X_N]$$$.

Note that if we define $$$Y_i$$$ as the set of cards in the $$$i$$$-th iteration, then $$$N$$$ is clearly a stopping time with respect to the filtration $$$\mathbb{F}_n = \sigma(Y_1, \dots, Y_n)$$$. Now let $$$X_i$$$ be the cardinality of the set in the $$$i$$$-th iteration, $$$X_i = |Y_i|$$$. As discussed above, we can apply Wald's Equation here.

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[X_i] \, \mathbb{E}[N]. $$$

Let us calculate both of them one by one


Step 1: Calculating $$$\mathbb{E}[X_i]$$$

We need the expected length of an iteration, i.e., the number of cards drawn up to and including the joker that ends it.

Define an indicator random variable $$$I_j$$$ for each numbered card, where
$$$I_j = 1$$$ if the $$$j$$$-th numbered card is drawn in the iteration, and $$$I_j = 0$$$ otherwise.

Then
$$$X_i = I_1 + I_2 + \dots + I_n + 1$$$ (the extra $$$1$$$ accounts for the joker card).

By linearity of expectation:

$$$ \mathbb{E}[X_i] = \mathbb{E}\left[\sum_{j=1}^{n} I_j + 1\right] = \sum_{j=1}^{n} \mathbb{E}[I_j] + 1 = \sum_{j=1}^{n} P(I_j = 1) + 1 $$$

Now, what is $$$P(I_j = 1)$$$?

For the $$$j$$$-th numbered card to appear in the iteration, it must appear before all the jokers in the shuffled deck.
This means that if you take the $$$j$$$-th numbered card and all the jokers, put them in a bag, and draw one uniformly at random, then the numbered card should be drawn first. Hence the probability becomes $$$\frac{1}{\text{number of cards in the bag}}$$$

$$$\implies$$$ The probability that $$$I_j = 1$$$ is $$$\frac{1}{m + 1}$$$.

Therefore:

$$$ \mathbb{E}[X_i] = \frac{n}{m + 1} + 1 $$$
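A quick empirical check of this formula (a sketch; $$$0$$$ marks a joker, and for $$$n = 4$$$, $$$m = 2$$$ the prediction is $$$\frac{4}{3} + 1 = \frac{7}{3}$$$):

```python
import random

def iteration_length(n, m, rng):
    # Shuffle n numbered cards and m jokers; an iteration consists of the
    # cards up to and including the first joker.
    deck = list(range(1, n + 1)) + [0] * m   # 0 marks a joker
    rng.shuffle(deck)
    return deck.index(0) + 1

n, m = 4, 2
predicted = n / (m + 1) + 1                  # 7/3 for n = 4, m = 2
rng = random.Random(3)
trials = 200_000
avg = sum(iteration_length(n, m, rng) for _ in range(trials)) / trials
print(predicted, avg)
```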

Step 2: Calculating $$$\mathbb{E}[N]$$$

Now for the expected number of iterations.

Let’s define $$$a_k$$$ as the expected number of iterations remaining when $$$k$$$ distinct numbered cards have already been drawn and are in the set $$$S$$$.
Let $$$C$$$ be the total number of cards remaining in the bag at this point.

There are three possible cases when drawing the next card:

  1. We draw a joker: probability $$$\frac{m}{C}$$$. We remain in the same state, but one more iteration is added.
  2. We draw a new numbered card (not in $$$S$$$): probability $$$\frac{n - k}{C}$$$. We move to state $$$k + 1$$$.
  3. We draw a numbered card already in $$$S$$$: probability $$$1 - \frac{m}{C} - \frac{n - k}{C}$$$. We remain in the same state and in the same iteration.

Observe that Case 3 doesn't affect the expected number of iterations; it only delays when we reach Case 1 or Case 2. Since we must eventually draw either a joker or a new numbered card (this happens with probability 1), we can condition on the next meaningful draw being either Case 1 or Case 2. To account for the reduced sample space, the conditional probabilities become:

  • Joker: $$$\frac{m}{m + n - k}$$$
  • New numbered card: $$$\frac{n - k}{m + n - k}$$$

Now we can write a recurrence for $$$a_k$$$:

$$$ a_k = \frac{m}{m + n - k}(1 + a_k) + \frac{n - k}{m + n - k}a_{k + 1} $$$

Simplifying:

$$$ a_k = a_{k + 1} + \frac{m}{n - k} $$$

Base cases:

  • When no numbered card has been drawn yet: we want $$$a_0$$$
  • When all numbered cards are drawn: $$$a_n = 1$$$ (since we just need one final joker)

Hence:

$$$ a_0 = m\left(1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}\right) + 1 = mH_n + 1 $$$

where $$$H_n$$$ is the $$$n$$$-th Harmonic number.

Therefore:

$$$ \mathbb{E}[N] = mH_n + 1 $$$
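As a quick numerical check (a sketch), unrolling the recurrence backward from $$$a_n = 1$$$ reproduces the closed form $$$mH_n + 1$$$:

```python
def expected_iterations(n, m):
    # Unroll a_k = a_{k+1} + m / (n - k) backward from the base case a_n = 1.
    a = 1.0
    for k in range(n - 1, -1, -1):
        a += m / (n - k)
    return a

n, m = 5, 3
H_n = sum(1 / k for k in range(1, n + 1))
print(expected_iterations(n, m), m * H_n + 1)  # the two values agree
```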

Step 3: Final Answer

Combining the two:

$$$ \mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = \mathbb{E}[N] \cdot \mathbb{E}[X_i] = (mH_n + 1) \cdot \left(\frac{n}{m + 1} + 1\right) $$$
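Before submitting, the closed form can be checked against a direct simulation of the game for small $$$n, m$$$ (a sketch of the process as modeled above: draw until a joker, put everything back and reshuffle, and stop at the joker closing the iteration in which $$$S$$$ becomes complete):

```python
import random

def simulate_game(n, m, rng):
    # n numbered cards and m jokers (encoded as 0); count every card drawn
    # until the joker that ends the iteration in which S = {1..n}.
    deck = list(range(1, n + 1)) + [0] * m
    S = set()
    drawn = 0
    while True:
        rng.shuffle(deck)
        for card in deck:
            drawn += 1
            if card == 0:
                break          # joker: everything goes back, reshuffle
            S.add(card)
        if len(S) == n:
            return drawn

n, m = 3, 2
H_n = sum(1 / k for k in range(1, n + 1))
predicted = (m * H_n + 1) * (n / (m + 1) + 1)   # 28/3 for n = 3, m = 2
rng = random.Random(4)
trials = 100_000
avg = sum(simulate_game(n, m, rng) for _ in range(trials)) / trials
print(predicted, avg)
```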

That's it: a problem whose editorial required several combinatorial identities dissolves neatly into an elegant solution using probability theory. Check my submission here.


PS: I have tried to keep the math jargon to a minimum so that it is approachable for the usual CP crowd, although in a few places, the correctness of the blog demanded the use of mathematical terms.

Having majored in mathematics, I find the role of theory in competitive programming fascinating. We may not need formal proofs mid-contest, but knowing powerful general results (like this one) can be very handy. My goal with this post was to share one of those "general results" that I find elegant and useful. I hope it helps, and I would be interested to hear what others think!
