spike1236's blog

By spike1236, 3 months ago, In English

Hi, Codeforces! Recently I thought of an interesting problem and would like to hear your thoughts.

You are given some $$$0 \lt \varepsilon \lt \frac{1}{2}$$$ as input and have an unknown biased coin with

$$$ p = \Pr(\text{Heads}),\qquad 1-p = \Pr(\text{Tails}),\qquad p\in[0,1]. $$$

You toss it sequentially. After each toss $$$t$$$, you must output an estimate $$$p_t\in[0,1]$$$. An oracle who knows the true $$$p$$$ stops the process at the first time your estimate is $$$\varepsilon$$$-close:

$$$ N := \min\{t\ge 1:\ |p_t-p|\le \varepsilon\}. $$$

The algorithm does not choose when to stop; it only outputs estimates after each toss. The task is to design an algorithm (possibly randomized) with optimal worst-case expected number of tosses:

$$$\inf_{\text{algorithms}} \sup_{p\in[0,1]} \mathbb{E}_p[N].$$$

Also, what strategy achieves it (and how would you prove the bound)?

+52

» 3 months ago | +14
Complete guess
  • » » 3 months ago | +6

    Nevermind :) But I don't think splitting into intervals and guessing the medians is necessarily optimal.

  • » » 3 months ago | +3

    Nice, it seems for this strategy, the worst-case expected value is

    Also, note that the oracle is independent of the game (it just checks the printed $$$p_t$$$), so I think trying to learn the distribution might not be the best approach.

    • » » » 3 months ago | +11

      If you never repeat your guesses it's necessarily at most $$$O(1/\epsilon)$$$, as this is the segment count. Heuristically, trying to "learn" the distribution should only make this faster, as the law of large numbers will show you that #heads / #trials approaches $$$p$$$ with variance inversely proportional to the number of trials.

      It should be relatively easy to write some Monte Carlo simulation (fix $$$\epsilon$$$, randomize $$$p$$$, sample until $$$N$$$ using this strategy, repeat the experiment many times) to get a rough feel for the average performance.
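      A minimal sketch of such a simulation (function names are mine; the "strategy" here is the never-repeat sweep over segment midpoints from the comment above, and $$$p$$$ is drawn uniformly for simplicity rather than adversarially):

      ```python
      import random

      def simulate_grid(p, eps, rng):
          """Guess the midpoints eps, 3*eps, 5*eps, ... of consecutive
          2*eps-wide segments of [0,1]; every p is within eps of some
          midpoint.  The tosses themselves are ignored.  Returns N."""
          t = 0
          guess = eps
          while True:
              t += 1
              rng.random()  # the toss (unused by this strategy)
              if abs(guess - p) <= eps:
                  return t
              guess = min(guess + 2 * eps, 1.0)

      def average_n(eps, trials=2000, seed=0):
          """Average N over many experiments with p ~ Uniform[0,1]."""
          rng = random.Random(seed)
          total = 0
          for _ in range(trials):
              p = rng.random()  # uniform here, NOT adversarial
              total += simulate_grid(p, eps, rng)
          return total / trials
      ```

      For a fixed $$$\epsilon$$$ this strategy needs at most $$$\lceil 1/(2\epsilon)\rceil$$$ guesses, so the simulated average should come out around half of that under the uniform prior.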

    • » » » 3 months ago | -16

      You're technically never "learning" the distribution, since you already know which distribution it is (a Bernoulli); you just need to estimate its parameter.

  • » » 3 months ago | +22

    This is wrong conceptually, because the problem statement says that $$$p$$$ is chosen adversarially, not sampled uniformly over $$$[0,1]$$$. You should therefore assume a clever adversary that draws $$$p$$$ from some weird distribution, one which likely incorporates your best strategies.

    For this reason, it is also not easy to run a simulation: you would need to train the adversary's weights during it too. It is closer to solving a probabilistic game.

» 3 months ago | -6

In the limit of small $$$\epsilon$$$ (and therefore a large number of tosses), the expected number of heads after $$$N$$$ tosses should be $$$\approx pN$$$, treating each toss as adding $$$+1\cdot p+0\cdot(1-p)$$$. We can just keep guessing $$$E_h/N$$$ until the oracle gives a thumbs up. Dunno if it's optimal.
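A sketch of this running-mean strategy (names are mine; the `cap` parameter is just a safety guard I added so the simulation cannot loop forever):

```python
import random

def simulate_running_mean(p, eps, rng, cap=10**6):
    """After each toss t, guess #heads / t; return the stopping time N,
    i.e. the first t with |heads/t - p| <= eps (or cap, as a guard)."""
    heads = 0
    for t in range(1, cap + 1):
        heads += rng.random() < p  # one Bernoulli(p) toss
        if abs(heads / t - p) <= eps:
            return t
    return cap
```

Note the worst case for this strategy is around $$$p = \frac{1}{2}$$$, where the first few empirical means fluctuate the most.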

» 3 months ago | -16

deleted

  • » » 3 months ago | +6

    I don't understand whether this is intended to be a lower bound or an upper bound.

    The expression taking it as a product is very suspicious; it is wrong as both an upper bound and a lower bound. For a lower bound, you cannot assume the opponent follows your strategy of always guessing near the posterior mode. For an upper bound, you cannot assume a uniform prior (see the comment above).

    If you always guess near the posterior mode (which is what the product implies, using only what you said), there are very nontrivial correlations, and your bound is definitely wrong.

» 3 months ago | +14

Let's make useless guesses for $$$k$$$ rounds; then with probability at most $$$1-\frac{1}{k}$$$, the empirical mean is within $$$(p-x,p+x)$$$ (approximately; weird things happen when $$$p$$$ is close to $$$0$$$ or $$$1$$$), where $$$x = g(k)/\sqrt{k}$$$ and $$$g(k)$$$ is the inverse CDF of the normal, roughly $$$g(k) = \sqrt{\log k}$$$. I am too lazy to formally combine the expressions or deal with $$$p$$$ being close to $$$0$$$ or $$$1$$$, but you can see that another $$$O\!\left(\frac{1}{\epsilon\sqrt{k}}\right)$$$ guesses partitioning this desired range would be sufficient. Optimizing (caring only about asymptotics) $$$k + \frac{1}{\epsilon \sqrt{k}}$$$, you get something like $$$\epsilon^{-\frac{2}{3}}$$$ times log-like slowly growing factors. (Carefully bound the exponential tail for a formal proof; ask LLMs, I guess.)
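For completeness, the optimization above can be made explicit (ignoring the log factors): minimizing $$$f(k) = k + \frac{1}{\epsilon\sqrt{k}}$$$ over $$$k$$$ by setting the derivative to zero,

$$$ f'(k) = 1 - \frac{1}{2\epsilon k^{3/2}} = 0 \quad\Longrightarrow\quad k = (2\epsilon)^{-2/3}, $$$

at which point both terms of $$$f(k)$$$ are $$$\Theta\!\left(\epsilon^{-2/3}\right)$$$, matching the claimed bound.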

The converse is asymptotically clear, well, intuitively. Using only $$$k$$$ samples, it is not possible to reliably distinguish $$$p$$$ from any probability in $$$\left(p - \frac{c}{\sqrt{k}},\ p + \frac{c}{\sqrt{k}}\right)$$$ for some small constant $$$c$$$, so anything in this region is basically just a random guess, so we can assume we need this length divided by $$$\epsilon$$$ guesses.

If you still want to go for the constant-factor problem, you should probably set explicit values and pose it as a heuristic-contest-type problem. I don't believe it is particularly interesting, though.

  • » » 3 months ago | 0

    Cool, thanks. The only thing is I guess it should be $$$\text{Pr}(|p-p_k| \leq x) \geq 1 - 1/k$$$.