You can watch the lecture on Youtube: https://youtu.be/0r2D32esF3Y. I will do a second part soon. Some problems are quite vague, it's a nature of this topic.
- Warm-up: You toss a coin till you get tails. How many tosses there will be, on average?
- X or smaller — There is a hidden number $$$X$$$. An interactor repeatedly gives you a number, either $$$X$$$ or something smaller than $$$X$$$. All numbers are positive integers. When can you stop and say that you are (almost) certain what is the value of $$$X$$$?
- Line through N/4 points — Given $$$N \leq 10^5$$$ points, find a line that passes through the maximum number of points. It's guaranteed that the answer is at least $$$N / 4$$$.
- GCD (364D - Ghd) — given a set of $$$N \leq 10^6$$$ numbers, each up to $$$10^{12}$$$, find the maximum possible number that is a divisor of at least half of given numbers.
- ACTG prefix — Guess a hidden string $$$S$$$ with characters A, C, T, G. You can choose some string and ask if it's a prefix of $$$S$$$. The length of $$$S$$$ is at most $$$10\,000$$$ and you can ask up to $$$25\,000$$$ queries.
- How many tails you will get after tossing a fair coin $$$10^6$$$ times? It should be around $$$500\,000$$$, but how far away from this number can you realistically/plausibly get?
- Don't use a fixed seed in Codeforces or Topcoder because somebody can hack/challenge you.
RAND_MAX
in Codefoces is around $$$30\,000$$$, so use your ownrand()
in C++, same forrandom_shuffle()
. More here: https://mirror.codeforces.com/blog/entry/61587
More (and harder) problems can be found here: https://mirror.codeforces.com/blog/entry/51244
And if you want to learn more about expected value: https://mirror.codeforces.com/blog/entry/62690
Save trees: https://youtu.be/TnCVAEsYGKs #TeamTrees
EDIT, part 2: https://youtu.be/GS2MxmorEzc
- Catch 'em all — When you encounter a Pokemon, it's random out of $$$N$$$ types. How many Pokemon will you encounter (on average) before seeing all $$$N$$$ types? Estimate this value.
- Birthday paradox — How many encounters before we see the same type of Pokemon twice?
- Hash collision — what is the probability of hash collision in problems like "given queries, check if this substring is equal to that substring" compared to "given a bunch of strings, find a pair of equal strings". The latter has much bigger probability of collision. Why?
- How to randomly shuffle an array?
- Repeated binary search — Find max element in a hidden sequence by asking queries "is $$$a_i$$$ greater than $$$x$$$?" faster than $$$\mathcal O(n \log n)$$$, https://mirror.codeforces.com/blog/entry/62602
- Bonus: estimate $$$\pi$$$ using randomized algorithm.
3 and 4 can be solved deterministically using pigeonhole principle
How? What complexity?
Sorry I was wrong
It's $$$O(n^2)$$$ if you do that for every five consecutive points. And if you do it only for one five of points, you won't necessarily find the answer.
Agree, I misunderstood
There's also a nice classical problem which is given matrices $$$A, B, C$$$, check if $$$A * B == C$$$.
When I first encountered this problem (Matrix God) in this contest I was puzzled by the solution of the editorial. Then I got to know that it's a named algorithm (Freivalds algorithm) and is a part of broader class of techniques called fingerprinting (polynomial identity testing, string equality testing and pattern matching) . I read about this in Randomized Algorithms by Motwani, Raghavan (check out page 162).
EDIT — Sorry I didn't realize that it was copyrighted. I just took the first link from the google search. I have removed the link now.
given a tree. find minimum count of edges need to add to graph to contains no bridges. and need to output this edges.
count is (leaves+1)/2
do{ connect leaves at random }while(graph contains bridges)
Ummm... are you sure about your certificate?
probably...
seriously, i dont have counterexamples for big number of iterations
I think it should terminate in a constant number of iterations, at least intuitively. Each edge should have probability less than 1/(leaves) of being a bridge after the edges are connected (assume there are no degree two vertices)
test: n = 6, m = 5, edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6)]
We can add [(1, 3), (5, 6)] and have bridge (2, 4)
But we can add [(1, 5), (3, 6)] and have no bridges
Obviously random doesn't work :(
seems like you dont understand how construction
do while
works :(expected value of number of iterations on this (obvious) test is 1.5
Well, maybe you mean
I don't have another idea. Anyway there is a counterexample smth like this I think.
[(1, 2), (2, 3), (1, 4...k), (3, k+1...2k)]
By the way, this problem can be solved deterministically in O(|V| + |E|).
UPD. Oh, my counterexample is bad :) Bad I'll try to find another, I can't believe this solution will work
ans = min(ans, added_edges);
is meaningless after i saidcount is (leaves+1)/2
(count is ans)ok, more code of what i mean:
Mark all leaves. If the number of leaves is odd, additionally mark an arbitrary unmarked vertex. Root the tree arbitrarily. Do a preorder traversal. Let $$$v_1, \dots, v_k$$$ denote the marked vertices in the order in which they appear in the traversal. For $$$i = 1, \dots, \frac{k}{2}$$$ connect $$$v_i$$$ with $$$v_{i + \frac{k}{2}}$$$.
To show that the resulting graph contains no bridges, take an arbitrary edge $$$e$$$ of the original tree. Removing $$$e$$$ splits the tree into two trees $$$T_1$$$ and $$$T_2$$$. W.l.o.g assume that $$$T_1$$$ contains at most $$$\frac{k}{2}$$$ marked vertices. Note that the vertices of $$$T_1$$$ appear contiguously in $$$v$$$ if we think of $$$v_k$$$ and $$$v_1$$$ as adjacent. As $$$T_1$$$ contains at most $$$\frac{k}{2}$$$ vertices, every edge we added to a vertex of $$$T_1$$$ connects it to a vertex of $$$T_2$$$. As $$$T_1$$$ contains at least one leaf (and hence at least one marked vertex), we added an edge between $$$T_1$$$ and $$$T_2$$$, so $$$e$$$ is not a bridge in the resulting graph.
Thanks for the lecture — it's very comprehensive and accessible even for not-really-clever-people like me. :)
Today I solved my first problem by randomization — Troublemakers. I couldn't believe my eyes when I got AC xD
I'm very much looking forward to the second part of the lecture.
This is actually very common that we do something randomly, show that the expected value is something, say $$$X$$$, and thus sometimes we must get value not worse than $$$X$$$. I think there are some well known $$$k$$$-approximations but I don't remember examples :D
Well-known: given $$$3$$$-sat (each clause contains distinct variables), find in polynomial time variable assignment satisfying at least $$$\frac{7}{8}$$$ of clauses.
Just assigning each variable to true or false with probability 0.5 gives an expected $$$\frac{7}{8}-\text{approx}$$$ algorithm.
Suppose the input 3-SAT formula has $$$n$$$ variables and $$$m$$$ clauses let's call the random variable $$$W$$$ as the number of clauses which are satisfied in a random assignment. And let $$$X_i$$$ denote the indicator RV such that it is 1 iff $$$i^{th}$$$ clause is true which happens with probability $$$1 - \frac{1}{8}$$$. Now we can see that $$$W = \sum_{i=1}^{m} X_i$$$. We are interested in finding $$$E[W] = E[\sum_{i=1}^{m} X_i] = \sum_{i=1}^{m} E[X_i]$$$ (due to linearity of expectation)
Hence we get $$$E[W] = \frac{7}{8} \cdot m$$$
Let's call the maximum number of clauses which can be satisfied for the given instance as $$$OPT \leq m$$$ which gives us the following relation. $$$E[W] \geq \frac{7}{8} \cdot OPT$$$
In fact this can be derandomized using the method of conditional expectations. Check out page 108 in Design Of Approximation Algorithms by Williamson and Shmoys
Yes, you are correct. It's even possible to make estimation on the number of random iterations required.
You mean to run the randomized algorithm $$$t$$$ times and take the maximum value and try to bound the deviation of this modified algorithm from the mean ?
Can someone explain why the probability of success of the chosen point is 1/16 in the 3rd problem of first lecture
At least $$$N/4$$$ points are on the optimal line, so probability of one point on the line is at least $$$N/4$$$ and probability of both points on the line is at least $$$N/16$$$.
Solution with proof for part 1, 5th problem.
Let's find letters one by one. Notice that in worst case you should spend 3 guesses to determine a letter, if first three guesses are wrong then the remaining letter is correct. If you pick ACTG letters randomly, probability that you spend one query is 1/4, two queries 3/4 * 1/3 = 1/4, and three queries is remaining 1/2 probability. Thus expected number of queries for one letter is 2.25, with variance 11/16 = 0.6875. (Computed using well known formulas).
Let q be a random variable — number of queries per determining one letter.
S be sum of queries you spent — sum of all qi.
n be 10000.
Then (S — n * E[q]) / sqrt(n * var(q)) follows standard normal distribution (0 mean, 1 variance).
Probability[ (S — n * E[q]) / sqrt(n * var(q)) < 30 ] is almost one.
Where 30 * sqrt(n * var(q)) + n * E[q] = 24987 ~ 25000
So probability that total number of queries S is less than 25000 if almost one.