Codeforces Marathon Round 1 — discussion

Правка en3, от Gassa, 2016-06-23 17:17:17

The Codeforces Marathon Round 1 is over (results comment). As the final solutions are being tested, I think many participants will want to share ideas and learn alternative approaches. I'll start with the ideas I tried myself, both successful and not; as I've seen in the submissions, the contestants have more ideas, but I hope they will share themselves. For each of the solutions below, the numbers in square brackets are minimum, mean and maximum score when running on 1000 tests locally. I have to note in advance that the constants and technicalities in the solutions are not considered optimal: the score just shows approximate relation between ideas and may often be improved a bit. -----

Solution 1 [4099 4159.699 4288]: random.

Just print 5000 random bits 100 times.

Solution 2 [4160 4226.294 4365]: last bit memorization.

The last checked bit is always wrong. For example, if the answer for an attempt is the number 4050, it means that the error number 2000 happened exactly on position 4050 of our string. Let us fix the right value of this bit. When subsequent random strings are generated, all fixed bits have their known values, and other bits are generated randomly.

Solution 3 [4182 4291.374 4453]: find several bits.

This is a two-part solution. In the first part (90 attempts) the attempts go in 45 pairs. The first attempt of pair i is generated randomly (except for the fixed bits which we already know). The second attempt of pair i differs just in i-th bit and in the last checked bit.

What information can we learn from such a pair? For example, let the first attempt give the result 3980, and the second one (with bits 1 and 3980 changed) resulted in 3975. This means that, firstly, bit 1 was right in the first attempt (because it got further). Secondly, error 2000 of the second attempt is error 1999 of the first one, so between bits 3975 and 3980, there are no errors.

Similarly, if the result of the second attempt is greater, for example, 3981, we know that bit 1 was right in the second attempt, and also that there are no errors between 3980 and 3981 (this time, it does not help because there are no bits either).

For a pair of attempts, we find the exact values of three bits on average.

In the second part of the solution (the remaining 10 attempts), let us just generate random strings except that we take fixed bits into account. There are 135 fixed bits on average, and one more after every attempt.

Solution 4 [4260 4369.927 4498]: find several bits symmetrically.

This is a two-part solution. In the first part (90 attempts), the first attempt is random, and each subsequent (i-th) attempt takes the previous one and inverts bit i and the last checked bit. As in solution 3, observing the change in answer, we can not only find the true value of bit i for future use, but also find two bits on average around 4000. However, now we find three bits in each attempt, not in each pair. Note that we can't put the right values of the first 89 bits right away, so the bits we learn at the end are more symmetrically distributed around 4000.

In the second part of the solution (the remaining 10 attempts), let us just generate random strings except that we take fixed bits into account. There are 267 fixed bits on average, and one more after every attempt.


That's it for my solutions which find some particular bits. Looking at the preliminary scoreboard, we can conclude that more than half of the participants invented something better. The useful heuristic to take to other solutions is to always memorize that the last checked bit is wrong.


Solution 5 [4429 4533.129 4671]: segment inversion.

Split our 5000 bits into segments of length, say, 45. For the first attempt, generate a random string, and in each subsequent attempt, invert the next segment. After inverting, we compare two numbers and know for sure which version was better: inverted or non-inverted; choose the best of the two and proceed to next attempt.

Why does this work at all? Intuitively, the average deviation on a segment is large enough, so there will be enough segments for which inversion changes the answer much: for example, if a segment contained 19 right bits out of 45, after inverting, there will be 26 of them. If, on the contrary, we transformed 26 into 19 — return the 26 (and the previous result, too) by inverting back.

The constant 45 is a rough estimate; it seems that many participants chose 40 in similar-looking solutions.

Solution 6 [4399 4582.368 4723]: segment inversion with recycling.

Put the 4000 first bit positions in the "present" pool, and the remaining 1000 positions in the "future" pool. For the first attempt, take 79 random bits from the "present" pool and invert them all. If the result changed by more than 10, it means that the inverted "segment" has a large enough deviation. Let's use it: fix all these 79 bits in the better of the two states (either leave every bit as it is, or invert every bit). Otherwise, let's also put the bits into the better of two states, but instead of fixing the bits, put them into the "future" pool.

When the "present" pool is empty, look at the "future" pool, move all bits visited at least once (bit 4999 is most likely not visited) to the new "present" pool and continue.

As we now reuse bits which don't give us much profit when fixed, we can take longer "segments": their length is 79 instead of 45 in solution 5. This means that the average deviation in a "segment" also increased.

The constants 79 and 10 were chosen by a small search.

Solution 7 [4539 4702.079 4878]: segment inversion with recycling and memorizing.

Do everything as in solution 6, but recall the idea from the previous line of solutions: the last checked bit is always wrong. The profit from this idea has almost no dependence on large segment inversion.

In this version, it made sense to make the segment even longer: the constants 79 and 10 changed to 99 and 18.


These were solutions with segment inversion. The best of them achieves the rank of 20-30 in the preliminary scoreboard. The weak sides are that we change only a small portion of the string, and as a result get only few bits of information on average. Additionally, we fix the whole good segment forever. Which of these can be improved and how is a story for some of the top performers to tell.


Solution 8 [4251 4403.645 4594]: discrete probabilistic in each position.

Let us make 99 random attempts. In each of the 5000 positions, memorize the probable bits. For each attempt with result at least 4000, mark its bits up to the last checked one as probable in their positions. For each attempt with result less than 4000, invert it and mark inverted bits up to the last checked one as probable.

In the last attempt, for each position, choose the option which was marked more times as probable.

Solution 9 [4433 4601.678 4773]: discrete probabilistic in each position with memorizing.

Let us try to improve the previous solution.

First, recall that the last checked bit is always wrong, and fix all such bits.

Second, learn to use not only the sign of the result (trivially: is it less than 4000), but also the magnitude. For example, if 2000 errors are somehow distributed among 4001 bits, it is less informative than 2000 errors distributed among 4100 bits.

Taking fixed bits into account, we can translate each attempt into a piece of information of the form "there are 1999 errors in t bits". In each of these t bits, record that the probability of it being right is 1999 / t. In the end, just take an average of all such records.

In the last attempt, for each position, choose the option which turned out to be more probable.


Around one third of the contestants implemented something better than the described solutions which calculate probabilities separately in every position.


The next idea is full inversion. Let us use attempts in pairs, and the second attempt of each pair be full bitwise inversion of the first. Each such pair of attempts gives information in the form "in the segment from lo to hi, there are exactly err errors". Indeed, let the first attempt of a pair give result 4015, and the second 3980. In total, there are 4000 errors in the segments [1, 3980] and [1, 4015]. Moreover, there are exactly 3980 errors in total from 1 to 3980: each bit was once in each of the two possible states. So, there are exactly 20 errors from bit 3981 to bit 4015.

We can now perform different operations with these segments: intersect them, search for a string conforming to all gathered information, if it is too hard to find, search for the most conforming string. My solutions with such segments get mean results between 4400 and 4500, but are already technically involved. Sure one can do better.


A bit of problem history. Initially, it was designed as a hard probabilistic problem for a contest with ACM ICPC rules. The plan was to try several approaches and see if the simple and weak ones can be separated from thoughtful and strong by choosing the appropriate constraints. However, the first few solution ideas refused to line up in an order resembling their complexity, so I had to put the problem on hold. On the other side, this problem fits well in a contest with partial scoring since there are a few quite different approaches which are easy enough to invent and implement.

Last but not least, I owe my thanks to Natalya Ginzburg (naagi), Andrei Lopatin (KOTEHOK) and Mike Mirzayanov (MikeMirzayanov) for their help in preparing and running the contest.

Теги marathon

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en3 Английский Gassa 2016-06-23 17:17:17 56 added results comment link
ru3 Русский Gassa 2016-06-23 17:16:37 67 added results comment link
en2 Английский Gassa 2016-06-22 12:05:33 9 added cut
ru2 Русский Gassa 2016-06-22 12:05:09 9 added cut
en1 Английский Gassa 2016-06-22 12:01:24 10048 Initial revision for English translation
ru1 Русский Gassa 2016-06-22 12:00:40 10057 Первая редакция (опубликовано)