Hello Codeforces,
A friend has offered me a challenge, and I was wondering whether any algorithms could help me succeed at it.
Given a fixed set S of ~100,000 unique 32-bit numbers, how would you devise an algorithm such that a human can compute, with pen and paper, whether a given number x is in S?
The algorithm should be feasible to memorise given a year of practice.
After you have practiced your algorithm, you will be placed in a room with nothing except a blank piece of paper and a pencil. An examiner in the room will dictate a number to you, and you will have 60 seconds to determine whether that number is part of S. The set S of 100,000 numbers is not available as a printout in the room.
Find the Lagrange polynomial of that integer sequence, if you know the sequence beforehand.
Yes, the sequence is known beforehand. But wouldn't you need 100,000 coefficients to represent the polynomial?
Very interesting. I have ideas for ~10,000 that could work with a year of practice, but 100,000 sounds like it is pushing the limits for me. I am assuming all the numbers are drawn uniformly at random and that looking for a pattern among the bits is pointless; otherwise there might be ways to compress the set.
Well, for ~10,000 numbers, couldn't you just learn all the numbers directly with a spaced-repetition system like Anki? Though Anki usually works on question-answer pairs; I'm not sure how you would construct the flashcards in this case.
There is the Major system for encoding the numbers, and the Memory Palace technique for storing them. There is also the Ben system, which lets you encode 10 binary digits into a single image. Assuming you can do that, 100,000 32-bit numbers come to ~320K images, which can be stored in around 160,000 memory locations.
There are some dedicated people who memorize a very large number of digits of Pi, for example, with this technique. Last I checked, the record was 100K digits (which is equivalent to about 30K images).
But I would only advise someone very serious to go for this approach. I am fairly acquainted with the field of memory sports, and while it's not rocket science, it would take a lot of effort.
If you memorize everything linearly in sorted order, you will have to do a binary search over your memory palace to find the element, which won't be trivial, but should be fine with a little rehearsal.
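For a sense of scale, a binary search over 100,000 sorted numbers takes about $$$\log_2 100000 \approx 17$$$ recall steps. A minimal sketch (the set here is randomly generated, just a stand-in for the memorised one):

```python
import random
from bisect import bisect_left

# Stand-in for the memorised set: 100,000 random 32-bit numbers, kept sorted
# the way they would be laid out along the memory palace.
random.seed(1)
palace = sorted(random.sample(range(2**32), 100_000))

def member(x, sorted_s):
    """Binary search: each step recalls one number from a known location."""
    i = bisect_left(sorted_s, x)
    return i < len(sorted_s) and sorted_s[i] == x

# Roughly 17 comparisons suffice for any query.
print(member(palace[500], palace))   # a known member -> True
```

Each "comparison" for the human is one trip to a known palace location, so ~17 recalls per query fits comfortably in 60 seconds.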
And yes, they are from a uniform distribution. Though it is a fixed set of random numbers that you know at the start of the year.
The record for remembering digits of pi is ~60k digits. In our case we have 100k numbers, most of which have 10 digits, i.e. about 1 million digits in total, so the best human memorizer could hold only ~6% of the digits we would be trying to remember.

In terms of designing an algorithm to help you remember, what comes to mind is some hash function based on, say, the digit sum of the number or its remainder modulo some k. But such an algorithm will probably have many exceptions that you need to remember, and 60 seconds is not much time to compute things on paper. So you would need to try "all" easy-to-compute hash functions and pick the one with the fewest exceptions to memorize. I don't think such a hash function exists: it would likely have at least 100k false positives, which is more numbers than you needed to remember in the first place. I suppose you could only improve your odds of guessing right, for example by memorizing the frequency of each digit in each position.
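A quick experiment along these lines (purely illustrative, assuming S is uniform random): remember which residues mod k are occupied by S, and accept x iff its residue is occupied. Even with $$$k = 2^{17}$$$ (131,072 bits to memorise, already beyond the pi record), more than half the residues end up occupied, so a random non-member passes more than half the time:

```python
import random

random.seed(0)
K = 1 << 17                                  # residue table: 131,072 bits to memorise
S = random.sample(range(2**32), 100_000)     # hypothetical uniform-random fixed set

# "Easy" hash: accept x iff (x mod K) matches the residue of some member of S.
occupied = {x % K for x in S}

# Estimate the false-positive rate on random non-members.
members = set(S)
trials = 10_000
fp = sum(1 for _ in range(trials)
         if (x := random.randrange(2**32)) not in members and x % K in occupied)
print(f"{len(occupied)} of {K} residues occupied, ~{fp / trials:.0%} false positives")
```

With 100k keys hashed into $$$2^{17}$$$ slots, the expected occupied fraction is $$$1 - e^{-100000/131072} \approx 53\%$$$, which matches the "more false positives than the set itself" intuition above.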
From https://arxiv.org/pdf/1912.08258.pdf: "the theoretical lower bound for an approximate membership data structure with a false positive probability $$$\epsilon$$$ is $$$-\log_2 \epsilon$$$ bits per key".
But as VLamarca mentions, the world record for memorizing pi is 60k digits, about 2e5 bits, which is only enough for 2 bits per element. Therefore a world-class human can at best achieve a 25% false-positive rate, i.e. 87.5% overall accuracy if the queries are 50% true/false. And this ignores the practical difficulty of computing such a hash by hand: realistic implementations like Bloom filters and xor filters would be even less efficient.
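The arithmetic, spelled out (a sketch; the 60,000-digit record and the 50/50 query split are the assumptions from this thread):

```python
import math

digits = 60_000                          # cited pi memorisation record
bits = digits * math.log2(10)            # ~2e5 bits of raw human memory
keys = 100_000
bits_per_key = bits / keys               # ~2 bits per element
eps = 2 ** -bits_per_key                 # lower-bound false-positive probability
accuracy = 0.5 + 0.5 * (1 - eps)         # queries: 50% members, 50% non-members
print(f"{bits_per_key:.2f} bits/key, eps ~ {eps:.3f}, accuracy ~ {accuracy:.3f}")
```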
Thanks for that paper, it is intriguing. It seems that Bloom / xor filters are very apt for this task.
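For reference, a minimal Bloom filter sketch (parameter choices are only illustrative, sized roughly to the ~2e5-bit human budget discussed above): members always pass, and non-members pass with some false-positive probability.

```python
import hashlib
import random

M = 200_000   # filter size in bits (~the human memory budget discussed above)
K = 2         # number of hash functions

def positions(x: int):
    # Derive K bit positions from one digest (standard double hashing).
    d = hashlib.sha256(str(x).encode()).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big")
    return [(h1 + i * h2) % M for i in range(K)]

class Bloom:
    def __init__(self):
        self.bits = bytearray(M)          # one byte per bit, for simplicity
    def add(self, x):
        for p in positions(x):
            self.bits[p] = 1
    def query(self, x):                   # True = "maybe in S", False = "surely not"
        return all(self.bits[p] for p in positions(x))

random.seed(2)
S = random.sample(range(2**32), 100_000)
bf = Bloom()
for x in S:
    bf.add(x)
assert all(bf.query(x) for x in S)        # no false negatives, ever
```

With 100k keys in 200k bits and K=2, the false-positive rate works out to roughly $$$(1 - e^{-1})^2 \approx 40\%$$$, noticeably worse than the 25% information-theoretic bound for 2 bits per key, as expected for a plain Bloom filter.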
One thing that might help is that one "unit" of memory for a human is different from a unit of memory for a computer. For example, the concept of a horse is very easy for a human to bring to mind, but perhaps not for a computer. This is why humans have created systems that let them quickly remember lengthy things like the complete order of a deck of playing cards, after first committing a long time to putting a mapping into long-term memory.
Here is some info on a 10,000-image mnemonic system: https://artofmemory.com/wiki/10,000-Image_Number_System
Maybe this method can help with practically using the Bloom or xor filter? I will think about it further.
True. Also, I read the news story about Lu Chao memorizing pi: it seems he got only one digit wrong out of 90k, so effectively he was able to push it to 3e5 bits. Amazing.