4orty3hree's blog

By 4orty3hree, history, 5 months ago, In English

https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/

It's a bit scary to think that once it gets good at competitive programming, it could easily reach a 2100+ rating.

I'm curious about how programmers can keep their skills sharp in this AI-augmented world we're moving towards. How do you think we can stay ahead? I'd love to hear your thoughts and opinions on this.

Vote: +18

»
5 months ago, # |
  Vote: 0

ruh roh

»
5 months ago, # |
Vote: +10

Considering that AlphaCode was able to solve at most a Div. 2 D, and AlphaProof is now solving IMO P6, this is an incredible advancement. I saw nor mention "Of course, the ideal scenario is to solve the IMO Grand Challenge." about 8 months ago, and we have clearly more or less achieved that ideal scenario. Really interested to hear nor's thoughts on this.

  • »
    »
    5 months ago, # ^ |
    Vote: +39

    My thoughts are more or less what bronze_coder mentioned.

    The approach is sensible; the only use of an LLM (the natural-language kind that we all know) was to translate problems into a formal language (Lean) in order to generate training data for their Alpha* style of algorithms (the RL algorithm used in AlphaZero, AlphaGo and so on).

    This sounds like a much more promising approach to mathematical theorem proving than their previous attempts (and my hunch is that the approach they mention at the end, using Gemini to directly solve problems, possibly to keep investors happy with Gemini, is probably not going to end well unless they do something miraculous). I hope they can get past the huge compute barrier in front of them, since their approach to "creativity" seems to be brute-forcing candidate approaches with certain heuristics until a solution is found, and hence throwing compute at the problem. What they did with AlphaZero was make that heuristic as good as possible to reduce the branching; something like that should help here, but I am not very confident about how much compute it can actually save.
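    To make the "heuristic-guided brute force" point concrete, here is a minimal sketch of best-first search over proof states. Everything in it is a placeholder of my own (the state, tactic and scoring functions are hypothetical, not anything DeepMind has published); the point is just that the quality of the heuristic decides how much of the budget gets burned.

    ```python
    import heapq

    def best_first_proof_search(initial_state, legal_tactics, apply_tactic,
                                is_proved, score, budget=10_000):
        """Best-first search over proof states, guided by a heuristic `score`.

        All arguments are hypothetical placeholders: `legal_tactics(state)` lists
        applicable moves, `apply_tactic(state, t)` returns the next proof state,
        `is_proved(state)` checks for a closed proof, and `score(state)` is the
        learned heuristic (higher = more promising). A sharp heuristic keeps the
        frontier small; a weak one degenerates into brute force.
        """
        counter = 0  # tie-breaker so the heap never compares states directly
        frontier = [(-score(initial_state), counter, initial_state)]
        while frontier and budget > 0:
            _, _, state = heapq.heappop(frontier)
            budget -= 1
            if is_proved(state):
                return state
            for tactic in legal_tactics(state):
                counter += 1
                nxt = apply_tactic(state, tactic)
                heapq.heappush(frontier, (-score(nxt), counter, nxt))
        return None  # budget exhausted: the compute barrier in miniature
    ```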

    In any case, as I mentioned in my other comments on AI solving problems, I feel that a symbolic approach (the approach they have is an example, albeit a naive one) is the only sane way to go about it, so finding efficient heuristics automatically is the main challenge (so we are sort of back to where AI was 5 decades ago, when Lisp was supposed to be the language of AI, except that now we have a lot of compute and efficient matrix multiplication).

    AlphaGeometry is successful precisely because it circumvents the computational barrier: the generative part involved in geometry is much smaller than in algebra or NT, and geometry is much more mechanical and can be tamed much more easily.

    Coming to the part about the IMO Grand Challenge: it wants an automated system to solve all the problems within contest bounds and qualify for a gold medal. There are two reasons why this AI does not qualify: one, the compute time (3 days for non-geo), and two, the inability to solve combinatorics. I've already spoken about compute a lot.

    I feel that deriving structured insights from experimentation is a crucial part of combinatorics at the IMO. And this part seems to be missing from all approaches right now. One of the ways I can see this happening is to somehow capture motivation as an auxiliary part of the solution.

    Lean proofs do not treat this as a first-class citizen, but I guess that if AlphaProof is augmented with an LLM trained on symbolic chains instead of natural language (chains of the form statement -> examples -> conjecture -> proof/disproof of the conjecture, or even more involved chains with multiple conjecture nodes), one that can come up with conjectures from a next-token POV (this might be asking for too much), then this can work.
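    As a toy illustration of what such a chain could look like as data (purely my own hypothetical schema, not anything AlphaProof actually uses):

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class ConjectureNode:
        """One node of a hypothetical symbolic chain: a conjecture and its fate."""
        statement: str                      # formal statement, e.g. a Lean proposition
        evidence: list[str] = field(default_factory=list)  # small cases that suggested it
        proof: str | None = None            # Lean proof script if proved
        counterexample: str | None = None   # disproving instance if refuted

    @dataclass
    class SymbolicChain:
        """statement -> examples -> conjecture(s) -> proof/disproof, as data."""
        problem: str                  # the original formal statement
        examples: list[str]           # generated small cases
        conjectures: list[ConjectureNode] = field(default_factory=list)
    ```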

    Another approach is to have one model (the "generator") that generates exhaustive examples from the statement, one model (the "observer") that studies the generated examples, finds their properties and takes the intersection of those properties, and one model (the "prover", or just AlphaProof) that proves/disproves the resulting conjectures. Of course, the details of the models (whether to use an LLM or a traditional model) are up to them to experiment with; however, each part must be formally verified.
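    A minimal sketch of that control flow, with every component stubbed out (the model calls and the verifier are placeholders of mine, not a real API), could look like this:

    ```python
    def solve(problem, generator, observer, prover, verify, rounds=5):
        """Hypothetical generator/observer/prover loop.

        `generator(problem)` produces exhaustive small examples,
        `observer(examples)` returns candidate properties (conjectures) shared
        by all examples, `prover(problem, conjecture)` tries to prove or
        disprove one conjecture, and `verify(result)` is the formal check
        (e.g. the Lean kernel). Only `verify` has to be exact; everything
        else can be a learned model or a traditional search.
        """
        examples = generator(problem)
        for _ in range(rounds):
            for conjecture in observer(examples):
                result = prover(problem, conjecture)
                # Keep only formally verified results; everything else is noise.
                if result is not None and verify(result):
                    return result
            # No conjecture survived: gather more examples and retry.
            examples += generator(problem)
        return None
    ```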

    As a natural generalization, it would greatly help the AI folks to identify certain patterns in human problem solving, which exist because humans have very limited working memory, so our reasoning is optimized for low context and high efficiency. Speaking from a retired MO contestant's perspective, patterns like simplification/specialization, coming up with conjectures, wishful thinking, and other common patterns that one can find in books like Polya's How to Solve It (and my blog) should probably be captured while training these models. One way to capture specialization is just substitution, from a symbolic perspective. A generalizer model can be used to generalize the Lean proof and check its validity. Of course, once we have all these models, the control flow is just a traversal of the graph in which these models are the nodes.

  • »
    »
    5 months ago, # ^ |
      Vote: -15

    It's not exactly as big as you make it sound... Of course, solving a P6 that had 5 or 15 solves in contest is impressive! ...on one side.

    On the other side, it's too specific to the problem type. It wouldn't be able to solve "any" (or "almost any") P3/P6, even if we exclude combinatorics. The opinion I've heard from one IMOer is that this P6 was probably one of the most comfortable problems for an AI to deal with, making the result more of a lucky accident than a fact that actually demonstrates something.

    Also, since it wasn't mentioned anywhere that P1 took less than 3 days to solve, I assume it did indeed take around that much time (otherwise they would have stated that the AI solved it within the 4.5 contest hours). That, once again, isn't enough to claim the AI can actually perform at a silver/gold level (or anything above HM), because if it were restricted to the 4.5 contest hours it would have scored 000700 (?)

    And lastly, its performance in combinatorics needs no comment...

    Thus, in my opinion, at its current level it can get at most bronze (if the contest has 2 geometry problems and the cutoffs are kind enough) once you take actual contest conditions into account.

    Which is still impressive, yes, just not that big of an advancement yet.

»
5 months ago, # |
  Vote: +70

The AI seems to be unable to solve combinatorics problems, and it took 3 days on non-geo.

  • »
    »
    5 months ago, # ^ |
      Vote: +11

    I think we should focus on the first part of your sentence about the model being "unable" to solve combinatorics problems.

    As for the second part, with better hardware, parallelization, and other techniques it should be possible to reduce the 3-day time to 4.5 hours. It doesn't seem like an issue.
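    Back-of-the-envelope (my own rough math, assuming the search parallelizes cleanly): 3 days is roughly 72 hours, so fitting into the 4.5-hour contest window needs about a 72 / 4.5 = 16x speedup, which sounds achievable from hardware and parallelism alone.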