AI outperforms IMO gold medalists in geometry (this isn't surprising, relevant details in the post)

EDIT: I missed a major point from the paper. Go see shiven's comment below!

The paper "Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry" claims that combining algebraic methods (Wu's method) with synthetic methods (AlphaGeometry) achieves state-of-the-art results in IMO geometry. The authors used the same dataset of 30 IMO geometry problems that was used in the AlphaGeometry paper. AlphaGeometry solved 25 / 30 of the problems correctly, and the "new model" solves 27 / 30. However, the new model is just a combination of AlphaGeometry and some algebraic bashing, so it couldn't have done any worse than AlphaGeometry.
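The "couldn't have done any worse" point can be sketched as a fallback ensemble. This is only an illustration: the solver functions and problem labels below are hypothetical placeholders, not the paper's code or its results.

```python
from typing import Callable, Iterable

Solver = Callable[[str], bool]  # returns True if the solver proves the problem


def combine(*solvers: Solver) -> Solver:
    """Build a solver that succeeds whenever any of the given solvers succeeds."""
    def combined(problem: str) -> bool:
        return any(solve(problem) for solve in solvers)
    return combined


def solved_set(solver: Solver, problems: Iterable[str]) -> set[str]:
    """Collect the problems a solver manages to prove."""
    return {p for p in problems if solver(p)}


# Hypothetical stand-ins: which toy problems each method happens to solve.
problems = ["p1", "p2", "p3", "p4"]
synthetic = lambda p: p in {"p1", "p2"}  # placeholder for a synthetic prover
algebraic = lambda p: p in {"p2", "p3"}  # placeholder for an algebraic prover

both = combine(synthetic, algebraic)
```

Whatever the individual solvers do, the set solved by `both` always contains the set solved by each of them, so the combined score can only match or exceed the better one.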

I don't find this result indicative of any new AI capabilities and, for example, don't feel like AI solving IMO Number Theory problems is any closer than before reading the paper. The fact that some problems which are hard to solve synthetically are relatively easy to bash algebraically isn't really surprising. Nevertheless, I do find it interesting that we have an AI model that can solve IMO geometry problems (at least from this set of problems) better than an "IMO gold medalist".

Comments (18)

123gjweq2

8 months ago

+48

first they came for our geometry problems...

AvaraKedavra

8 months ago

then they came for our adhoc problems...

WeaponizedAutist

-30

who cares

AI waifus when?

Guuber

+33

What good are AI waifus if they can't beat me at IMO number theory

-34

you are just a blue, intelligence should not be very high on your priorities

+21

:D. That logic doesn't make any sense. If AI is worse than me, I can complain that it's too bad. The complaining makes a lot more sense than if I was red.

If someone is very poor they have much more reason to say "I hope my future partner isn't even poorer than me!" than a billionaire would

-53

redditor detected, opinion rejected

Red0

you are an inactive purple :)

shiven

+25

Imagine not tagging :|

This definitely doesn't mean that AI has 'conquered math', or even different domains of IMO. You're correct that the work speaks for the narrow domain of Geometry in math competitions.

The important highlight is the potential of symbolic methods and their huge gains in efficiency. They manage to solve 21/30 problems, close to the performance of the average silver medalist, under a time limit of 5 minutes on a laptop CPU. Most take under 5 seconds. I invite you to contrast that to the amount of compute it takes to train and infer with LLMs in the pipeline (e.g. AlphaGeometry) and you'll be surprised :)

That is very true! I mostly saw people reacting to the 27 / 30 number and maybe wrote the post too much from that context. I should have written the post from a more neutral viewpoint!

(Also didn't realize I could / should check if the authors of the paper were on codeforces...)

hashman

Also, geometry is much more mechanical than algebra, combinatorics, or NT. In my opinion (from my fairly limited MO experience), the order this will go is: geometry -> algebra -> NT -> combinatorics, because combinatorics is the least mechanical out of the four. So, when AI solves combinatorics is when we should really be worried.

Indeed, there are several things that make Geometry a good target for mechanical provers. We highlight some of the reasons for this in our paper.

As for ranking the others, although I see where you're coming from, I don't think I can make a confident guess, sadly. There may even exist peculiarities that make some parts easier. For example, I can see many combinatorics problems that address a small enough finite set to be brute-forceable by computers, without using any 'reasoning' as such. Consider something about counting the number of ways to do X such that property Y is satisfied. I would assume quite a few LLMs would be able to write brute-force code that iterates over all ways to do X and checks Y. Similar to how ChatGPT can write inefficient solutions for CP problems. Sure, the computer didn't solve it in the way you'd expect. But MO problems just weren't designed to handle this amount of compute. Humans need to resort to elegant reasoning to solve those puzzles, but computers just might not need to :) Of course, that is a very specific example that you can't extend to different problems, but it's just meant to point out that the computational ways to address these problems might be different from how humans think about them.
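To make the "iterate over all ways to do X and check Y" idea concrete, here is a minimal sketch. The choice of problem is an assumption made purely for illustration: X is "arrange n items in a row" and Y is "no item ends up in its original position". The function names are hypothetical too.

```python
from itertools import permutations


def count_arrangements(n, satisfies):
    """Count permutations of range(n) for which satisfies(perm) is True."""
    return sum(1 for perm in permutations(range(n)) if satisfies(perm))


def no_fixed_point(perm):
    """Property Y for this toy example: no element stays in its own position."""
    return all(i != value for i, value in enumerate(perm))
```

Calling `count_arrangements(4, no_fixed_point)` checks all 24 permutations one by one. No insight about the structure of the problem is needed, but the work grows like n!, so this only helps when the finite set is small.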

Good points. But the thing I was talking about was problems of the form "prove something general about this infinite set of objects (could be anything, sets, numbers, boards, etc)", for which there is no finite case check approach, and you're forced to actually reason your way to the solution.

ShaoNianTongXue5307

+14

If I could do a billion calculations a second, I could get an IMO gold medal too.

SuperJ6

-8

You do, you just make poor calculations.

You can write code to do that, and there's a lot of money waiting for you if that code works for a bunch of problems :D

DottedCalculator

+10

If someone tried to use this to solve IMO 2023/6, how many points would it get?

Guuber's blog