Both OpenAI and DeepMind are excited to announce/share

jonathanirvings's blog

By jonathanirvings, 8 months ago, in English

These two tweets, beginning with "I'm [really] excited to announce/share", were posted 5 minutes apart:

TL;DR: Both Google DeepMind and OpenAI claim that their models achieved gold-medal-level performance at ICPC WF 2025.

+146


Comments (30)

jonathanirvings

8 months ago

Auto comment: topic has been updated by jonathanirvings (previous revision, new revision, compare).

DNR

8 months ago

-14

Claim: we'll have an AI that beats tourist in 3 out of 10 Div. 1 rounds by this day next year.

Anyone willing to take me up on this bet? Stakes: 100 USD (crypto).

cosenza

8 months ago

+55

The new Kasparov vs. Deep Blue?

elnazar

8 months ago

-15

send me the link.

discontinuous

8 months ago

-14

AI always wins. But I'm just a broke kid, so I can't risk it :(

fisher199

8 months ago

+12

It seems the claim is supported by the ICPC Foundation. In any case, the OpenAI link states that "Demonstrating the power of AI under ICPC oversight, OpenAI's models successfully solved all 12 problems", so I wonder what exactly the oversight entailed...

DNR

8 months ago

+25

Also note that OpenAI's model(s) solved all 12 problems, making this (if I'm not mistaken) the first major CP event where AI has outperformed all humans present at the event (it fell short by 5 humans at this year's IOI and by 1 human at the AtCoder World Tour Finals heuristic contest). This is probably due to ICPC problems having an AI-friendly "flavour", but it's nevertheless a landmark event.

kwangg

8 months ago

Interesting. Sounds like the DeepMind team was composed of DeepMind humans using Gemini to solve 10 problems. OpenAI sounds like it was a completely automated system that solved 12 problems?

gultai4ukr

8 months ago

I would actually conclude the opposite based on the wording of the press releases.

gultai4ukr

8 months ago

+23

Also, the phrase "OpenAI team was not limited by the more restrictive Championship environment" is pretty sus. GDM basically ran a mirror of the WF, while OAI operated through a "special AI test environment added to the Local Judge". It sounds like OAI was trying to outshine GDM by all means, including putting itself in a much better starting position in terms of resources and limitations.

oToToT

8 months ago

Still wondering about the cost and when models with this level of capability will be generally available.

comingsoon.cpp

8 months ago

-34

Too expensive for public use

jomathyc

8 months ago

It'll always be optimized later

GPT4-B

8 months ago

-27

11 of the 12 solutions were generated by GPT-5, which is publicly available...

bashkort

8 months ago

+70

IOI > ICPC

eduardische

8 months ago

+12

Out of curiosity, is there a submission limit at ICPC? At IOI you're limited to 50 submissions no matter what. If ICPC has no submission limit, you could make 1000000 submissions, completely send your time to the shadow realm, and still win on solved problems.

Also, I don't believe ICPC scrambles tests, and you are told which test number has the first failure. So one could start extracting characteristics of a particular test (e.g. hash all of the input, leak the hash through WA/TLE/AC statuses, and then hardcode it in the code), make sure the logic for that test isn't modified further once it gets AC, and keep trying different things for the next test.

So I'd be really curious about the full results (including time), and also about how much the AI would be affected if a couple of new tests were added.
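The scoring point above can be made concrete with a minimal sketch. It assumes the standard ICPC ranking rule: more problems solved ranks higher, and ties are broken by lower total penalty, with 20 penalty minutes per rejected submission on a problem that is eventually solved. The `Team` class, the team names, and all numbers are hypothetical illustrations, not data from the contest.

```python
from dataclasses import dataclass, field

PENALTY_PER_REJECT = 20  # minutes; standard ICPC rule, assumed here


@dataclass
class Team:
    name: str
    # problem label -> (minute of the accepted submission, rejected attempts before it)
    solved: dict = field(default_factory=dict)

    def solved_count(self) -> int:
        return len(self.solved)

    def penalty(self) -> int:
        # Only solved problems contribute to the penalty.
        return sum(minute + PENALTY_PER_REJECT * rejects
                   for minute, rejects in self.solved.values())


def rank(teams):
    # More solved problems first; ties broken by lower penalty.
    return sorted(teams, key=lambda t: (-t.solved_count(), t.penalty()))


# Hypothetical teams: one spams submissions but solves 12 problems,
# the other solves 11 problems cleanly and quickly.
problems = "ABCDEFGHIJKL"
spammer = Team("spammer", {p: (300, 1000) for p in problems})
careful = Team("careful", {p: (30, 0) for p in problems[:11]})

order = [t.name for t in rank([careful, spammer])]
print(order, spammer.penalty(), careful.penalty())
```

Under this rule the team with 12 solves ranks first no matter how large its penalty is; penalty only matters between teams with equal solve counts.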

ecnerwala

8 months ago

+76

You can see the submission count in the tweets: OpenAI passed on their first submit for 11/12 problems and used 9 submits on the last. Gemini did similarly, using 17 submissions for 10 problems.

Also, ICPC does not show you the test number of the first failure; the only information is the status code (WA/TLE/AC). This differs from most other ICPC-style platforms, including ICPC-style CF rounds and UCup.

I think it's pretty clear the agents were not cheating in this way. The question I do have is how many solutions were generated, and did the agents locally stress test?
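As a rough worked example of the counts quoted above: assuming exactly one accepted submission per solved problem and the standard 20 penalty minutes per rejected submission (neither is stated in the thread), the rejected attempts come out as follows. For Gemini the figure is only an upper bound on penalty-relevant rejections, since some of the 17 submissions may have gone to the two unsolved problems.

```python
PENALTY_PER_REJECT = 20  # minutes; standard ICPC rule, assumed here


def rejected(total_submissions: int, solved: int) -> int:
    # Assumes exactly one accepted submission per solved problem.
    return total_submissions - solved


# OpenAI: 11 problems accepted on the first submit, 9 submits on the last one.
openai_submissions = 11 * 1 + 9
openai_rejected = rejected(openai_submissions, 12)

# Gemini: 17 submissions for 10 solved problems.
gemini_submissions = 17
gemini_rejected = rejected(gemini_submissions, 10)

print(openai_submissions, openai_rejected, openai_rejected * PENALTY_PER_REJECT)
print(gemini_submissions, gemini_rejected, gemini_rejected * PENALTY_PER_REJECT)
```

Either way, both systems are far from the "spam submissions until something passes" scenario.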

eduardische

8 months ago

Thanks! You can't view replies to tweets without an account, so I didn't see it for OpenAI.

I vaguely remember seeing verdicts like WA42 ten years ago, but maybe I'm confusing it with some other competition, or things have changed.

TwentyOneHundredOrBust

8 months ago

So OpenAI 12/12 while GDM at 10/12. It's over. Sam is going to be the emperor of the universe.

triple__a

8 months ago

+16

Really curious whether GPT-5 (the public version, maybe with a Pro subscription required?) can actually solve Problem C in WF25, as OpenAI claims.

employed

8 months ago

+36

No, the public version is shit even at the image processing they advertise. I always find myself using the Plus version because Pro takes literally half an hour to spit out nothing. For example, it couldn't solve the last Div. 1 A in more than 40 minutes.

amsen

8 months ago

+67

OK, got it, AI can do CP. We've known that since o3 got ~2700 last December.

Now get the f out of CP and let me know when a public model gets the number of r's right.

kr25161

8 months ago

strokeme

8 months ago

-24

Remember when all the doubters were coping and saying it's impossible or whatever? Just wait until it replaces your job. You'll be even more surprised.

-firefly-

8 months ago

+45

OpenAI is good at narration. They say they used an ensemble of GPT-5 and an experimental reasoning model, which I doubt even exists. The declared result looks very suspicious, and I can't find a theory that explains how the model went about solving the problems. For example, if the model was given the problems in random order, then it took more time to solve D and H than C, which is hilarious. If the model was asked to judge the difficulty itself, then the fact that it solved C before L is even more hilarious.

amsen

8 months ago

Why not in parallel?

They cloned the whole thing for each problem and let it solve them in parallel.

This theory is compatible with the very close submission times.

-firefly-

8 months ago

That cannot explain why it took 44 minutes to solve L.

jkl

8 months ago

I make no claims on the legitimacy of their results, but this is the repository of their solutions.

chuka231

8 months ago

+22

If they are excited about CP, then bring back CodeJam.

ton

8 months ago

Typical tales from OpenAI about the "experimental model". Zero confirmation, only statements. We readily believe that the investors' money has to be justified somehow.
