jonathanirvings's blog

By jonathanirvings, history, 8 months ago, In English

These two tweets, beginning with "I'm [really] excited to announce/share", were posted 5 minutes apart:

TL;DR: Both Google DeepMind and OpenAI claim their models achieved gold-medal-level performance at ICPC WF 2025.

  • Vote: I like it
  • +146
  • Vote: I do not like it

»
8 months ago, hide # |

Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by jonathanirvings (previous revision, new revision, compare).

»
8 months ago, hide # |

Vote: I like it -14 Vote: I do not like it

Claim: We'll have AI that beats tourist in 3 out of 10 Div. 1 rounds by this day next year.

Anyone willing to take me up on this bet? Stakes: 100 USD (crypto).

»
8 months ago, hide # |

Vote: I like it +12 Vote: I do not like it

It seems the claim is supported by the ICPC Foundation. But in any case, the OpenAI link states that "Demonstrating the power of AI under ICPC oversight, OpenAI's models successfully solved all 12 problems", so I wonder what exactly the oversight entailed...

»
8 months ago, hide # |

Vote: I like it +25 Vote: I do not like it

Also note that OpenAI's model(s) solved all 12 problems, making this (if I'm not mistaken) the first major CP event where AI has outperformed all humans present at the event (it finished behind 5 humans at this year's IOI and behind 1 human at the AtCoder heuristic world tour finals). This is probably due to ICPC problems having an AI-friendly "flavour", but it's nevertheless a landmark event.

»
8 months ago, hide # |

Vote: I like it +3 Vote: I do not like it

Interesting. Sounds like the DeepMind team was composed of DeepMind humans using Gemini to solve 10 problems, while OpenAI's was a completely automated system that solved 12 problems?

  • »
    »
    8 months ago, hide # ^ |

    Vote: I like it +4 Vote: I do not like it

    I would actually conclude the opposite based on the wording of the press releases.

    • »
      »
      »
      8 months ago, hide # ^ |

      Vote: I like it +23 Vote: I do not like it

      Also, the phrase "OpenAI team was not limited by the more restrictive Championship environment" is pretty sus. GDM basically ran a mirror of the WF, while OAI was operating through a "special AI test environment added to the Local Judge". That sounds like OAI was trying to outshine GDM by all means, including putting themselves in a much better starting position in terms of resources and limitations.

»
8 months ago, hide # |
Rev. 2
Vote: I like it +4 Vote: I do not like it

Still wondering about the cost and when models with this level of capability will be generally available.

»
8 months ago, hide # |

Vote: I like it +70 Vote: I do not like it

IOI > ICPC

»
8 months ago, hide # |
Rev. 4
Vote: I like it +12 Vote: I do not like it

Out of curiosity, is there a submission limit at ICPC? At IOI you're limited to 50 submissions no matter what. If ICPC has no submission limit, you could make 1000000 submissions, completely send your time to the shadow realm, and still win on solved problems.

Also, I don't believe ICPC scrambles tests, and you are told which test number has the first failure. So one could start extracting characteristics of a particular test (e.g. hash all of the input, leak the hash through WA/TLE/AC statuses, and then hardcode it in the code), make sure the logic for that test isn't modified further once it gets AC, and keep trying different things for the next test.

So I'd be really curious about the full results (including time), and also about how much the AI would be affected if a couple of new tests were added.

  • »
    »
    8 months ago, hide # ^ |

    Vote: I like it +76 Vote: I do not like it

    You can see the submission count in the tweets: OpenAI passed on their first submit for 11/12 problems and used 9 submits on the last. Gemini did similarly, using 17 submissions for 10 problems.

    Also, ICPC does not show you the test number of the first failure; the only information is the status code (WA/TLE/AC). This differs from most other ICPC-style platforms, including ICPC-style CF rounds and UCup.

    I think it's pretty clear the agents were not cheating in this way. The question I do have is how many solutions were generated, and did the agents locally stress test?

    • »
      »
      »
      8 months ago, hide # ^ |

      Vote: I like it +3 Vote: I do not like it

      Thanks! You can't view replies to tweets without an account, so I didn't see it for OpenAI.

      I vaguely remember seeing judgements like WA42 about 10 years ago, but maybe I'm confusing it with some other competition, or things have changed.

»
8 months ago, hide # |

Vote: I like it 0 Vote: I do not like it

So OpenAI 12/12 while GDM at 10/12. It's over. Sam is going to be the emperor of the universe.

»
8 months ago, hide # |

Vote: I like it +16 Vote: I do not like it

Really curious whether GPT-5 (the public version, maybe with a Pro subscription required?) can actually solve Problem C in WF25, as OpenAI claims.

  • »
    »
    8 months ago, hide # ^ |

    Vote: I like it +36 Vote: I do not like it

    No, the public version is shit even at their proclaimed image processing. I always find myself using the Plus version because Pro takes literally half an hour to spit out nothing. For example, it couldn't solve the last Div. 1 A in more than 40 minutes.

»
8 months ago, hide # |
Rev. 2
Vote: I like it +67 Vote: I do not like it

OK, got it: AI can do CP. We've known that since o3 got ~2700 last December.

Now get the f out of CP and let me know when you get the number of r's right in a public model.

»
8 months ago, hide # |

Vote: I like it -24 Vote: I do not like it

Remember when all the doubters were coping and saying it's impossible or whatever? Just wait until it replaces your job. You'll be even more surprised.

»
8 months ago, hide # |

Vote: I like it +45 Vote: I do not like it

OpenAI is good at narration. They say they used an ensemble of GPT-5 and an experimental reasoning model, which I doubt even exists. The declared result looks very suspicious, and I can't find a theory that explains how the model solved the problems. For example, if the model was given the problems in random order, then it took more time to solve D and H than C, which is hilarious. If the model was asked to determine the difficulty itself, then the fact that it solved C before L is even more hilarious.

»
8 months ago, hide # |

Vote: I like it +22 Vote: I do not like it

If they are excited about CP, then bring back CodeJam.

»
8 months ago, hide # |

Vote: I like it 0 Vote: I do not like it

Typical tales from OpenAI about the "experimental model". Zero confirmation, only statements. We readily believe that it is necessary to justify the investors' money somehow.