These two tweets, beginning with "I'm [really] excited to announce/share", were posted 5 minutes apart:
- https://x.com/HengTze/status/1968359525339246825
- https://x.com/MostafaRohani/status/1968360976379703569
TL;DR Both Google Deepmind and OpenAI are claiming their models achieved gold-medal level performance at ICPC WF 2025.









Auto comment: topic has been updated by jonathanirvings (previous revision, new revision, compare).
claim: We'll have AI that beats tourist in 3 out of 10 div 1s by this day next year.
anyone willing to take me up on this bet? stakes: 100 usd (crypto)
the new kasparov vs deep blue?
send me the link.
AI always wins. But as I'm just a broke kid so I can't risk it :(
It seems the claim is supported by the ICPC foundation. But in any case, in the OpenAI link it states that "Demonstrating the power of AI under ICPC oversight, OpenAI's models successfully solved all 12 problems", so I wonder what the exact oversight entailed...
Also note that OpenAI's model(s) solved all 12 problems, making this (if im not mistaken) the first major cp event where AI has outperformed all humans (present at the event) at cp (it fell short by 5 humans at this year's IOI and by 1 human at the atcoder heuristic world tour finals). This is probably due to ICPC problems having an AI-friendly "flavour", but it's nevertheless a landmark event.
Interesting. Sounds like the Deepmind team was composed of Deepmind humans using Gemini, to solve 10 problems. OpenAI sounds like it was a completely automated system that solved 12 problems?
I would actually conclude an opposite thing based on the press-releases wording
Also, phrase "OpenAI team was not limited by the more restrictive Championship environment" is pretty sus and the fact that GDM did basically mirror of WF while OAI was operating through "special AI test environment added to the Local Judge" sounds like OAI was trying to overshine GDM by all means including putting themselves into much better starting position resource- and limitation-wise
Still wondering about the cost and when models with this level of capability will be generally available.
Too expensive for public use
It'll always be optimized later
11 out of the 12 solutions are generated by gpt-5 which is publicly available....
IOI > ICPC
Out of curiosity, is there submission limit at ICPC? At IOI you're limited to 50 submissions no matter what. At ICPC if there is no submission limit, you could make 1000000 submissions, completely send your time to the shadow realm but still win on solved problems.
Also, I don't believe ICPC scrambles tests, and you are told which test number has the first failure. So one could start extracting characteristics of a particular test (e.g. hash all of the input, get it out through WA/TLE/AC statuses and then hardcode in the code) and make sure that for that test the logic isn't modified further from AC and keep trying different things for next test.
So I'd be really curious in full results (including time) and also how affected would AI be if you add a couple of new tests.
You can see the submission count in the tweets: OpenAI passed on their first submit for 11/12 problems and used 9 submits on the last. Gemini did similarly, using 17 submissions for 10 problems.
Also, ICPC does not show you test number of first failure; the information is only the status code of WA/TLE/AC (this differs from most other ICPC-style platforms, including ICPC-style CF rounds and UCup).
I think it's pretty clear the agents were not cheating in this way. The question I do have is how many solutions were generated, and did the agents locally stress test?
Thanks! You can’t view replies to tweets without an account so didn’t see it for OpenAI.
My vague memory of 10 years ago recalled seeing judgements like WA42, but maybe my memory confused it with some other competition or things changed.
So OpenAI 12/12 while GDM at 10/12. It's over. Sam is going to be the emperor of the universe.
really curious whether GPT-5 (the public version, maybe GPT pro subscription required?) can actually solve Problem C in WF25, as OpenAI claims.
No the public version is shit even at their proclaimed image processing I always find myself using the plus version because pro takes literally half an hour to spit nothing out, for example last div1A didn't get solved by it for more than 40 minutes
Ok got it AI can do CP. We got it since o3 got ~2700 last December.
Now get the f out of CP and let me know when you got the number of
rs right in a public model.W
remember when all the doubters were coping and saying its impossible or whatever? just wait until it replaces your job. you'll be even more surprised
OpenAI is good at narration. He says he uses an ensemble of GPT-5 and an experimental reasoning model, which I doubt if it ever exists. The declared result looks very suspicious, and I don't find a theory depicting how the model solves the problem. For example, if the model is given problems by a random order, then it takes more time to solve D and H than C, which is hilarious. If the model is asked to determine the difficulty itself, then the fact that it solves C before L is more hilarious.
Why not in parallel?
They cloned the whole thing for each problem and let it solve them in parallel.
This theory is compatible with the very close submission times.
It cannot explain why it takes 44 minutes to solve L.
I make no claims on the legitimacy of their results, but this is the repository of their solutions.
If they are excited about CP, then bring back CodeJam.
Typical tales from Openai about the "experimental model". 0 confirmation, only statements. We readily believe that it is necessary to justify the investors' money somehow.