Latest chat-GPT only have a codeforces rating of 392

22 months ago, # |

← Rev. 2 →

+21

Previous state of the art was (in my understanding) alphacode, which got way better results than rating of 392. The first model to reach ratings of 2000 or 3000 or 4000, will probably* be a model which is trained specifically to be good at competitive programming and not a general language model

*Just my guess

→ Reply

22 months ago, # ^ |

← Rev. 2 →

+28

fair. but why would anyone waste their time developing a model specifically for CP, what is the end gain?

and to be honest, I think an AI model reaching 2000 is an overreach.

→ Reply

22 months ago, # ^ |

← Rev. 2 →

They already made a bot for CP once, and it did better than anyone (at least anyone I know) expected. I don't see any reason why people wouldn't still be interested in creating a CP bot.

And for the AI reaching 2000, I expect there to be AIs with 2000 rating in less than five years. I'm not sure if five years ago we had an AI that could write any not copy-pasted code, but my guess is that we didn't. Last year Alphacode got to 54th percentile and the capability improvements of other areas with large language models seem to still improve.

At what rating do you think we'll be in 5 years?

→ Reply

22 months ago, # ^ |

They already made a bot for CP once, and it did better than anyone (at least anyone I know) expected.

what bot?

→ Reply

22 months ago, # ^ |

Alphacode:

Quick blogpost: https://www.deepmind.com/blog/competitive-programming-with-alphacode

The paper: https://arxiv.org/abs/2203.07814

→ Reply

22 months ago, # ^ |

I'm not sure if five years ago we had an AI that could write any not copy-pasted code

I'm not sure there is any even now. I'm not expert on topic but from my understanding all these models are using "copy-pasted" code that way or another.

→ Reply

22 months ago, # ^ |

I don't see a strong case to be made that the code they are producing is copy pasted. For example GitHub copilot can take into account the context of the file when suggesting code which makes it hard for it to just copy paste.

→ Reply

22 months ago, # ^ |

Then you have no clue how GitHub Copilot is working at all. It's proven fact that it was trained based on GitHub repos (and not always public ones) and it copy-paste chunks of code from them quite frequently.

I haven't heard about such issues with ChatGPT and AlphaCode, but I'm sure they are also trained using some codebase and don't write code from scratch.

→ Reply

22 months ago, # ^ |

I misspoke a bit. What I was trying to say is that they can produce "novel" code that is not copy-pasted.

→ Reply

22 months ago, # ^ |

I doubt that. It definitely could adapt code for particular context (change variable names etc), provide valuable suggestions or mix chunks of code it got during training (maybe that's why it's not that obvious that it's "copy-pasted"), but it's very unlikely that it could really produce something original.

→ Reply

22 months ago, # ^ |

What would be a kind of task / code it could solve / produce that would make you think that it isn't just copy-pasting?

→ Reply

22 months ago, # ^ |

Something which cannot be derived in a trivial way using training codebase. I understand that it's pretty hard to check without actual codebase :) So to provide you with more concrete metric: let's say that it should be able to solve non-trivial CP problem which solution is not available publicly on internet or at least you cannot find it easily using data provided to AI tool you're gonna use. Only that will give us at least some sort of guarantee that it wasn't direct or modified copy-paste of something existing before.

→ Reply

22 months ago, # ^ |

← Rev. 3 →

-13

What do you mean by copy-pasting? A reasonable amount of codes in GitHub are not well-designed and anti-pattern that the AI cannot rely on and should be considered "noise".

The codes are end 2 end AI-generated.

The AI model is way more sophisticated than you think. It produces novel codes, whether you want to believe or not.

→ Reply

22 months ago, # ^ |

What do you mean by copy-pasting?

Reusing the code from a codebase directly or after modification/adapting for the context. In other words, writing the code without inventing it from scratch.

The codes are end 2 end AI-generated.

And still copy-pasted.

It produces novel codes, whether you want to believe or not

I believe in what I see. And I provided previous commenter with an metric to prove that AI can do something novel. Use that if you want to convince me. Otherwise, I don't see any reason to continue debates.

→ Reply

22 months ago, # ^ |

I believe our disagreement stems from the choice of words, particularly the term "copy-paste." Setting aside AI models for a moment, let's consider writing code in C++. When programmers use "int" or name a variable "tmp," are they copying from one another? Or are they simply utilizing existing knowledge? When someone develops a solution, it can't be considered wholly original, as it is a combination of pre-existing knowledge. At the same time, it's not accurate to label it as mere copying and pasting of existing code.

This is a response from Chat GPT:

As an AI language model, I do not directly copy and paste from existing code or text. Instead, I generate responses based on the patterns and knowledge I have learned during my training. My training data includes a vast amount of text from various sources, such as websites, books, and code repositories. This enables me to provide relevant and helpful information when answering questions or discussing topics.

I respect your point of view, and even though we might think differently, it's cool to exchange ideas.

→ Reply

22 months ago, # ^ |

When programmers use "int" or name a variable "tmp," are they copying from one another? Or are they simply utilizing existing knowledge?

First case: they are using programming language basics, which everybody is familiar with. Second case: utilizing existing conventions in variable naming. Any of those aren't copy-paste because they don't require any idea and just are building blocks in the programs. But I understand your point and agree that there is quite thin line what should be labeled as copy-paste and what — shouldn't. In the current context I think it does make sense to stuck with something high-level, like solution to a problem in whole. So if AI can invent solution by itself — it definitely can "think", even if some pieces of code would be "part of existing knowledge". Otherwise, it just mimics what it saw in the training dataset and it's nothing else than sophisticated copy-paste. I hope that should be clear enough to polarize this discussion.

→ Reply

vedant-z

22 months ago, # ^ |

@gultai4ukr: I'm not expert on topic

But you sure are an expert on CF.

→ Reply

stakudog

22 months ago, # ^ |

how is it a waste of time? A computer that can solve different kinds of problems on its own and with enough work, it might as well surpass humans in this domain too. Now imagine this becomes the norm. Every programmer would have a small pocket 'tourist' capable of optimizing their code near the max possible extent. It would revolutionize all software.

→ Reply

22 months ago, # ^ |

← Rev. 3 →

how often do you think programmers use any algorithms at their jobs? or write code that needs optimization from O(N^inf) to O(1)?

→ Reply

22 months ago, # ^ |

Just because you don't find it useful for your project doesn't mean others won't benefit from it. Apologies if my tone seems off, but I've heard this kind of comment way too often. You could have said the same thing to the folks who developed the gradient descent technique, and look where we are now – it's at the core of deep learning. So, let's keep an open mind, shall we?

→ Reply

22 months ago, # ^ |

+10

Oh, no. The ML enthusiast has joined the chat.

→ Reply

Chimpanzee

22 months ago, # |

← Rev. 2 →

I believe it's a matter of time. What ai is capable of doing currently is horrifying

→ Reply

brunovsky

22 months ago, # |

+381

What this shows, by the way, is that competitive programming (and similar activities like math olympiads) teaches young studs something beyond what is effectively tested by regular math/programming exams and software engineering interviews: a deeper level of thinking that is difficult to put to words and much more rare — leading to learning content that is also much sparser on the internet. Even with the expectation of considerable improvement in the future, this should be reassuring to anyone practicing this hobby regularly and a great incentive for the next generation to join in.

Or, you know, maybe the AI just gets confused why Alice receives so many arrays on her birthdays....

→ Reply

22 months ago, # ^ |

+52

Or, you know, maybe the AI just gets confused why Alice receives so many arrays on her birthdays....

Apparently, because she still hasn't sent boobs to Bob. But it will be our human secret ;)

→ Reply

huikang

22 months ago, # |

I wonder if the 3/45 Leetcode Hard attempts is one-shot or does it allow back-and-forth with the sample test cases like this.

→ Reply

22 months ago, # ^ |

My bet is that was without any guidance

→ Reply

lethan3

22 months ago, # |

+49

I wonder why it does better in AMC 12 than AMC 10?

→ Reply

rog

22 months ago, # |

Although chat-gpt is not very good at solving algorithm competition problems, its translation function is very powerful. I often use it to translate English into Chinese. At the same time, it can also organize formats, such as automatically generating LaTeX format. I think it still has a great influence on me.

→ Reply

18o3

22 months ago, # |

How did they find the rating of the bot? Did the researcher have ChatGpt participate in some virtuals? Also is there an ID for ChatGPT where we can see it's code.

→ Reply

TwentyOneHundredOrBust

22 months ago, # |

+30

AlphaGo did not destroy go in the same way stockfish didn't destroy chess... I love and enjoy it and it's even better when there's a superhuman engine on my phone to tell me where I went wrong. It's great to be proud that competitive programming remains unconquered, but it's not the end of the world if it is.

→ Reply

Geothermal

22 months ago, # ^ |

← Rev. 2 →

+188

Maybe I’m wrong, but it’s hard to imagine a situation where a public AI model becomes stronger than the best competitive programmers but online programming contests don’t die out. The relevant differences between chess and Codeforces are that in chess, my understanding (as someone who doesn’t play chess…) is that (a) online chess platforms are pretty good at detecting blatant cheating, so only those who cheat in relatively subtle ways can go undetected, and (b) OTB tournaments are frequent enough to serve as the main form of serious competition, meaning that (1) players who consistently perform much better online than OTB may attract suspicion and (2) cheating online is not as problematic because OTB rating is considered the primary metric of skill.

Neither of these mitigating factors apply to CF. Even if we assume for the sake of argument that the hypothetical AI writes code that is watermarked in a way that can’t be removed through minor modifications, it seems reasonable to assume given the capability of ChatGPT to provide mostly reasonable explanations for the code it generates that an AI capable of solving any competitive programming problem will be able to explain the solution idea in words comparably well to an editorial (or, even if this is not the case at launch, it seems nearly certain that updated versions of the model will eventually develop this capability). This means that even in the best case, access to AI would give cheaters an advantage comparable to access to an editorial, which is clearly a game-breaking edge.

One might argue that cheating is a risk already given that it’s possible to discuss problems with others without being caught. However, this seems like a relatively minor concern at the Div. 1 level, since there are few people strong enough that it would be significantly advantageous to cheat off of them and those people have no incentive to share their code with others, meaning the only plausible form of cheating is people around the same level sharing solutions, which requires one of them to actually solve the problem first. Cheating using AI would be (a) much easier, since you don’t need to find other people to cheat with, and (b), much more significant, since immediate access to what is effectively an editorial would in my view be a far greater advantage than access to a solution only after someone else of a comparable skill level solves the problem. Perhaps I’m pessimistic, but I find it difficult to trust that if a quick and easy way to cheat for a 500+ rating point advantage existed, there wouldn’t be a sizable number of people (even among competitors who are skilled enough to implement solutions to 3000+ problems) who would abuse it. As such, AI strikes me as a practically existential threat to online competitive programming (if you disagree, please convince me I’m wrong—I’d love to be incorrect about this, since if I’m right it seems like we could have only a few years of Codeforces left…).

Moreover, if online contests lose their competitive integrity, competitive programming will lose much of its appeal as an activity, as in-person contests are not nearly frequent enough to serve as the primary metric of skill. I certainly would not have continued competing past high school if ICPC was the only “real” contest that existed; in fact, my primary goal over the last four years has been to improve my performance on Codeforces, with on-site contests as a secondary focus. All this is to say that when considering purely its effects on competitive programming, I’m not sure the rise of AI models stronger than any human competitor is something to celebrate.

→ Reply

dpaleka

22 months ago, # ^ |

+19

AI strikes me as a practically existential threat to online competitive programming

I am never going to say "Technically correct, the best kind of correct" again.

→ Reply

__fn__

22 months ago, # ^ |

You are right. Indeed currently wanting AI to be able to solve complex problems (I have nothing against it) would be profitable if people would make good use of it. Unfortunately there will always be those people who want to cheat for online competitions in order to increase their rankings or to pass certain qualifications. On the other hand, it could be useful if you want to understand something very quickly and you don't have the time and above all there are no people available to help you. It is a matter of using AI well. We'll see what will happen next...

→ Reply

zed_b

22 months ago, # ^ |

+107

I am pretty sure if AI becomes better than top competitive programmers CF dying out will be our last concern :)

→ Reply

22 months ago, # ^ |

← Rev. 5 →

+75

I believe the MO/adhoc meta will save competitive programming from AI — I find it very hard to believe that AI (for at least the next decade or so) will be able to get a non-trivial score at the IMO (on completely unseen problems, i.e., without any test set leakage) even with rules naively stuck on top of natural language models as they are now. Non-trivial score as in not counting trivial problems like the ones AI has already solved: for instance, here.

Even if routine implementation problems and "optimize using a data structure" problems end up being solvable by AI, if you give it problems with non-trivial observations, I am confident that it's not possible for it to join the pieces together in a logical manner, at least at the current stages of AI development.

Sure, it can guide competitors towards some ideas that can be helpful — I'm not denying that. In my opinion, the usage of AI can only increase the lower bound by a benign amount (immeasurably small, for now). I would definitely like to see an AI that performs at the expert/candidate master level — that is when I think things would become interesting. The reason games like chess and Go are so thoroughly beaten by AI is in part the limited "branching factor" of their search space, which in the case of competitive programming or MO problem solving is too high (so high that it's theoretically undecidable) to be feasibly exploited systematically. Not saying that problem-solving is something that only humans are capable of, but instead claiming that understanding how problem-solving works is still an unsolved problem.

Until then, considering how the current state of the art in AI's performance in competitive programming is limited to either copy-pasting code from other sources (Copilot) or having a perfect score on training data and zero performance on test data (ChatGPT-4), I think there's still a long way to go before we find

→ Reply

dpaleka

22 months ago, # ^ |

← Rev. 2 →

Do you want to bet that a computer will not be able to solve any recent IMO problem (given the statement) before 2029? Any reasonable amount, or just on reputation?

It's not even about AI; the IMO is a bad benchmark except if you impose some no-calculator restriction. It's pretty likely that what you say can't be done in a decade can be done right now.

→ Reply

22 months ago, # ^ |

← Rev. 2 →

+16

In a much earlier comment on CF, I've been of the view that it's not impossible to come up with something that automatically solves geo problems, and this opinion is not inconsistent with what I said above — because IMO geometry is "trivial" in the sense of being decidable (and once you know enough theorems it's quite mechanical and doesn't even need diagrams — speaking from experience).

What I said here was something non-trivial in the current AI context, like constructive combinatorics problems and so on, and not something mechanical.

→ Reply

ko_osaga

22 months ago, # ^ |

← Rev. 2 →

+16

Well, I think adhoc/constructive problems will be the first one to be conquered among CP problems if it happens. Think about how you approach such problems. A lot of cases, you try to solve many small cases, find pattern, and generalize it. This is exactly where machines outperform humans — they have a good sense on what "pattern" is (it is a mathematical model) and they have much better computing power to prove the hypothesis in small cases. I can imagine a machine struggling "find range inversion count query faster than n^2" but cheesing out some AGC D/Es.

Actually, the way of solving adhoc problems are also pretty machinery. Sometimes, you just take several patterns as a hypothesis, and brute force to see what is right and whatnot. Machines would do much faster.

On the other hand, I'm not sure if this is something that benefits greatly from the GPT revolution. Maybe this is more of a PL/formal verification stuff.

→ Reply

22 months ago, # ^ |

+24

Indeed, when I said it is very hard for a language model, I was referring to how GPT stuff doesn't seem like it will help much.

In my previous comment I had linked the recent (not so recent by AI standards, though) paper by Meta that developed a theorem prover which could solve a few IMO problems. Their approach seems to be different from "looking at examples and trying to prove" and follows more of a mechanical approach by fitting concepts together like Lego blocks (essentially a search on a graph), and unless such "exploratory" analysis of problems can be automated, it feels like current approaches are missing out on something. AI theory is aware of this in some sense — they call it the exploration-exploitation tradeoff. This is why I think hard adhoc and constructive problems (those at the IMO 2/5 or AGC D/E level at the very least) are still a long way from being solved by AI. I won't call it learning unless it can come up with novel approaches like considering some seemingly unrelated idea that magically solves the problem and explain why the motivation behind the solution too. Until then, it's just matrix multiplication to me.

However, there are non-AI tools that automate such "guess and prove" ideas to some extent.

For example, the method of automatically finding properties of submitted triangle centers by the Encyclopedia of Triangle Centers is a method that does something similar. It reduces the search space for properties by testing out hypothesis on certain triangles, and those which seem to resist attempts at disproving are further tested via coordinate systems.
Similarly, there are inequality-proving software that were used by a lot of Chinese and Vietnamese inequality enthusiasts a few years ago (I am not sure what happened to them).

There also has been research on "learning" tasks at human speed by DeepMind (I recommend reading the paper, it's very cool), but it still does seem very far from being able to solve IMO problems.

→ Reply

ko_osaga

22 months ago, # ^ |

+11

Yeah, my claim is that the relative difficulty of adhoc problems should be much easier for machines rather than humans. This is essentially adhoc constructive problems solved by machines.

And I don't know why machines should provide motivation for their solution, do humans are able to provide them..?

→ Reply

22 months ago, # ^ |

← Rev. 2 →

I remember reading this paper when it came out, and they claimed that the algorithm is better in terms of the number of multiplications for a certain small step, and it reduces the constant factor in Strassen's algorithm accordingly. However, it doesn't make the time complexity better, or reason about the optimization itself — the tensor decomposition task was well-defined and the RL application here was just to find a set of steps that work correctly. In theory, this could have been done by other methods such as simulated annealing on some kind of sets of proof trees, but this paper was remarkable because they considered the decomposition task like a game, and then used game AI base knowledge to solve it.

Given enough computation resources, it can indeed help find more optimal solutions to a problem, just because it can do brute-force faster. However, this paper is not an example of solving "generic" problems. At the current stage of AI, we can definitely use machines for specialized tasks by applying human creativity to arrive at some general setups that solve certain kinds of problems (the Meta AI paper is also similar).

However, for being able to reliably solve an Anton problem, I believe AI in its current stage needs more work. One can argue that we can use the same kind of RL as used in that paper in the theorem-proving domain, but the difficulty for RL approaches increases rapidly with "allowed moves" and the state space size, both of which are very vast if you consider all adhoc problems.

Most of this discussion is hypothetical, though, and maybe I will turn out to be wrong (I would be glad, though, if I'm wrong and some cool AI comes up that can solve non-trivial adhoc problems — it'll definitely be an engineering marvel).

Regarding motivation for their solution: I think it's just a matter of preference. I would definitely want a correct proof for a problem, and it's an added bonus if the path to that solution is interpretable. This is more on the meta side of things, and the model is not really required to be able to reason about itself, but I feel that generalization (and not overfitting to examples) has to come with some inherent stuff. For example, a lot of RL approaches are interpretable, because we can look at the state space graph and the transition probabilities (and the values relevant to the Bellman DP equations at every step) and get an idea of why the exact steps that the AI took were chosen.

→ Reply

ko_osaga

22 months ago, # ^ |

Yeah, I know that the paper didn't get even close to SOTA and stuffs, I think I can just say that ad hoc problems "tends to" (not all!) require small moves and space size, since they have short solution, and pattern matching are very helpful. Right?

→ Reply

22 months ago, # ^ |

I agree with that, heuristics can definitely reduce the search space a lot.

However, for a person who's new to problem-solving, there would be a ton of failed attempts, and the final solution will still be very short, and that too after a lot of cleaning up. For instance, in the proofs shown in the Meta AI paper, they noted that a lot of used tactics were redundant, and though the found proof was long but correct, the cleanup ended up with a solution that was very short. In this sense, slick proofs might not be very "searchable" and need more insight and wishful thinking to be able to reach.

I think one more fundamental issue with such things is that there are very few rigorous proofs for adhoc problems. This makes the task especially difficult because learning patterns is hard if you don't have some guidance. There's an obvious example that shows that giving appropriate guidance won't be so easy either. For example, greedy solutions — the solution description is simple but in editorials (which can potentially be used as training set), we rarely find proofs of correctness that can be analyzed by some NLP and restructured into a rigorous proof that the AI can be given as a learning input. Humans reason very approximately, then rigorize their solution, and if it doesn't work rigorously, they try to amend it. One way of making AI learn with just human inputs is to try to account for this process, but it's unnecessarily complex.

I've thought about trying to come up with such an automated problem-solving setup multiple times, but the effort seems to be very disproportionate compared to the perceived results. I think the current training set (of editorials and so on) would be more usable if there was a tool that could convert proofs into formal proofs (in Lean or something) and then Lean checks if they're correct. Then the AI (theorem prover) can use those Lean proofs to learn, and the resulting Lean proof can be summarized by another tool into a human readable format.

→ Reply

fried_brain

22 months ago, # ^ |

Surprised that no one has mentioned it yet(perhaps it's not so popular).

There's an initiative going on for awhile called the IMO Grand Challenge, where a bunch of smart people(including Reid Barton, one of the top CPers who used to compete in the TCO days) have come together to develop an AI whose ultimate goal is to compete side by side with humans and win a gold medal.

If this turns out to be successful, most likely CP would be no more, atleast not as we know it today. :(

My personal view is that we're a long way from this turning to reality(the IMO challenge has been going for 4 years-ish now). I hope it doesn't, I want to enjoy CP with my fellow competitors.

→ Reply

https://mirror.codeforces.com/contest/1791/submission/197627585

22 months ago, # ^ |

This challenge is probably the reason why IMO gets a mention in the linked Meta AI paper, which is why I didn't link to it. One of the USPs of that paper was that it "solved 10 IMO problems, 5x more than others."

→ Reply

TwentyOneHundredOrBust

22 months ago, # ^ |

Well... maybe every contest becomes implementationforces then

→ Reply

DottedCalculator

22 months ago, # ^ |

AI can solve all geometry problems by coordinate bashing.

→ Reply

randomalt194738294

22 months ago, # ^ |

-28

I agree that using AI to write editorials for coding contests could lead to cheating, but I want to emphasize that most people who use Codeforces do so either to participate in Olympiads or to solve challenging problems for enjoyment. Cheating with AI would not be helpful for either of these groups.

Although ratings motivate many people, I believe that a lot of Codeforces users will continue to enjoy competitive programming even if ratings become irrelevant. Therefore, I do not think that Codeforces will disappear if ratings are removed, but users who focus on ratings will have to find new ways to stay motivated.

Additionally, as AI becomes more widespread, I am hopeful that new methods will be developed to prevent cheating and ensure that coding contests remain fair and competitive.

→ Reply

thelastdinner

22 months ago, # |

+48

I believe that even if you can't solve any problem in the contest, you can get the rating of this.

→ Reply

EarthMessenger

22 months ago, # |

Only solve one problem in Div. 3 round? I think gpt4's performance is better when it solves leetcode problem.

→ Reply

cf_chat_gpt

22 months ago, # |

+21

I tested ChatGPT 3.5 and got better ranking during 3 contests (737), check my profile. I think those results are pessimistic. I'll keep experimenting with GPT-4 in the next contests...

→ Reply

b4nanacream

22 months ago, # ^ |

you need to participate in at least 6 contests to know your real rating.

→ Reply

Benq

22 months ago, # ^ |

+47

I saw that you only managed to do the first two problems from the last div. 4 round. When I tried this (also with ChatGPT 3.5), it managed to solve D as well on the first try (maybe because I specifically requested a solution in Python?). It also managed to solve G1 when I told it to use sorting, and (almost) solve E when I told it to use DP.

→ Reply

cf_chat_gpt

22 months ago, # ^ |

← Rev. 3 →

+10

That's interesting. I retried now on Problem D, but didn't manage to get the right solution in Python, did you copy also the example testcases? It seems the GPT output is highly dependant on the input string & whether we give hints or not, but I didn't give any hint to GPT..

It could also be that I was mixing the use of GPT playground (which I believe based on GPT 3) when the GPT 3.5 becomes inaccessible due to load issue.

→ Reply

Benq

22 months ago, # ^ |

+34

I asked

Can you solve the following problem using Python?

and copied the entire problem statement, including the examples.

→ Reply

cf_chat_gpt

22 months ago, # ^ |

Update: I got now as well the right solution for Problem D, after explicitly asking GPT to optimize the code for speed.

→ Reply

ivan100sic

22 months ago, # |

+54

I don't think LLMs will ever beat humans because they learn from human-written content, which is extremely scarce and difficult for AI to understand at this point for all advanced CP stuff. However another form of AI almost certainly will beat us at some point. But by then AI will have taken over almost all informational aspects of life.

→ Reply

CutSandstone

22 months ago, # |

I can sense "Therefore, I am proud to say, competitive programming, like us codeforces, is the hard obstacle chat-GPT will overcome" aging badly.

→ Reply

dXqwq

22 months ago, # |

Imagine that using chatGPT to solve 2A/2B and save a minute

→ Reply

22 months ago, # ^ |

+28

It's faster to solve them on your own anyway

→ Reply

22 months ago, # ^ |

+11

One minute is not worth a risk to get 50 points penalty :)

→ Reply

oToToT

22 months ago, # |

How about using GPT to generate CP problems (with solution)

→ Reply

22 months ago, # ^ |

Interesting idea, but obvious downside is that everybody will be able to solve it using GPT

→ Reply

NemanjaSo2005

22 months ago, # ^ |

← Rev. 3 →

I actually tried that with 3.5 (ChatGPT). It was unable to generate any original problem and mostly just standard problems. Therefore it couldn't generate good problems. Which makes sense because it was trained to predict what words come next in already existing texts.

While it surely could come up with a completely new problem, it is quite unlikely as most of chat GPTs responses seem kind of standard.

Edit 2: Just did 10 requests to generate a problem. Here is the outcome:

Given an array, find the longest subarray with equal sum and product.
Given a set of elements, find the number of pairs of 2 elements who sum up to K.
Find maximal sum of an subarray.
Find maximal sum of a subarray with length K.
You are given an array. Find the number of triples of elements whose sum is divisible by 3.
LCS problem.
You are given an array and number K. Out of all subarrays who have sum divisible by K, find the one with largest sum.
Given an array. Find the minimal amount of swaps of adjecent elements to sort it .
LIS problem.
This

Honestly, even though most problems are standard, a few of them are interesting. (P10 even appeared in CF round recently.)

→ Reply

dpaleka

22 months ago, # ^ |

← Rev. 2 →

I wrote an essay on why text prediction is not just memorization; practically everyone on ML Twitter agrees with the sentiment.

→ Reply

NemanjaSo2005

22 months ago, # ^ |

← Rev. 2 →

Let me clarify. I never said that ChatGPT is just memorizing. Instead, it is observing patterns in language, like relationships between tokens and such. With that, it is able to generate text. However, it doesn't have the knowledge of the meaning behind the words.

I said that it cannot generate original problems based on my experience with it. It definitely could generate a totally new problem, but most of the time it will just come up with something well known or NP hard. And that is kind of what you would expect from a text predictor.

I will make an edit to the previous comment to clear it up.

→ Reply