hoke_t's blog

By hoke_t, history, 23 months ago, In English

I am these days a very irregular Codeforces visitor, but yesterday saw the post About Problem Coincidences by MikeMirzayanov and was wondering about a solution along the lines of a problem database like what was suggested some time ago by Um_nik. Lots of the comments there mention the idea of constructing a notion of problem similarity using solutions, which seems reasonable, but others countered that they don't want to have to write a whole solution just to tell if a similar problem exists. Given the current state of AI, could something like the following work now or in the near future?

  1. For every problem statement in the CF, AtCoder, etc. universe, use AlphaCode or the editorial to get a solution.
  2. Embed each of these solutions into a vector space. AFAIK this is straightforward to do.

Then when someone wants to see the most similar problems to their proposed statement, they either write a solution, or just write a statement and behind the scenes the database can have AlphaCode (or similar) generate a solution, embed using same method as in 2. and query for nearest vectors in the space. AlphaCode can't solve all problems, but maybe it at least writes something that "approximates" a correct solution well enough that the resulting vector is close to the "correct" one?

Of course, AlphaCode itself is not available to the public, but probably it or something similar will be soon. Thoughts?

  • Vote: I like it
  • +33
  • Vote: I do not like it

| Write comment?
»
23 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by hoke_t (previous revision, new revision, compare).

»
23 months ago, # |
Rev. 3   Vote: I like it -93 Vote: I do not like it

yOU nEeD tO COnsIder SImpLe ConVErsIoN, SUCH as data StruCTURE ProblEMS. The chaIN oPerAtion ON tHE TREE caN bE ConverteD iNTo The iNTErvAL OPeRATIon of DFn, wHiCH iS MORE LikE "neaREST VeCtoRs" tHan OtHErs. ObVIOusLY, ai CANNoT dO ThIS.


Thank you to everyone who voted against it. <3

»
23 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Good ideas but I see one difficulty : I believe it's hard to have a good unique embedding, especially if output or input are not exactly the same (for instance if in one exercice the list is no sorted and it is in the other).
Furthermore, if only half of one problem (so a subproblem) correspond you won't know it. That would be burdensome but a more standardized version of the problem could be down, for instance never gives list in sorted order, and always gives trees either as list of edges, or array of parent with tree rooted at one (I don't understand why way isn't the standard, after writing the rooting operation too often I added it to my template).
By the way, why is input and output still 1-indexed while most codeforcers use 0-index languages (I mean C + Java + Python) ?

»
22 months ago, # |
  Vote: I like it +13 Vote: I do not like it

At that point, it might make more sense to run a text-analysis algorithm to extract core components of the problem (so like whatever alphacode uses when it tries to solve a problem, without actually coding up a solution) and embed that into a vector space. Should take much less time and be more efficient.

  • »
    »
    22 months ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Some friends and I tried doing this for a hackathon this weekend lmao, didn't get to finish tho, might try to improve over upcoming weeks.

»
22 months ago, # |
  Vote: I like it -19 Vote: I do not like it

What do u mean by putting them in a vector space? How do you define scalar multiplication in a vector space of solutions??

  • »
    »
    22 months ago, # ^ |
      Vote: I like it +11 Vote: I do not like it

    (everything below might be not 100% true, i am merely a beginner in ML world)

    "putting them in a vector space" is a common technique in machine learning.

    Words that we, humans, use are not that great for computers, you can't know if a word is close to some other word or not just by the structure of the word. How do you derive that good is an opposite of bad, or how do you define that white is kind of close to the gray? Yeah, you kind of can't.

    So, let's try to think about the ways to represent a word in a way that makes it much easier to know if they're similar(so, let's try to represent semantics). Well, a natural thought(at least for me) might occur: what if we pick some really general words and try to represent all words by how much they are similar to each general word, so that the closest ones are also the most similar ones? So, if, for example, one of the general words is "good", then you can rate word "good" as the same word, so in numbers, let's say 1.0, and "bad" is absolutely not "good", so it will be -1.0. And after doing this for all general words, we get an array A, where A[i] = closeness of the word to i-th general word.

    And you might see that A is just a vector of (number of general words) dimensions, and each dimension is some general word. In a case, where possible set of words is too big, we don't choose the set of general words, we let neural network learn some set that lets it encode and decode words. But ords is just one of possible jobs for such thing, you can make it work with solutions also, which is what was meant by "putting them in a vector space".

»
22 months ago, # |
  Vote: I like it +41 Vote: I do not like it

just ask the average chinese cper lol

»
22 months ago, # |
  Vote: I like it +3 Vote: I do not like it

I think we should get over it now and accept that these type of coincidences are inevitable.

  • »
    »
    22 months ago, # ^ |
      Vote: I like it +49 Vote: I do not like it

    "I think we should get over it now and accept that you can't cure smallpox" — people said instead of inventing vaccines.

    "I think we should get over it now and accept that humans can't fly" — Montgolfier brothers said instead of inventing hot air balloons

    "I think we should get over it now and accept that computers can't beat humans" — IBM said instead of upgrading Deep Blue who would capable of winning against Garry Kasparov.

    "I think we should get over it now and realize that i will not find a love" — some man said instead of meeting a woman which would lead to birth of a human (maybe this human would've been me, or maybe, you).

    I think we should get it that we should not get over something if it is a problem to us.