Finally, semantic search for competitive programming problems

By TLE, 13 months ago, In English

Hello Codeforces,

It has been a long while, but with this project we close the long-standing open problem proposed by Um_nik in 2021. You can try it here (discontinued, see the new link below) while supplies last. Currently I have only imported problems from Codeforces and BZOJ (the dead Chinese OJ), but adding other OJs should be easy as long as the statements have been crawled (PRs?). Cheers!

Update (8 months later): We finally got an update! In the new version I collected and uploaded most of vjudge (which means 160k problems!). It also got a shiny new domain http://yuantiji.ac. Enjoy! :D

+1200

Comments (26)

Mo_Huzaifa

13 months ago, # |

Nice Initiative

gin_spirit

13 months ago, # |

+13

I don't understand anything, but it looks like something useful, so I upvoted.

ARMINIUS

13 months ago, # |

Nice job!

oToToT

13 months ago, # |

Out of curiosity: how bad would the search results be if we didn't use ChatGPT to simplify the problem?

kpw29

13 months ago, # |

-29

Truly amazing work. You should write a paper about it or something.

I'm slightly worried about consequences for competitive programming... you should probably block usage during contests as an anti-cheating measure. Otherwise you'll lose your credits pretty quickly :)

GusterGoose27

13 months ago, # |

+133

With regard to cheating concerns, this may actually reduce cheating incidents by making it easier for authors to find repeated problems.

TwentyOneHundredOrBust

13 months ago, # |

+37

This is neat, but won't the training data annotators then know your next problem when you plug it into OpenAI?

TLE

13 months ago, # ^ |

+41

I'm using their paid API (same functionality as ChatGPT, but not free), so in theory the inputs should not be used for training :|

TwentyOneHundredOrBust

13 months ago, # ^ |

I guess even then someone working at OpenAI who really, really wants to cheat on a contest could do it, but that's probably not going to happen. How well does it work? I tried plugging in this year's FHC 3B, which seems very similar to 1870E, but the results it gives don't seem very closely related to it.

Lyrically

13 months ago, # ^ |

However, I plugged CF1793F into it, and it showed nothing that is even close to CF765F. What could the issue be here? I guess it's because of the problem background, but how do we deal with that?

Update: I also plugged in this year's CSP-S problem 2, and it still isn't showing CF1223F. Even after paraphrasing it to "Given an array of integers, we want to count the number of non-empty continuous subarrays that can be reduced to an empty array by repeatedly removing adjacent identical elements.", CF1223F appears nowhere on that list.
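
The paraphrased statement above is precise enough to check by brute force. A minimal sketch, assuming "removing adjacent identical elements" means deleting one pair of equal neighbours at a time; the function name `count_reducible_subarrays` is made up for illustration, and the quadratic loop is only meant for sanity-checking small inputs, not as a contest solution:

```python
def count_reducible_subarrays(a):
    """Count non-empty contiguous subarrays that cancel to nothing.

    Assumption: one reduction step deletes a pair of equal adjacent
    elements. Under that reading, a greedy stack gives the unique
    fully reduced form of any sequence.
    """
    n = len(a)
    total = 0
    for start in range(n):
        stack = []  # reduced form of a[start..end]
        for end in range(start, n):
            if stack and stack[-1] == a[end]:
                stack.pop()           # equal neighbours cancel
            else:
                stack.append(a[end])
            if not stack:             # a[start..end] reduced to empty
                total += 1
    return total
```

For example, `count_reducible_subarrays([2, 1, 1, 2, 2])` returns 4: the subarrays `[1, 1]`, `[2, 2]`, `[2, 1, 1, 2]` and `[1, 1, 2, 2]`.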

TLE

13 months ago, # ^ |

+21

Yeah, the system is still imperfect; we should probably experiment with a better prompt to remove the backgrounds (you can find the current prompts here).

For your second example, it seems doable with a bit of luck...

Lyrically

13 months ago, # ^ |

Yeah, automatically removing the background and actually "formalizing" the statement would be a great feature, and would help a lot, I guess :)

AprLsity

13 months ago, # |

Amazing work!

lis05

13 months ago, # |

+15

Honestly, that is impressive. I wonder if the same thing could be done but with the editorials (so that people can find applications of different ideas and algorithms).

ankeshgupta007

13 months ago, # |

Is the link still live? It's not working for me.

Misa-Misa

13 months ago, # |

+12

Wow, what wonderful work. Ask Um_nik for 1000 dollars.

avighnakc

13 months ago, # |

Lol, this could be used in today's contest to figure out the solution for D.

huikang

12 months ago, # |

+14

I have implemented this project on Poe, so that the cost of calling ChatGPT will be borne by the platform (i.e. other subscribers).

(Disclaimer: I currently work for Quora / Poe).

Sample query: https://poe.com/huikang/1512928000278451

After reading the code, someone could try some of these:

- Use better LLMs to summarize (e.g. GPT-4)
- Use other retrieval indexes (predicted topics, keywords)
- Use an LLM to rerank the retrieved problems
- Instead of a decimal similarity value, provide an LLM-generated summary of whether the two problems are the same
- Craft some evaluation benchmarks
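
To make the "decimal similarity value" concrete, here is a rough sketch of the retrieval step, assuming the score is a cosine similarity between vectors of the simplified statements (the actual project may compute it differently). The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the problem IDs `P1`..`P3` and their texts are hypothetical:

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, problems, k=2):
    """Return the k problem IDs most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), pid) for pid, text in problems.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by ID
    return [(pid, round(score, 3)) for score, pid in scored[:k]]

# Hypothetical simplified statements, keyed by made-up IDs.
PROBLEMS = {
    "P1": "count subarrays reducible to empty by removing adjacent equal elements",
    "P2": "find shortest path in weighted graph",
    "P3": "minimum absolute difference of two elements in a range query",
}

QUERY = "count subarrays that become empty after removing adjacent equal elements"
```

Calling `search(QUERY, PROBLEMS)` ranks `P1` first. An LLM reranker, as suggested in the list, would take this shortlist and reorder it, or replace the numeric score with a short verdict on whether the two problems match.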

Edit (October 2024): The tool no longer works.

steinum

2 months ago, # ^ |

entropy07

12 months ago, # |

One idea: use an LLM to read the source code of a problem's top solutions and write a summary, then search over that summary together with the problem statement.