An Open-Source Retrieval Model and Service for Competitive Programming Problems

Revision en1, by cold_chair, 2025-05-21 15:54:44

https://mirror.codeforces.com/55b297/output.png

Over the past few years, the number of duplicate or near-duplicate problems in competitive programming has grown noticeably. You can find several discussions about this on Codeforces, for example:

To address the issue, we collected three kinds of data: problem–solution pairs, duplicate-problem pairs, and pairs of full vs. simplified problem statements. With this data we built a dedicated benchmark for problem retrieval and trained a special-purpose retrieval model that currently achieves the best performance among open-source models in this domain.

You can try the model-powered search service live at http://1.94.255.218:5000/.

About two years ago, the project yuantiji.ac introduced a simplify-then-retrieve pipeline based on a closed-source LLM API. In contrast, our approach is pure retrieval: there is no LLM rewriting step, the model is lightweight and tuned specifically for this task, and a query takes about 0.2 s on average on a GPU.
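For readers unfamiliar with what "pure retrieval" means in practice, here is a minimal sketch of the general idea: embed every problem statement once, embed the query, and rank by cosine similarity. This is not the CPRet code or API. The `embed` function below is a toy bag-of-words stand-in for a trained encoder, and `build_index`, `retrieve`, and `CORPUS` are hypothetical names made up for this illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder (assumption, not the real model):
    an L2-normalised bag-of-words vector stored as a sparse dict."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def build_index(corpus):
    """Embed every statement once, so a query costs one embedding plus a scan."""
    return [embed(doc) for doc in corpus]


def retrieve(query, index, top_k=3):
    """Return up to top_k (problem_id, score) pairs, best match first."""
    q = embed(query)
    scored = [(cosine(q, vec), i) for i, vec in enumerate(index)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # high score first, ties by id
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical mini-corpus; statements 0 and 2 are near-duplicates.
CORPUS = [
    "Given an array, find the maximum sum of a contiguous subarray.",
    "Count the number of shortest paths between two nodes in a graph.",
    "Find the contiguous subarray with the largest sum in an array of integers.",
]

if __name__ == "__main__":
    index = build_index(CORPUS)
    for pid, score in retrieve("largest sum contiguous subarray", index):
        print(pid, round(score, 3), CORPUS[pid])
```

A trained encoder would replace `embed` with dense vectors that capture meaning rather than word overlap, which is what lets it match problems that are worded very differently. The index-then-rank structure stays the same.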

More importantly, the entire model is open-source. If you’d like to run the search service locally (CPU or GPU; ≥ 16 GB memory recommended), just follow the instructions in our repository:

https://github.com/coldchair/CPRet

(The repo also contains full training scripts, so you can fine-tune or improve the model yourself.) Local deployment is invaluable for contests that require strict data privacy, and we hope the community will use this tool to reduce the spread of duplicate problems.

We’re also building a public website to collect duplicate-problem reports. Once the site is online, please feel free to submit any duplicates you discover with our retriever.

Questions or issues? Open an issue on GitHub — we’d love to hear your feedback!

Tags: duplicate problems
