TheeLooser's blog

By TheeLooser, history, 3 years ago, in English

Is there a way to scrape problem statements automatically?


»
3 years ago

Use the Codeforces API: check out the Problem section.

  • »
    »
    3 years ago

    Unfortunately, the Problem object does not come with the statement text.
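To check this yourself, here is a minimal sketch that lists the fields the API returns on its Problem objects. It assumes the `problemset.problems` method on `codeforces.com/api`; the helper names `fetch_problemset` and `problem_fields` are made up for illustration.

```python
import json
import urllib.request

# Assumed endpoint: the problemset.problems method of the Codeforces API.
API_URL = "https://codeforces.com/api/problemset.problems"


def fetch_problemset(url=API_URL, timeout=30):
    """Call the API and return the decoded JSON response (needs network)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def problem_fields(response):
    """Collect every key that appears on any Problem object in the response."""
    if response.get("status") != "OK":
        raise RuntimeError(response.get("comment", "API call failed"))
    fields = set()
    for problem in response["result"]["problems"]:
        fields.update(problem)  # adds the dict's keys
    return fields


if __name__ == "__main__":
    # Prints metadata field names only; no statement field is expected.
    print(sorted(problem_fields(fetch_problemset())))
```

The keys are metadata such as the contest ID, index, name and tags, so the statement text has to come from the HTML pages.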

    • »
      »
      »
      3 years ago

      I am not sure what you are trying to achieve, but I previously worked on a similar problem and used BeautifulSoup in Python to read the HTML and parse the content. You can do something similar for your purpose.
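A rough sketch of that parsing step, for HTML you have already downloaded. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra installs. The `problem-statement` class name is an assumption about the page markup, and `StatementExtractor` and `extract_statements` are hypothetical names.

```python
from html.parser import HTMLParser


class StatementExtractor(HTMLParser):
    """Collect the text inside every <div class="problem-statement">.

    The class name is an assumption about the page markup.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0        # <div> nesting inside a statement; 0 = outside
        self.statements = []  # one list of text chunks per statement found

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1   # nested <div> inside the statement
        elif "problem-statement" in classes:
            self.depth = 1    # entering a new statement
            self.statements.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.statements[-1].append(data.strip())


def extract_statements(html):
    """Return one newline-joined text blob per statement found in `html`."""
    parser = StatementExtractor()
    parser.feed(html)
    parser.close()
    return ["\n".join(chunks) for chunks in parser.statements]
```

If the download contains no such element, `extract_statements` returns an empty list, which is a quick way to tell that the statement was not in the HTML you received.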

      • »
        »
        »
        »
        3 years ago

        I tried using soup, but it doesn't work anymore. I think Codeforces upgraded their systems (it now seems to use some sort of script to load statements on demand? I know very little about this stuff). Previously you could just use wget to download a problem page, like https://mirror.codeforces.com/problemset/problem/1673/F, and get the raw HTML. This doesn't work anymore. In case I'm missing something trivial, could you please try using soup again, right now? I think that when you did your parsing, a simple wget command would have worked.

    • »
      »
      »
      3 years ago

      Download the page using the problem ID?

»
3 years ago

So, a friend of mine looked into it and found that wget/curl on https://mirror.codeforces.com/contest/contestId/problems still works, whereas wget/curl on https://mirror.codeforces.com/problemset/problem/contestId/index returns only the preload HTML. So scraping a contest's whole problem set instead of individual problems is an alternative. Thanks for your comments.
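The fallback described above could be sketched roughly as follows: try the single-problem URL first, then the contest-wide page, and keep the first response that actually contains a statement. The marker string and the function names are assumptions for illustration, and either URL may stop working at any time.

```python
import urllib.request

BASE = "https://mirror.codeforces.com"
# Assumed marker: a page that carries a statement is expected to contain it.
MARKER = 'class="problem-statement"'


def candidate_urls(contest_id, index, base=BASE):
    """Single-problem URL first, then the contest-wide problems page."""
    return [
        f"{base}/problemset/problem/{contest_id}/{index}",
        f"{base}/contest/{contest_id}/problems",
    ]


def has_statement(html, marker=MARKER):
    """True if the HTML looks like a real statement rather than preload HTML."""
    return marker in html


def http_get(url, timeout=30):
    """Plain GET, roughly what wget/curl would do (needs network)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_first_with_statement(urls, fetch=http_get):
    """Return (url, html) for the first URL whose page has a statement, else None."""
    for url in urls:
        html = fetch(url)
        if has_statement(html):
            return url, html
    return None


if __name__ == "__main__":
    hit = fetch_first_with_statement(candidate_urls(1673, "F"))
    print(hit[0] if hit else "no statement found at either URL")
```

Passing `fetch` as a parameter makes it possible to test the fallback logic offline with canned pages. A `None` result means neither page contained a statement.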

  • »
    »
    2 years ago

    Unfortunately, it seems like this no longer works :/ Did you manage to find any other alternative?