TheeLooser's blog

By TheeLooser, history, 3 years ago, In English

Is there a way to scrape problem statements automatically?


»
3 years ago, # |

Use the Codeforces API; check out the Problem section of its documentation.
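As a rough illustration of that suggestion, here is a minimal sketch that lists problems through the `problemset.problems` API method. The base URL `https://codeforces.com/api` and the helper names (`problems_url`, `parse_problems`, `fetch_problems`) are assumptions made for this example, not something given in the thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API lives on the main site, not on the mirror host.
API_BASE = "https://codeforces.com/api"


def problems_url(tags=()):
    """Build the URL for the problemset.problems API method."""
    url = f"{API_BASE}/problemset.problems"
    if tags:
        # The API takes tags as one semicolon-separated parameter.
        url += "?" + urllib.parse.urlencode({"tags": ";".join(tags)})
    return url


def parse_problems(payload):
    """Return (contestId, index, name) tuples from an API response body."""
    data = json.loads(payload)
    if data.get("status") != "OK":
        raise RuntimeError(data.get("comment", "API request failed"))
    return [
        (p.get("contestId"), p["index"], p["name"])
        for p in data["result"]["problems"]
    ]


def fetch_problems(tags=()):
    """Hypothetical helper: performs the network call and parses the reply."""
    with urllib.request.urlopen(problems_url(tags)) as resp:
        return parse_problems(resp.read().decode("utf-8"))
```

Calling `fetch_problems(["dp"])` would return metadata such as contest ID, index, and name for each matching problem.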

  • »
    »
    3 years ago, # ^ |

    Unfortunately, the Problem object does not come with the statement text.

    • »
      »
      »
      3 years ago, # ^ |

      I am not sure what you are trying to achieve, but I previously worked on a similar problem and used BeautifulSoup in Python to read the HTML and parse the content. You can do something similar for your purpose.
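A minimal sketch of that parsing step, written with the standard-library `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages. It assumes the statement sits inside an element with the class `problem-statement`; that class name and the names `StatementExtractor` and `extract_statement` are assumptions for illustration. With BeautifulSoup, the rough equivalent would be `soup.find("div", class_="problem-statement")`.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class StatementExtractor(HTMLParser):
    """Collect the text found inside elements carrying a given CSS class."""

    def __init__(self, css_class="problem-statement"):
        super().__init__()
        self.css_class = css_class  # assumed class of the statement container
        self.depth = 0              # > 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.css_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_statement(html):
    """Return the statement text, one text node per line ('' if not found)."""
    parser = StatementExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The depth counter relies on well-formed markup with explicit closing tags; on messier pages, a tolerant parser such as BeautifulSoup is likely the better choice.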

      • »
        »
        »
        »
        3 years ago, # ^ |

        I tried using BeautifulSoup, but it doesn't work anymore. I think Codeforces upgraded their systems (it currently seems to use some sort of script to get statements on demand? I know very little about this stuff). In fact, you could previously just use wget to download a problem page, like https://mirror.codeforces.com/problemset/problem/1673/F, and get the raw HTML. This doesn't work anymore. In case I am missing something trivial, could you please try using BeautifulSoup again, I mean right now? I think a simple wget command would have worked back when you did your parsing.

    • »
      »
      »
      3 years ago, # ^ |

      Download the page using the problem ID?

»
3 years ago, # |

So, a friend of mine looked into it and found that running wget/curl on https://mirror.codeforces.com/contest/contestId/problems still works, whereas running wget/curl on https://mirror.codeforces.com/problemset/problem/contestId/index only returns the preload HTML. So scraping a whole contest's problem set instead of individual problems is an alternative. Thanks for your comments.
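A small sketch of the two URL patterns and a check for whether a response actually contains statements rather than the preload page. The helper names and the `problem-statement` marker are assumptions for illustration, and the site's behaviour may change at any time, so treat this as a starting point only.

```python
import urllib.request

BASE = "https://mirror.codeforces.com"  # host used in this thread


def contest_problems_url(contest_id):
    """URL of the page listing all problems of a contest."""
    return f"{BASE}/contest/{contest_id}/problems"


def problem_url(contest_id, index):
    """URL of a single problem page in the problemset."""
    return f"{BASE}/problemset/problem/{contest_id}/{index}"


def has_statement(html, marker="problem-statement"):
    """Heuristic: real problem pages are assumed to contain this marker,
    while a preload page is assumed not to."""
    return marker in html


def fetch(url, timeout=10.0):
    """Download a page as text, roughly what wget/curl would return."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `has_statement(fetch(contest_problems_url(1673)))` would indicate whether the contest page still returns the statements directly.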

  • »
    »
    3 years ago, # ^ |

    Unfortunately, it seems this no longer works :/ Did you manage to find any other alternative?