Is there a way to scrape problem statements automatically?
Using the Codeforces API: check out the Problem section.

Unfortunately, the Problem object does not come with the statement text.
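To illustrate the point about the API, here is a minimal sketch that reads the `problemset.problems` method and lists what a Problem object carries. The helper names `fetch_problems` and `summarize` are made up for this example, and the sample payload (including the problem name) is a placeholder shaped like the documented response, not real data. Note that the fields are metadata only (`contestId`, `index`, `name`, `tags`, and so on); there is no statement text to read.

```python
import json
import urllib.request

# Public API method that lists problemset problems.
API_URL = "https://codeforces.com/api/problemset.problems"


def fetch_problems(url: str = API_URL) -> dict:
    """Download and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarize(payload: dict) -> list[str]:
    """Turn an API payload into 'contestId+index: name' strings."""
    if payload.get("status") != "OK":
        raise RuntimeError(payload.get("comment", "API call failed"))
    return [
        f"{p.get('contestId')}{p.get('index')}: {p.get('name')}"
        for p in payload["result"]["problems"]
    ]


# Offline sample shaped like the API response; the values are placeholders.
sample = {
    "status": "OK",
    "result": {
        "problems": [
            {
                "contestId": 1673,
                "index": "F",
                "name": "Placeholder name",
                "type": "PROGRAMMING",
                "tags": ["interactive"],
            }
        ],
        "problemStatistics": [],
    },
}

print(summarize(sample))
```

For live data, `summarize(fetch_problems())` should work the same way, subject to the API's rate limits.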
I am not sure what you are trying to achieve, but I previously worked on a similar kind of problem. I used `beautifulsoup` from Python to read the HTML and parse the content. You can do something similar for your purpose.

I tried using soup but it doesn't work anymore. I think Codeforces upgraded their systems (it currently uses some sort of script to get statements on demand? I know very little about this stuff). In fact, previously you could just use `wget` to download a problem page, like https://mirror.codeforces.com/problemset/problem/1673/F, to get the raw HTML. This doesn't work anymore. In case I might be missing something trivial, could you please try using soup again, I mean, right now? I think when you did your parsing, a simple `wget` command would've worked.

Download the page using the problem id?
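For reference, the parsing step discussed above can be sketched roughly as follows. This uses the standard library's `html.parser` as a dependency-free stand-in for BeautifulSoup, and it assumes the statement lives in a `<div class="problem-statement">` element, which is an assumption about the page markup and may not hold. The class and function names are made up for this example. If the server only returns preload HTML, the function simply finds nothing.

```python
from html.parser import HTMLParser


class StatementExtractor(HTMLParser):
    """Collect the text inside <div class="problem-statement"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # <div> nesting depth inside the current statement
        self.statements: list[str] = []
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested div inside a statement
        elif "problem-statement" in classes:
            self.depth = 1  # entering a new statement
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:  # statement div just closed
                self.statements.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self.depth and data.strip():
            self._chunks.append(data.strip())


def extract_statements(html: str) -> list[str]:
    """Return one text blob per statement block found in the HTML."""
    parser = StatementExtractor()
    parser.feed(html)
    parser.close()
    return parser.statements


# Hand-written HTML snippets for illustration, not real Codeforces pages.
page = (
    '<html><body><div class="problem-statement">'
    '<div class="title">F. Sample</div><p>Print the sum.</p>'
    "</div></body></html>"
)
preload = "<html><body><script>load()</script></body></html>"

print(extract_statements(page))
print(extract_statements(preload))
```

An empty result is a hint that the downloaded page was a preload stub rather than the rendered statement.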
So, a friend of mine looked into it and found out that `wget/curl https://mirror.codeforces.com/contest/contestId/problems` still works, while the problem with `wget/curl https://mirror.codeforces.com/problemset/problem/contestId/index` is that it just gives the preload HTML. So, scraping contest problem sets instead of individual problems is an alternative. Thanks for your comments.

Unfortunately, it seems like this no longer works :/ Did you manage to find any other alternative?
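The two URL patterns mentioned in this thread can be written out as below, together with a crude check for whether a downloaded page actually contains a statement. This is only a sketch: the helper names are made up, the `class="problem-statement"` marker is an assumption about the page markup, and as the last comment notes, the contest-level URL may no longer return full HTML either.

```python
import urllib.request

BASE = "https://mirror.codeforces.com"


def contest_problems_url(contest_id: int) -> str:
    """All problems of one contest on a single page."""
    return f"{BASE}/contest/{contest_id}/problems"


def problem_url(contest_id: int, index: str) -> str:
    """A single problem in the problemset."""
    return f"{BASE}/problemset/problem/{contest_id}/{index}"


def has_statement(html: str) -> bool:
    """Heuristic: assume real statement pages contain this marker."""
    return 'class="problem-statement"' in html


def download(url: str) -> str:
    """Fetch a page as text (needs network access; not called here)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(contest_problems_url(1673))
print(problem_url(1673, "F"))
```

Calling `has_statement(download(contest_problems_url(1673)))` would then show whether the contest page still serves statements or only the preload HTML.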