Is there a way to scrape problem statements automatically?
Use the Codeforces API: check out the Problem section.

Unfortunately, the Problem object does not come with the statement text.

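
To make the point above concrete, here is a minimal sketch of what the API does give you. It assumes the public `problemset.problems` method at `https://codeforces.com/api/problemset.problems` and the documented Problem fields `contestId`, `index` and `name`; the helper names `extract_problems` and `fetch_problems` are made up for illustration. Note that there is no statement field to read.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public API endpoint for listing problemset problems.
API_URL = "https://codeforces.com/api/problemset.problems"


def extract_problems(payload):
    """Return (contestId, index, name) tuples from a decoded API response."""
    if payload.get("status") != "OK":
        # On failure the API reports the reason in a "comment" field.
        raise RuntimeError(payload.get("comment", "API request failed"))
    return [
        (p.get("contestId"), p["index"], p["name"])
        for p in payload["result"]["problems"]
    ]


def fetch_problems(tags=None):
    """Download the problem list, optionally filtered by tags (needs network)."""
    url = API_URL
    if tags:
        # Multiple tags are passed as one semicolon-separated parameter.
        url += "?" + urllib.parse.urlencode({"tags": ";".join(tags)})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_problems(json.load(resp))
```

Calling `fetch_problems(["implementation"])` should give you identifiers you can turn into problem URLs, but the statement text itself still has to come from the HTML pages.
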
I am not sure what you are trying to achieve, but I previously worked on a similar problem and used `beautifulsoup` from Python to read the HTML and parse the content. You can do something similar for your purpose.

I tried using soup, but it doesn't work anymore. I think Codeforces upgraded their systems (it currently seems to use some sort of script to get statements on demand? I know very little about this stuff). In fact, previously you could just use `wget` to download a problem page, like https://mirror.codeforces.com/problemset/problem/1673/F, to get the raw HTML. This doesn't work anymore. In case I am missing something trivial, could you please try using soup again, I mean right now? I think that when you did your parsing, a simple `wget` command would have worked.

Download the page using the problem id?

So, a friend of mine looked into it and found out that `wget/curl https://mirror.codeforces.com/contest/contestId/problems` still works, while the problem with `wget/curl https://mirror.codeforces.com/problemset/problem/contestId/index` is that it just gives the preload HTML. So, scraping contest problem sets instead of individual problems is an alternative. Thanks for your comments.

Unfortunately, it seems like this no longer works :/ Did you manage to find any other alternative?
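
For anyone who wants to check what a given URL currently returns, here is a rough sketch of the approach discussed in this thread. It uses Python's standard `html.parser` instead of `beautifulsoup` so that it has no dependencies. It assumes that statements live in `<div class="problem-statement">` blocks, which is an assumption about the page markup and may not hold; the helper names are made up. An empty result means the page contained no statement block, which is what you would expect from the preload HTML.

```python
import urllib.request
from html.parser import HTMLParser

BASE = "https://mirror.codeforces.com"  # host used in the URLs above


def problem_url(contest_id, index):
    return f"{BASE}/problemset/problem/{contest_id}/{index}"


def contest_problems_url(contest_id):
    return f"{BASE}/contest/{contest_id}/problems"


class StatementExtractor(HTMLParser):
    """Collect the text of every <div class="problem-statement"> block."""

    def __init__(self):
        super().__init__()
        self.statements = []  # one text string per statement block
        self._depth = 0       # <div> nesting depth inside the current block
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1
        elif "problem-statement" in (dict(attrs).get("class") or "").split():
            # Assumed class name of the statement container.
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.statements.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._chunks.append(data.strip())


def extract_statements(html):
    """Return the statement texts found in html, or [] if there are none."""
    parser = StatementExtractor()
    parser.feed(html)
    parser.close()
    return parser.statements


def download(url):
    """Fetch a page as text, roughly what wget/curl would do (needs network)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Something like `extract_statements(download(contest_problems_url(1673)))` would tell you whether the contest page still serves statements. If it comes back empty for both URL forms, plain HTTP downloads are not enough anymore.
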