Hi,
A small team of researchers and I are trying to teach machines to solve competitive programming problems.
For that we need a large dataset of problems and solutions.
We have already crawled practically all websites that have public solutions, and are now trying to crawl solutions that are not public.
If you have solved problems on Timus, UVa or any other platform with private submissions, and are open to giving us access to your account to crawl the solutions, please send your credentials to me via a personal message. It will help us a lot with our research. The language in which you solved the problems doesn't matter.
We won't publish your code anywhere, and won't use the credentials in any way except for crawling the solutions.
Besides that, a reminder that we have a labeling platform where we rewrite competitive programming problem statements in a short, concise form. We pay for this work, and many of the people currently helping us are making $12/hour. The link to the platform is
It is a very nice way to get extra income for people who can't have a full-time job due to practicing for the upcoming competitions or studying.
That's a very bold thing to do. I wish you luck, but I (and many other people) think that science is not ready for such a thing. But fortune favours the brave.
Yes, people are divided into those who think it's a decade away and those who think it is around the corner. I belong to the latter group :)
Both a friend of mine and I have left our day jobs to work on this full time, so we are quite invested in our belief :)
Does UVa even store solutions?
You are right, apparently it doesn't.
We haven't found any UVa accounts yet, so it didn't come up before.
I just registered on the platform and I see that we have to earn points on it. So do more points mean more payment, or is it just a sort of recruitment test?
It's directly proportional to the payment: 20K points = $20.
Do you crawl only the AC solutions? Have you crawled on platforms like SPOJ? They don't seem to have public solutions.
Yes, SPOJ doesn't have public solutions, so we need access to individual people's accounts to crawl them.
You could ask the SPOJ folks whether they can give you access to solutions for research purposes. They might allow it. I have set around 40 problems on SPOJ. I will try to ask them whether I am allowed to crawl/store those submissions.
I am working on a messenger bot. Please send me your Facebook account details (e-mail, password, etc.), I won't publish them.
Many solutions to UVa problems are available on GitHub, so you could crawl that for solutions, though I don't know how you could verify them for ACness.
Anyway, good luck with your research.
Why not share a script that allows you to scrape your AC solutions from the respective judges and upload them to your website instead of asking users to share their passwords, something that most people likely won't do?
I originally wanted to publish the crawlers and do it the way you described, but in 2017 it's very hard to distribute code that is supposed to be run locally. People have very different setups.
Out of curiosity, what would stop you from sharing your Timus account?
A possible way that this could harm a user is this:
Let's assume that somebody has similar passwords for e.g. VK and Timus. If they don't change their password on Timus before sharing it, they are opening up a potential attack vector for their other accounts if your database gets hacked.
This is a valid point.
However, anyone who would spend effort to download a crawler and run it on their machine would probably also be willing to spend time to change their password.
Would you mind changing your password on Timus and sharing your account with me? :)
curl | sh is a really easy distribution method (even though still unsafe) for pretty much any programmer running a Unix-like OS, and iex (New-Object System.Net.WebClient).DownloadString('http://domain/script.ps1') is a similar alternative for Windows users, so you actually don't have to spend much effort to download a crawler. Assuming that your crawler doesn't have many dependencies, this should work immediately.
I personally would be way more hesitant to run curl | sh than to share my Timus account :) Also, on the topic of curl | sh: https://www.idontplaydarts.com/2016/04/detecting-curl-pipe-bash-server-side/
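One middle ground between piping a download straight into a shell and handing over a password is to save the script first, check it against a digest published through a separate channel, and only then run it. Below is a minimal Python sketch of just the checking step. The script bytes and the function names are made up for illustration, and nothing in this thread says the crawler is actually distributed this way.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def is_trusted(script: bytes, expected_sha256: str) -> bool:
    """True only if the downloaded bytes match the published digest."""
    expected = expected_sha256.strip().lower()
    return hmac.compare_digest(sha256_hex(script), expected)


# Hypothetical stand-in for the bytes saved from the download URL.
script = b"echo 'crawling my own submissions'\n"
# In practice the digest would be copied from a separate, trusted channel.
published = sha256_hex(script)

print(is_trusted(script, published))                   # True
print(is_trusted(script + b"rm -rf ~\n", published))  # False
```

This only shows that the file matches what the author published. Whether the published script itself is safe still has to be judged by reading it.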
I have a small suggestion. If at some point during your research you conclude that the task at hand is intractable, you could consider the relatively easier problem of predicting tags for a problem. Tags could be things like "Segment trees", "DP", "Math", etc., and the features could be the constraints along with any other information you could extract using NLP.
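To make the tag-prediction idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the three statements are invented toy data, the "largest number in the statement" bucket is a crude stand-in for real constraint parsing (it would miss forms like 10^5), and nearest-neighbour matching with Jaccard similarity is just the simplest classifier that fits in a few lines, not a recommendation.

```python
import re


def constraint_bucket(statement: str) -> str:
    """Bucket the largest integer in the statement, a crude proxy for input size."""
    numbers = [int(x) for x in re.findall(r"\d+", statement)]
    largest = max(numbers, default=0)
    if largest <= 20:
        return "limit:tiny"
    if largest <= 5000:
        return "limit:medium"
    return "limit:large"


def features(statement: str) -> set:
    """Bag of lowercase words plus one constraint-size feature."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    return words | {constraint_bucket(statement)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_tags(statement: str, labeled: list) -> set:
    """Return the tags of the most similar labeled statement."""
    query = features(statement)
    best = max(labeled, key=lambda ex: jaccard(query, features(ex[0])))
    return best[1]


# Invented toy examples, NOT real problems or real labels.
TOY_DATA = [
    ("Given n <= 100000 numbers, answer q range sum queries with point updates.",
     {"Segment trees"}),
    ("Count the ways to climb n <= 5000 stairs taking one or two steps.",
     {"DP"}),
    ("Find the greatest common divisor of two integers up to 1000000000.",
     {"Math"}),
]

print(predict_tags("Process 200000 range minimum queries with updates on an array.",
                   TOY_DATA))  # {'Segment trees'}
```

With a real crawled dataset, one would presumably replace the nearest-neighbour step with a proper multi-label classifier and parse constraints per variable, but the shape of the pipeline (statement, then features, then tags) stays the same.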
Are you using NLP to solve the CP problems? Either way, that would be a great breakthrough, man! Keep working on it! :)
Is the website still working? I keep getting 504 Gateway Time-out.