The primary concern of this post:
Prevent bots from scraping data from Codeforces, as it will make AI tools more powerful and harm Codeforces in the long term.
OpenAI has just started providing a service to customise Large Language Models on your own data.
Why is this a problem now?
Codeforces has a very rich database of community-driven questions (approx. 10,000). Now you can easily feed a lot of Codeforces data to ChatGPT and make it permanently learn the material. This will enhance its existing ability to solve algorithmic questions. Codeforces has a large set of both problems and their respective tutorials.
(My opinion) Chances are that when ChatGPT was being created, it was already fed the Codeforces data once, which is why the model can write code that solves Codeforces questions. But it was not custom-trained SPECIFICALLY for this, which is now possible.
Any individual in the world can now scrape the entire Codeforces site (a relatively easy task) and needs just $200 in GPT credits to custom-train a model and build a new service or product that can solve even the most difficult Codeforces problems in no time.
How is LeetCode fighting this?
1. (Last Saturday itself) They implemented Cloudflare's anti-scraping protection on their website, which makes it very difficult to scrape data with automated scripts built on tools like Selenium or Beautiful Soup.
I propose:
- Adding a service to avoid data scraping.
- Adding CAPTCHA wherever possible.
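
To give a rough idea of what the first proposal could look like, here is a minimal sketch of a per-client rate limiter in Python. It is only an illustration: the class name `SlidingWindowLimiter`, the limits, and the idea of keying on a client id are my own assumptions, not how Codeforces, LeetCode, or Cloudflare actually do it.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most max_requests per client per window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id):
        now = self.clock()
        recent = self.hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            # Over the limit: a site could block here or show a CAPTCHA instead.
            return False
        recent.append(now)
        return True
```

A site could call `allow(client_id)` on every page request and fall back to the second proposal (a CAPTCHA) only when it returns `False`, so normal users are rarely interrupted. A real anti-scraping service would need much more than this, since scrapers can rotate IPs and accounts.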