Recently, I have seen a lot of blogs discussing the problem of cheaters, so I have been thinking about using an automatic system to catch them.
Currently, the best-known automatic system for helping to detect plagiarism is MOSS (from Stanford). At first, I asked myself why Codeforces does not use it. However, looking at the number of participants in each contest, the count goes up to about 30000. So, we would have to compare roughly $$$4.5 \cdot 10^8$$$ pairs of source code!
Assuming that the system can check $$$10^4$$$ pairs per second, we will need $$$45000$$$ seconds, which is just over half a day, about the same length as the hacking phase of Educational Rounds. But I believe the real rate is much lower (I have not used MOSS myself).
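To double-check the arithmetic, here is a small sketch. The function name `full_comparison_cost` is made up for illustration, and the $$$10^4$$$ pairs per second rate is the same guess as above, not a measured number.

```python
from math import comb

def full_comparison_cost(participants: int, pairs_per_second: float) -> tuple[int, float]:
    """Return (number of unordered pairs, hours needed to compare all of them)."""
    pairs = comb(participants, 2)            # n * (n - 1) / 2
    hours = pairs / pairs_per_second / 3600  # seconds -> hours
    return pairs, hours

# Assumed numbers from the post: 30000 participants, 10^4 pairs per second.
pairs, hours = full_comparison_cost(30_000, 10_000)
print(pairs, round(hours, 1))
```

Note that this is for a single problem; if every problem is checked separately, the time is multiplied by the number of problems.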
Is there any tool like that which could run that fast, if not MOSS? Are there any solutions that can bring the complexity below $$$O(n^2 \cdot t)$$$ (assuming $$$t$$$ is the time for comparing one pair of codes)?
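One possible direction, as a rough sketch only: instead of comparing every pair, hash short runs of tokens (k-grams), put submissions into an inverted index by hash, and only look at pairs that share several hashes. Everything here (the tokenizer, `fingerprints`, `candidate_pairs`, and the parameters `k`, `min_shared`, `max_bucket`) is my own assumption for illustration, not how MOSS or Codeforces actually works.

```python
import re
from collections import defaultdict
from itertools import combinations

def fingerprints(source: str, k: int = 5) -> set[int]:
    """Hash every run of k consecutive tokens (very simplistic tokenizer)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {hash(tuple(tokens[i:i + k])) for i in range(len(tokens) - k + 1)}

def candidate_pairs(submissions: dict[str, str], k: int = 5,
                    min_shared: int = 3, max_bucket: int = 50) -> dict[tuple[str, str], int]:
    """Return pairs of submission names that share at least min_shared fingerprints."""
    index = defaultdict(list)                # fingerprint -> names containing it
    for name, src in submissions.items():
        for fp in fingerprints(src, k):
            index[fp].append(name)

    shared = defaultdict(int)                # (name_a, name_b) -> shared fingerprint count
    for names in index.values():
        if len(names) > max_bucket:          # skip boilerplate that everybody has
            continue
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}

# Tiny made-up example: "b" copies "a" and appends one statement.
subs = {
    "a": "int s = 0; for (int i = 0; i < n; i++) s += a[i]; cout << s;",
    "b": "int s = 0; for (int i = 0; i < n; i++) s += a[i]; cout << s; return 0;",
    "c": "string t; cin >> t; reverse(t.begin(), t.end()); cout << t;",
}
result = candidate_pairs(subs)
print(sorted(result))
```

Only the surviving candidate pairs would then go to the expensive comparison costing $$$t$$$ each. The `max_bucket` cutoff matters: without it, common template code puts almost everyone in the same bucket and the work becomes quadratic again.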
We can do the same thing on a random contest from any two-month period, with a lower similarity cutoff, so that more people get caught.
Technically, we can build a relation tree of variables. Then you can use some automaton or suffix-sorting kind of technique to split the submissions into smaller groups, skipping the pairs that will definitely differ from a code perspective.
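A much simpler stand-in for this idea, just as a sketch: rename identifiers to placeholders in order of first appearance, then group submissions whose normalized token streams are identical. This is not a real relation tree or suffix structure; `KEYWORDS`, `normalize` and `group_by_shape` are hypothetical names, and the keyword list is deliberately incomplete.

```python
import re
from collections import defaultdict

# Small, incomplete keyword list (assumption); anything else that looks
# like a name is treated as a user identifier.
KEYWORDS = {"int", "long", "char", "bool", "void", "for", "while",
            "if", "else", "return", "break", "continue"}

def normalize(source: str) -> tuple[str, ...]:
    """Replace each identifier by id0, id1, ... in order of first appearance."""
    names: dict[str, str] = {}
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tok = names.setdefault(tok, f"id{len(names)}")
        out.append(tok)
    return tuple(out)

def group_by_shape(submissions: dict[str, str]) -> list[list[str]]:
    """Group submission names whose normalized token streams are identical."""
    groups = defaultdict(list)
    for name, src in submissions.items():
        groups[normalize(src)].append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]

# Made-up example: "b" is "a" with renamed variables.
subs = {
    "a": "int x=0;for(int i=0;i<n;i++)x+=i;",
    "b": "int y=0;for(int j=0;j<n;j++)y+=j;",
    "c": "int x=1;while(x<n)x*=2;",
}
groups = group_by_shape(subs)
print(groups)
```

This runs in time linear in the total code size and only catches copies that differ by renaming and whitespace. Reordered statements or inserted dead code would need something stronger.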
Also, I guess MOSS is a system for text-based comparison only. Does it also compare machine-level code?