I'm so pleased: I have the first great cheater on my site, CodeAbbey. Well... damn! This fellow was not lazy: he copied and slightly refactored about 125 solutions (shared, I believe, by his colleague) and then asked for a certificate. Why he wants it, I have no idea! It is not a certificate from Oracle or Microsoft. Anyway, he spent about 3 weeks on this, and see how it looks:
source pretending to look different
(Or perhaps their similarity is only my imagination?)
Well, jokes aside, I have a problem. This time it was easy enough to spot the cheating, since there are not too many users and only a few dozen top users. But if such cases repeat when there are more people... I may want some automated comparison of sources.
Current idea: I can calculate certain metrics / hashes over each solution and save them along with it. They could be, say, counts of punctuation symbols, operator symbols, etc. Then, when I need to check a user, I would find the few solutions with the closest sets of metrics. Something like Locality-Sensitive Hashing. But before experimenting with this blindly, I decided to ask the clever community: perhaps some people have already worked on similar tasks and could hint at some good ideas to look at / learn from?
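To make the metrics idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the tracked symbol set `SYMBOLS` and the helper names `metrics`, `distance` and `closest` are hypothetical, and the lookup is a plain brute-force nearest-neighbour search rather than real Locality-Sensitive Hashing.

```python
import math
from collections import Counter

# Hypothetical choice of symbols to track; the post only says
# "punctuation symbols, operator symbols etc."
SYMBOLS = "(){}[];,.=+-*/%<>!&|:"


def metrics(source: str) -> list[float]:
    """Relative frequency of each tracked symbol in the source text."""
    counts = Counter(ch for ch in source if ch in SYMBOLS)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[s] / total for s in SYMBOLS]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest(target: str, candidates: dict[str, str], k: int = 3):
    """Return the k (distance, name) pairs nearest to the target source.

    Brute force over all candidates, so this does not scale the way
    a real LSH index would.
    """
    tv = metrics(target)
    scored = sorted(
        (distance(tv, metrics(src)), name) for name, src in candidates.items()
    )
    return scored[:k]
```

Renaming variables does not change any of these counts, so a copy with renamed identifiers lands at distance 0 from the original. On the other hand, short solutions to the same task may look alike by accident, so the result is probably best treated as a shortlist for manual review.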
I believe, for example, that the CF administration has its own tools, but as one wise colleague suggested, they are probably not going to share their know-how, to avoid ending up sharing secrets with cheaters... :o
Auto comment: topic has been translated by RodionGork(original revision, translated revision, compare)
Wow, autocomment is a cool feature, but it seems I cannot remove it now...
This paper: Winnowing: Local Algorithms for Document Fingerprinting
might be useful for you.
For its application, see Moss — A system for software plagiarism
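For a rough feel of what winnowing does, here is a simplified sketch in Python. It is not the paper's exact procedure and not Moss's implementation: the parameter values `k=5` and `w=4`, the use of CRC32 as the k-gram hash, and the function names are all assumptions made for illustration. The sketch hashes every k-character substring, keeps the minimum hash from each sliding window of `w` hashes, and compares two sources by the overlap of the selected hashes.

```python
import zlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-character substring after stripping whitespace and case."""
    cleaned = "".join(text.split()).lower()
    return [
        zlib.crc32(cleaned[i:i + k].encode("utf-8"))
        for i in range(len(cleaned) - k + 1)
    ]


def winnow(text: str, k: int = 5, w: int = 4) -> set[int]:
    """Select the minimum hash from each window of w consecutive k-gram hashes.

    The paper also records positions and defines a tie-breaking rule;
    this sketch keeps only the hash values, so neither matters here.
    """
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    # If there are fewer than w hashes, treat them as a single window.
    windows = max(1, len(hashes) - w + 1)
    return {min(hashes[i:i + w]) for i in range(windows)}


def similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard overlap of the two fingerprint sets, from 0.0 to 1.0."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Since only whitespace and letter case are normalized here, re-indenting a solution leaves its fingerprints unchanged, but renaming identifiers still alters the k-grams. A practical setup would probably need to normalize tokens (for example, replacing every identifier with a placeholder) before fingerprinting.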
Thank you! Great hint! I suspected that some people had already worked on a similar problem, and here it is. An interesting article, by the way... :)
Another tool: https://github.com/jplag/jplag