There are multiple blogs showing that some users get away with cheating by obfuscating their code, so codeforces really needs to check if a submission is obfuscated. It looks like the cheaters copy a solution and they put each line/token in a define, with a lot of junk in between. This is currently being handled on a case by case basis, which is a problem
/*Example*/
#include <iostream>
#define awhHRUTH1x using
#define b589AHWUin namespace
#define iuhA2781AB std;
#define v1895Krnba int
#define bATT68125F main(
#define bba5891AW4 ){
#define JKT15jyT51 cout<<
#define abb151A5h1 "HelloWorld\n";
#define Uikta5UUNW return
#define bbbqrgq214 0;
#define btpIOUHBWW }
awhHRUTH1x b589AHWUin iuhA2781AB
/*#define aiw uahiuwhha ruahwihaiuwhviunpawrry1pqrawu;*/
/*int gcd(int a,int b) return a+b;*/
/*im not sure if this compiles*/
v1895Krnba bATT68125F bba5891AW4 JKT15jyT51 abb151A5h1 Uikta5UUNW bbbqrgq214 btpIOUHBWW
Right now, as most of these solutions are accepted and not skipped, it seems like there's not a proper obfuscation check, so we need MikeMirzayanov to make a good one. For example, checking the code after it gets preprocessed (remove defines and comments) so that the junk won't circumvent the checks. Or counting the amount of random strings in the code (like a551hawibBY). These ideas can be made into a deobfuscator that preprocesses the code before the actual cheating check
Sure, we need sth like that but idk if it's ez to implement or will it be reliable(people use a lot of #define as well; some of them might be misidentified as "random strings").
I wonder what CP algorithms/knowledge can be utilised to implement the obfuscation checking program
I'm not sure will it work normally, but we can probably simply parse each define, and then use anti-cheat program.
Exactly, that's what I mean: first preprocess the code (parse the defines, remove the comments) then use the regular anti cheating program, so basically add a deobfuscator
Auto comment: topic has been updated by HusseinFarhat (previous revision, new revision, compare).
I think we can use LLMs like GPT-4 for these checks since there's no simple rule you can enforce, automation shouldn't be difficult to implement! GPT-4 might be costly and so you could use something like Claude.
Good idea, but the problem is that there's a lot of submissions, so to check every single one takes time, and money in the case of GPT
Time is not an issue as unlike plag check, this can be done parallelly. And the money thing depends on CF's budget although I do not think Claude should be very heavy on the wallet and should do a decent enough job if GPT-4 is costly
No, you can't use LLMs for it. If you put something like Benq template in it, it will forget most of the stuff it just processed before it even gets to the actual code. Not only that you can easily overload LLM with info, LLMs (as of now) are horrible at determining if x == y, therefore a simple plag check (after deobfuscation) will spit on LLM.
The first thing you mentioned is an engineering problem which is solvable. one idea is breaking the codes into chunks and linking them through a series of prompts. Haven't thought through completely but I'm sure its solvable.
And regarding the second point, I never talked about using LLMs for plag checking. Only to check if there is obfuscation or not, since even if the original solution is not a copied one, obfuscation is still against the rules so flagging it should be enough and an LLM can do a decent job imo.
While parsing all the defines and removing all the comments will skip a lot of solutions, we are on the losing side of the war.
First, you can obfuscate variables and the "deobfuscating" of them seems non-trivial (although, you can just give them all the same name, and the anti plag will check the similarity of the structure of code, which is good enough?)
Second, you can put a lot of useless lines of code as an another way to obfuscate the code. Battling that is even harder: you can probably check for code that never executes on testcases, removing that, and then running plag check. But if they actually get executed (like suppose putting in "i++;i--;i++;i--;" 50 times), I guess the only saving grace is looking at optimized assembly code from GCC, which is kinda futile, as you can easily influence compiler output with your code, which will kill the plag check.
From this comes the nail in the coffin being assembly inline. You can literally just compile your program in assembly file and copypaste it in the inline, absolutely skipping the plag check. I couldn't think of a way to get around that, but maybe someone, who is smarter, will suggest a good idea for this.
You can also use
ifdef
with__TIME__
/etc and get a completely different (probably incorrect) solution 20-30 minutes after the contest (here you have to believe that the system tests will launch quickly).de-obfuscating is actually a hard thing to do and it's kinda impossible if it done with some smart apps like stunnix (except if you're Chinese:>). You can read about code-obfuscation and its history if you don't believe me.
I think the current best strategy is to reporting these solutions by newbies who are very angry that people are getting rate by cheating.
Perform control-flow graph analysis? This method will likely fail for short DP / math problems, but I think the more complex it gets, the more reliable CFG analysis becomes.
That would lead to too many false positives, and it's easy to obfuscate that by switching 2 if statements for example