Hello, Codeforces.↵
↵
I've participated in a few rounds and noticed that there are **too many cheaters**.
Right now cheater detection is community-driven, and only a few cheaters get caught.
↵
### Idea
I'm proposing Codeforces Anti-Cheat (CFAC): an automated flagging system that runs after each contest and detects likely cheaters using:
↵
- **NLP-model-based submission (and maybe replacement) checking**

- **Timing-based detection: if a gray solves Div. 2 E in 3 minutes, it's suspicious**
↵
All of these signals are combined into a suspicion score matrix, where score[u][p] is a value normalized to [-1, 1]:

- -1 means participant $u$ is certainly not cheating on problem $p$;

- 1 means participant $u$ is certainly cheating on problem $p$.
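To make the score matrix concrete, here is a minimal sketch of how two signals could be merged into score[u][p]. It is not the actual CFAC code: the function names, the linear timing heuristic, the `typical_minutes` baseline and the 0.7 weight are all assumptions made up for illustration.

```python
from typing import Dict, Tuple


def to_score(p: float) -> float:
    """Map a cheating probability in [0, 1] to a score in [-1, 1]."""
    return 2.0 * p - 1.0


def timing_probability(solve_minutes: float, typical_minutes: float) -> float:
    """Hypothetical timing signal: the faster a solve is compared to what is
    typical for the participant's rating, the closer the value gets to 1."""
    if typical_minutes <= 0:
        raise ValueError("typical_minutes must be positive")
    if solve_minutes >= typical_minutes:
        return 0.0
    return 1.0 - solve_minutes / typical_minutes


def combine(nlp_p: float, timing_p: float, w_nlp: float = 0.7) -> float:
    """Weighted average of both signals (the weight is an arbitrary assumption),
    mapped to [-1, 1]."""
    p = w_nlp * nlp_p + (1.0 - w_nlp) * timing_p
    return to_score(p)


def build_matrix(
    signals: Dict[Tuple[str, str], Tuple[float, float, float]]
) -> Dict[str, Dict[str, float]]:
    """signals maps (user, problem) to (nlp_p, solve_minutes, typical_minutes)."""
    score: Dict[str, Dict[str, float]] = {}
    for (u, p), (nlp_p, solve_min, typical_min) in signals.items():
        timing_p = timing_probability(solve_min, typical_min)
        score.setdefault(u, {})[p] = combine(nlp_p, timing_p)
    return score


if __name__ == "__main__":
    # Made-up numbers: a gray solving Div. 2 E in 3 minutes vs. a slow, clean solve.
    demo = {("gray_user", "E"): (0.9, 3, 60), ("other_user", "A"): (0.0, 10, 5)}
    print(build_matrix(demo))
```

A weighted average keeps the result inside [-1, 1] as long as both inputs are probabilities, which is why it is used here; a trained model on top of the raw signals would be another option.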
↵
### Need help↵
I need help with:

- collecting labelled code from known cheaters

- final testing of the anti-cheat system
↵
### My review of my NLP-based model
It works pretty well, but it can only detect heavily LLM-styled submissions like these:
↵
<spoiler summary="Submission 1">↵
```↵
import sys

def solve() -> None:
    it = iter(sys.stdin.read().strip().split())
    t = int(next(it))
    out_lines = []
    for _ in range(t):
        n = int(next(it))
        q = int(next(it))
        a = [int(next(it)) for _ in range(n)]
        b = [int(next(it)) for _ in range(n)]
        # c[i] = max(a[i], b[i])
        c = [max(ai, bi) for ai, bi in zip(a, b)]
        # suffix maxima M[i] = max_{j>=i} c[j]
        M = [0] * n
        M[-1] = c[-1]
        for i in range(n-2, -1, -1):
            M[i] = max(c[i], M[i+1])
        # prefix sums of M
        pref = [0] * (n + 1)
        for i in range(n):
            pref[i+1] = pref[i] + M[i]
        # answer queries
        ans = []
        for __ in range(q):
            l = int(next(it))
            r = int(next(it))
            ans.append(str(pref[r] - pref[l-1]))
        out_lines.append(" ".join(ans))
    sys.stdout.write("\n".join(out_lines))

if __name__ == "__main__":
    solve()
```↵
</spoiler>↵
↵
<spoiler summary="Submission 2">↵
```↵
import sys

# Function to calculate the sum of digits of a number
def get_digit_sum(n):
    s = 0
    while n > 0:
        s += n % 10
        n //= 10
    return s

def solve():
    # Read all input from standard input
    input_data = sys.stdin.read().split()

    if not input_data:
        return

    iterator = iter(input_data)
    try:
        # First token is the number of test cases
        t = int(next(iterator))
    except StopIteration:
        return

    results = []

    for _ in range(t):
        try:
            x = int(next(iterator))
        except StopIteration:
            break

        count = 0
        # We are looking for y such that y - d(y) = x.
        # This can be rewritten as y = x + d(y).
        # Let s = d(y). Then y = x + s.
        # We need to check if d(x + s) == s.
        # Since x <= 10^9, y is roughly 10^9.
        # The maximum sum of digits for a number <= 10^9 + 100 is 81 (for 999,999,999).
        # Thus, s will not exceed 90. We iterate s from 1 to 100 to be safe.

        for s in range(1, 100):
            y = x + s
            if get_digit_sum(y) == s:
                count += 1

        results.append(str(count))

    # Print all results separated by newlines
    print('\n'.join(results))

if __name__ == '__main__':
    solve()
```↵
</spoiler>↵
↵
**Why isn't it working well?**

- because my AI-generated samples were very easy to detect

- because some LLM-ish traits can be too difficult to detect using only CodeBERT-generated embeddings

As a solution, I will start over from scratch so that the model detects more AI landmarks that are hard to see through embeddings.
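As a rough illustration of what such "AI landmarks" could look like as explicit features next to the embeddings, here is a small sketch. The feature set is my assumption, picked by looking at the two submissions above (lots of explanatory comments, a `__main__` guard, return type hints, reading all of stdin at once); it is not the feature set of the actual model, and these traits also show up in honest code, so they are weak signals on their own.

```python
import re


def style_features(source: str) -> dict:
    """Extract a few hand-crafted style features from Python source code.

    All features are illustrative assumptions, not a validated detector.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comment_lines = [ln for ln in lines if ln.strip().startswith("#")]
    return {
        # Share of non-blank lines that are full-line comments.
        "comment_ratio": len(comment_lines) / len(lines) if lines else 0.0,
        # An `if __name__ == "__main__":` guard, rare in hand-written contest code.
        "has_main_guard": bool(
            re.search(r"if\s+__name__\s*==\s*['\"]__main__['\"]", source)
        ),
        # A return type annotation such as `def solve() -> None:`.
        "has_return_hint": bool(re.search(r"def\s+\w+\(.*\)\s*->", source)),
        # Reading the whole input at once through sys.stdin.read().
        "reads_all_stdin": "sys.stdin.read()" in source,
    }


if __name__ == "__main__":
    sample = (
        "import sys\n"
        "# Read all input\n"
        "def solve() -> None:\n"
        "    data = sys.stdin.read().split()\n"
        'if __name__ == "__main__":\n'
        "    solve()\n"
    )
    print(style_features(sample))
```

Features like these could be concatenated with the embedding vector before classification, so the classifier sees both kinds of signal.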
↵
### Updates
- Created the CFAC [repo on GitHub](https://github.com/vn4ka/cfac)
- Rewrote the post text without AI, addressing the hate comments about AI slop and the [user:pilliamw,2026-03-24] blog post
- **Major update**: (finally) trained a model that classifies cheaters vs. non-cheaters (changes not pushed to the repo yet)
↵