GUM's blog

By GUM, 6 months ago, In English

Note: This post continues from my previous blog, JudgeBan, which introduced a lightweight system for handling suspicious contestants during programming contests. Here, I’ll describe a complementary, community-driven system called CommunityJudge — a voluntary way for trusted participants to help detect cheating.


TL;DR

Add a third participation mode when registering for a contest: Rated, Unrated, or Judge (Community Judge).

Judges voluntarily review suspicious submissions flagged by the system (for example, by JudgeBan). Only high-rated users (1900+ for non–Div.1 contests, 2400+ for Div.1) can register as judges. Each judge receives anonymized data about suspicious contestants, compares their performance to previous contests, and votes: Cheater / Not Sure / Legit.

Their votes are aggregated with the automated system’s confidence score. Judges earn a new Judge Rating based on how well their decisions match final outcomes. Judges with high Judge Rating gain more weight in future decisions, while those with poor accuracy lose eligibility.


The idea, step by step

1. Judge registration

When users register for a contest, they see three options:

  • Rated
  • Unrated
  • Judge — review suspicious cases

Eligibility:

  • To judge Div.2 or lower contests → rating ≥ 1900
  • To judge Div.1 contests → rating ≥ 2400

This ensures judges have solid contest experience and understand what realistic performance looks like.
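The eligibility rule above is simple enough to sketch directly (the thresholds come from this post; the `division` encoding is an assumption for illustration):

```python
def can_judge(rating: int, division: int) -> bool:
    """Return True if a user with `rating` may register as a Community Judge
    for a contest in `division` (1 = Div.1, 2 or higher = Div.2 or lower)."""
    if division == 1:
        return rating >= 2400  # Div.1 contests require 2400+
    return rating >= 1900      # Div.2 and lower require 1900+
```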


2. Receiving suspicious cases

When a contestant is flagged by JudgeBan (based on suspicious timing, unusual accuracy, or reuse patterns), their data is queued for post-contest human verification.

Each Community Judge receives:

  • The flagged user’s current contest submissions (source code + timestamps)
  • A statistical summary of their previous 8 contests:
    • Problems solved per contest
    • Submission count
    • Accuracy ratio
    • Difficulty distribution
    • Submission timing behavior

All data is anonymized: no usernames or identifying metadata are visible to the judges.
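One way to model the anonymized case payload described above (field names are illustrative, not an existing API; the opaque `case_id` stands in for the hidden username):

```python
from dataclasses import dataclass

@dataclass
class ContestSummary:
    """Aggregate statistics for one past contest of the flagged user."""
    problems_solved: int
    submissions: int
    accuracy: float                 # accepted / submissions
    difficulty_histogram: dict      # difficulty bucket -> count
    avg_seconds_between_subs: float # submission timing behavior

@dataclass
class AnonymizedCase:
    """What a Community Judge sees: no username, no identifying metadata."""
    case_id: str             # opaque identifier, replaces the username
    flagged_submissions: list  # (source_code, timestamp) pairs from this contest
    history: list              # ContestSummary for the previous 8 contests
```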


3. Judging process

For each flagged contestant, the judge chooses one of three verdicts:

  • Cheater
  • Not sure
  • Legit

Each case is reviewed independently by multiple judges. Their decisions are then combined with the automated detection system’s confidence score to form a final confidence value.


4. Weighted decision mechanism

Each judge’s vote is weighted by their Judge Rating (JR). The final decision confidence is computed as:

FinalConfidence = (AIConfidence × α) + (WeightedAverageJudgeScore × β)

where α and β are adjustable parameters depending on trust in the model vs. community.

Judges with higher JR have higher influence. Over time, the system learns to trust reliable judges more.
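The aggregation formula above can be sketched as follows, assuming verdicts are mapped to scores (1.0 = Cheater, 0.5 = Not Sure, 0.0 = Legit — an assumption, since the post does not fix the encoding):

```python
def final_confidence(ai_confidence, votes, alpha=0.5, beta=0.5):
    """Combine the detector's confidence with JR-weighted judge votes.

    `votes` is a list of (verdict_score, judge_rating) pairs.
    Returns AIConfidence * alpha + WeightedAverageJudgeScore * beta.
    """
    total_weight = sum(jr for _, jr in votes)
    if total_weight == 0:
        return ai_confidence  # no usable votes: fall back to the detector alone
    weighted_avg = sum(score * jr for score, jr in votes) / total_weight
    return ai_confidence * alpha + weighted_avg * beta
```

A judge with twice the JR pulls the weighted average twice as hard, which is exactly the "higher influence" behavior described above.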


5. Judge Rating system

Every judge has a Judge Rating — a separate rating system parallel to contest rating. After each case is resolved (when the final decision is known):

  • Judges whose verdict matches the final outcome gain JR.
  • Judges whose verdict contradicts the final outcome lose JR.

Key rules:

  • If a judge’s JR drops below a threshold (e.g., 0 or -100), they lose the right to judge.
  • Judges with high JR receive more weight in decision aggregation and can access higher-tier reviews.
  • The JR update magnitude depends on both accuracy and confidence (judges who marked “Not sure” lose or gain less).

This creates a feedback loop that rewards consistency and fairness.
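The JR update rules can be sketched like this; the exact deltas and the small "Not sure" penalty are assumptions for illustration, only the threshold and the match/mismatch logic come from the rules above:

```python
MIN_JR = -100  # below this threshold the judge loses the right to judge

def update_jr(jr: int, verdict: str, outcome: str, base_delta: int = 10):
    """Return (new_jr, still_eligible) after a case resolves.

    `verdict` is 'cheater', 'not_sure', or 'legit'; `outcome` is the
    final decision ('cheater' or 'legit'). "Not sure" verdicts move JR
    by a reduced amount, per the rules above.
    """
    if verdict == 'not_sure':
        delta = -base_delta // 5   # small cost for abstaining (assumed value)
    elif verdict == outcome:
        delta = base_delta         # verdict matched the final outcome: gain JR
    else:
        delta = -base_delta        # verdict contradicted the outcome: lose JR
    new_jr = jr + delta
    return new_jr, new_jr >= MIN_JR
```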


6. Integration with reporting system

The Judge Rating can also be used outside contests:

  • Reports or flags submitted by high-JR users are prioritized for moderator review.
  • Their feedback carries more credibility in the queue.
  • Over time, this could power a trustworthy, semi-automated reputation-based moderation system.

Why this helps

  • Leverages expertise: High-rated players can intuitively detect unrealistic behavior that automated systems might miss.
  • Distributed verification: Reduces the workload on admins and ensures scalability as contest participation grows.
  • Reputation-based fairness: Judges earn or lose trust based on past accuracy, discouraging careless or biased voting.
  • Transparency & engagement: Experienced players get to contribute constructively to maintaining contest integrity.

Safeguards and fairness

  • Anonymization: Judges never see usernames or personal data — only behavioral statistics and submissions.
  • Multiple reviewers per case: Single biases or mistakes can’t decide outcomes.
  • Cross-validation with AI: Final outcomes depend on both human and automated inputs.
  • Rate limits: Judges can handle only a few cases per round to prevent burnout or rushed decisions.
  • Appeal process: As with JudgeBan, users can appeal outcomes with transparent evidence review.

Risks & mitigations

  • Biased judging → anonymized data + multi-judge averaging
  • False consensus → blend with AI confidence scores
  • Low participation → offer small JR rewards or community recognition
  • System gaming → JR decay + accuracy tracking prevents mass bias
  • Complexity → integrate smoothly with the JudgeBan pipeline and moderation tools

Implementation notes

  • Extend the JudgeBan pipeline to automatically forward anonymized data of flagged users to available judges.
  • Build a Judge Dashboard with per-case voting, evidence summary, and historical accuracy stats.
  • Maintain JR scores, decay them over time, and use them in aggregation formulas.
  • Log all actions for transparency and periodic audits.

Conclusion

CommunityJudge turns skilled competitors into community moderators — a collaborative defense line against cheating. Combined with JudgeBan, it creates a layered ecosystem:

JudgeBan detects → CommunityJudge verifies → Permanent ban or clearance follows.

This dual system balances automation with human insight, rewards trustworthy judges, and helps keep contests fair without punishing honest players.


By GUM, 6 months ago, In English

Note: I’d like to propose a lightweight, practical mechanism to reduce cheating during contests. I call it JudgeBan.

TL;DR

When an account looks even slightly suspicious of using external tools or otherwise cheating during a contest, put it into a temporary JudgeBan state: their submissions are not judged during the contest (they are tested only after the contest). After the contest, evaluate the account’s behavior across the next 3–5 contests using simple metrics. If behavior returns to normal, unban; otherwise turn the ban into a permanent rated ban. Suspicious accounts can choose to make the contested round unrated (with an instant JudgeBan) or keep it rated without the instant protection. Accounts that don’t participate in any rated contests for 3 months after a JudgeBan should be fully banned. Finally, permanent cheating bans should be rated bans with rating reset to zero — a stronger and clearer deterrent.


The idea, step by step

  1. Detect suspicion
  • Use existing detectors (anomalies in timing, reuse of exact solutions, suspicious sharing patterns, tool traces, reports, etc.). If an account is flagged as even slightly suspicious, apply JudgeBan.
  2. Apply JudgeBan during the contest
  • Submissions from JudgeBanned accounts are recorded but not judged live — they won’t receive Accepted/Wrong Answer/Compile Error/Time Limit Exceeded verdicts during the contest and thus can’t benefit from iterative, real-time feedback.
  • Their runs are queued and executed in the testing system after the contest ends. This prevents the interactive “trial-and-error” advantage while preserving the ability to analyze the code.
  3. Post-contest evaluation window
  • After the contest, evaluate the account across the next 3–5 contests. Compare a few simple aggregate metrics from before and after the JudgeBan:

    • average number of Accepted problems,
    • average number of Wrong Answer / Time Limit Exceeded runs,
    • total number of submissions,
    • success rate (Accepted problems / submissions),
    • difficulty of problems attempted.
  • If the post-JudgeBan averages are roughly similar to pre-JudgeBan levels, clear the ban (un-JudgeBan).
  • If performance remains anomalously high (or patterns suggest continued cheating), escalate to a permanent rated ban.
  4. User options at time of JudgeBan
  • A JudgeBanned user may choose to:

    • make that specific contest unrated (accept the JudgeBan for the contest), or
    • keep it rated but accept that the JudgeBan will remain in place (no instant verdicts).
  • This choice gives the user agency: they can opt out temporarily and not risk rating issues while being investigated.
  5. Inactivity rule
  • If a JudgeBanned account does not participate in any rated contests for three months, consider the account permanently banned (no longer allowed to take rated contests).
  6. Permanent cheating bans = rated bans
  • When a user is permanently banned for cheating, make the ban a rated ban and set their rating to zero. This makes consequences meaningful and visible, which helps deterrence.
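The account lifecycle described in the steps above can be sketched as a small state machine (state names and the transition function are illustrative, not an existing implementation):

```python
from enum import Enum

class Status(Enum):
    NORMAL = "normal"
    JUDGEBAN = "judgeban"        # submissions queued, judged post-contest
    PERMANENT_BAN = "permanent"  # rated ban, rating reset to zero

def next_status(status, *, behaviour_normal=None, months_inactive=0):
    """Advance an account through the JudgeBan lifecycle.

    `behaviour_normal` is the result of the 3-5 contest evaluation window
    (None while still under observation); `months_inactive` counts rated
    inactivity since the JudgeBan was applied.
    """
    if status is not Status.JUDGEBAN:
        return status
    if months_inactive >= 3:
        return Status.PERMANENT_BAN   # inactivity rule (step 5)
    if behaviour_normal is True:
        return Status.NORMAL          # un-JudgeBan (step 3)
    if behaviour_normal is False:
        return Status.PERMANENT_BAN   # escalate to a rated ban (step 3)
    return Status.JUDGEBAN            # still inside the evaluation window
```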

Why this could help

  • Removes the main incentive of realtime cheating: live verdicts and rapid trial-and-error feedback are a huge advantage for tool-assisted cheating. JudgeBan removes that advantage without immediately destroying the account.
  • Preserves evidence: running submissions after the contest preserves artifacts for later inspection (source, exact outputs, timestamps).
  • Fairness and proportionality: JudgeBan is an intermediate step — it’s less blunt than an immediate permanent ban and allows for recovery if the initial suspicion was a false positive.
  • Deterrence through escalation: users who try to continue cheating will find that post-contest analysis + follow-up monitoring can escalate to a meaningful rated ban.

Potential thresholds & metrics (suggested)

  • Evaluate changes in the following per-contest averages:

    • Accepted problems per contest
    • Submissions per contest
    • Accepted/submission ratio
  • Use a comparison window (e.g., compare the last 5 contests before the JudgeBan vs. the next 3–5 contests after).
  • Define a tolerance band (e.g., if the post-ban Accepted rate is within ±20% of the pre-ban average → likely normal; if it remains > +50% or shows another suspicious shape → escalate).

(Exact thresholds should be tuned with historical data and checked for fairness to avoid false positives.)
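A minimal sketch of the tolerance-band comparison, using the example numbers above (±20% and +50%; the return labels are illustrative):

```python
def evaluate_window(pre_accepted_avg, post_accepted_avg,
                    tolerance=0.20, escalate_at=0.50):
    """Compare per-contest Accepted averages before and after a JudgeBan.

    Returns 'clear', 'escalate', or 'keep_watching', per the suggested
    thresholds: within +/-20% -> likely normal; > +50% -> escalate.
    """
    if pre_accepted_avg == 0:
        return 'keep_watching'  # no baseline to compare against
    change = (post_accepted_avg - pre_accepted_avg) / pre_accepted_avg
    if abs(change) <= tolerance:
        return 'clear'          # performance roughly back to normal
    if change > escalate_at:
        return 'escalate'       # anomalously high: move toward a rated ban
    return 'keep_watching'      # ambiguous shift: extend the observation
```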


Safeguards and fairness

  • Human review for edge cases: automated steps should trigger human review before a permanent rated ban is applied.
  • Appeal process: provide a transparent appeal path and make evidence available to the user (logs, sample runs, timestamps) while protecting privacy of others.
  • False positive control: require multiple signals before JudgeBan is applied (to avoid punishing legitimate users for one noisy signal).
  • Rate-limit JudgeBan usage: don’t overuse — only for accounts with credible suspicion.

Risks & challenges

  • False positives: legitimate competitors might be slowed or offended if a JudgeBan is applied mistakenly. Strong safeguards and quick appeals/clear communication are essential.
  • Gaming the system: cheaters may adapt (e.g., reduce activity for 3 months to avoid the inactivity rule), so monitoring policies need to be robust and iteratively improved.
  • Community perception: any temporary withholding of verdicts feels heavy-handed. Clear messaging (“You’ve been temporarily put under review — submissions will be tested after the round”) will help.
  • Implementation complexity: the contest infrastructure would need to support queuing judged runs and linking them to post-contest review workflows.

Implementation notes (practical)

  • Add a contest middleware layer that can toggle per-account live-judging on/off.
  • Store submission metadata and results for forensic analysis.
  • Build a dashboard for moderators showing before/after metrics and the most relevant signals for each flagged account.
  • Log every step and provide automated summaries to human reviewers to speed decisions.
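The middleware toggle from the first note above could look roughly like this (all names are illustrative, not an existing Codeforces API):

```python
class JudgingMiddleware:
    """Per-account toggle between live judging and post-contest queuing."""

    def __init__(self, judge_fn):
        self.judge_fn = judge_fn  # real-time judging backend
        self.banned = set()       # accounts currently under JudgeBan
        self.queue = []           # submissions deferred to after the contest

    def submit(self, account, submission):
        """Judge live, or queue silently if the account is JudgeBanned."""
        if account in self.banned:
            self.queue.append((account, submission))
            return None  # no live verdict: the trial-and-error loop is cut
        return self.judge_fn(submission)

    def run_post_contest(self):
        """Judge all deferred submissions once the contest ends."""
        return [(acct, self.judge_fn(sub)) for acct, sub in self.queue]
```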

Additional note

I also have an idea for an easier and more engaging way to detect cheaters — a voluntary community-based approach, which I’ll describe in my next blog.

Update: The follow-up idea has now been published — read it here: CommunityJudge — a voluntary peer review system for fairer contests


Conclusion

JudgeBan is a middle-ground approach: it neutralizes the live feedback advantage of cheating tools, preserves evidence for post-contest review, and creates a structured pathway to escalate to permanent rated bans if suspicious behavior persists. With careful thresholds, human oversight, and a fair appeal process, JudgeBan could reduce cheating while minimizing harm to innocent users.
