Note: This post continues from my previous blog, JudgeBan, which introduced a lightweight system for handling suspicious contestants during programming contests. Here, I’ll describe a complementary, community-driven system called CommunityJudge — a voluntary way for trusted participants to help detect cheating.
TL;DR
Add a third participation mode when registering for a contest: Rated, Unrated, or Judge (Community Judge).
Judges voluntarily review suspicious submissions flagged by the system (for example, by JudgeBan). Only high-rated users (1900+ for non–Div.1 contests, 2400+ for Div.1) can register as judges. Each judge receives anonymized data about suspicious contestants, compares their performance to previous contests, and votes: Cheater / Not Sure / Legit.
Their votes are aggregated with the automated system’s confidence score. Judges earn a new Judge Rating based on how well their decisions match final outcomes. Judges with high Judge Rating gain more weight in future decisions, while those with poor accuracy lose eligibility.
The idea, step by step
1. Judge registration
When users register for a contest, they see three options:
- Rated
- Unrated
- Judge — review suspicious cases
Eligibility:
- To judge Div.2 or lower contests → rating ≥ 1900
- To judge Div.1 contests → rating ≥ 2400
This ensures judges have solid contest experience and understand what realistic performance looks like.
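The eligibility rule above boils down to a single check. A minimal sketch in Python; the function name and the `division` parameter shape are illustrative assumptions, only the thresholds come from the post:

```python
def can_register_as_judge(rating: int, division: int) -> bool:
    """Return True if a user may pick the Judge mode for a contest."""
    if division == 1:          # Div.1 contests require stronger judges
        return rating >= 2400
    return rating >= 1900      # Div.2 and lower contests
```

For example, a 2100-rated user could judge a Div.2 round but not a Div.1 round.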
2. Receiving suspicious cases
When a contestant is flagged by JudgeBan (based on suspicious timing, unusual accuracy, or reuse patterns), their data is queued for post-contest human verification.
Each Community Judge receives:
- The flagged user’s current contest submissions (source code + timestamps)
- A statistical summary of their previous 8 contests:
  - Problems solved per contest
  - Submission count
  - Accuracy ratio
  - Difficulty distribution
  - Submission timing behavior
All data is anonymized: no usernames or identifying metadata are visible to the judges.
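To make the payload concrete, here is a sketch of what a judge might receive. The post specifies the content, not the schema, so all field and class names below are my assumptions:

```python
from dataclasses import dataclass

@dataclass
class ContestSummary:
    """One row of the 8-contest history shown to judges."""
    problems_solved: int
    submission_count: int
    accuracy_ratio: float          # accepted / total submissions
    difficulty_distribution: dict  # e.g. {"easy": 3, "medium": 2, "hard": 0}
    median_gap_minutes: float      # submission timing behavior

@dataclass
class AnonymizedCase:
    """What a Community Judge receives: no handle, no identifying metadata."""
    case_id: str       # opaque identifier instead of the username
    submissions: list  # (source_code, timestamp) pairs from this contest
    history: list      # up to 8 ContestSummary entries
```

The `case_id` stands in for the username, so a judge can discuss a case without ever learning who it concerns.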
3. Judging process
For each flagged contestant, the judge chooses one of three verdicts:
- Cheater
- Not sure
- Legit
Each case is reviewed independently by multiple judges. Their decisions are then combined with the automated detection system’s confidence score to form a final confidence value.
4. Weighted decision mechanism
Each judge’s vote is weighted by their Judge Rating (JR). The final decision confidence is computed as:
FinalConfidence = (AIConfidence × α) + (WeightedAverageJudgeScore × β)
where α and β are adjustable parameters that balance trust in the automated model against trust in the community.
Judges with higher JR have higher influence. Over time, the system learns to trust reliable judges more.
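The formula above can be implemented directly. In this sketch I assume a numeric verdict encoding of Cheater → 1.0, Not sure → 0.5, Legit → 0.0 (the post does not fix this mapping), and default α = β = 0.5:

```python
def final_confidence(ai_confidence, votes, alpha=0.5, beta=0.5):
    """Blend the automated score with JR-weighted judge votes.

    votes: list of (verdict_score, judge_rating) pairs.
    Judges with non-positive JR contribute zero weight.
    """
    total_weight = sum(max(jr, 0) for _, jr in votes)
    if total_weight == 0:
        return ai_confidence  # no usable judge signal: fall back to the model
    weighted_avg = sum(score * max(jr, 0) for score, jr in votes) / total_weight
    return ai_confidence * alpha + weighted_avg * beta
```

So an AI confidence of 0.8, combined with a 200-JR "Cheater" vote and a 100-JR "Not sure" vote, yields roughly 0.82: the heavier judge pulls the weighted average toward their verdict.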
5. Judge Rating system
Every judge has a Judge Rating — a separate rating system parallel to contest rating. After each case is resolved (when the final decision is known):
- Judges whose verdict matches the final outcome gain JR.
- Judges whose verdict contradicts the final outcome lose JR.
Key rules:
- If a judge’s JR drops below a threshold (e.g., 0 or -100), they lose the right to judge.
- Judges with high JR receive more weight in decision aggregation and can access higher-tier reviews.
- The JR update magnitude depends on both accuracy and confidence (judges who marked “Not sure” lose or gain less).
This creates a feedback loop that rewards consistency and fairness.
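One way the update rule could look. The K-factor of 32, the quarter-strength change for "Not sure", and the -100 floor are illustrative assumptions (the post only gives the floor as an example threshold):

```python
JR_FLOOR = -100  # example threshold from the post; below it, judging rights are revoked

def update_judge_rating(jr: int, verdict: str, outcome: str, k: int = 32) -> int:
    """Adjust JR after a case resolves (outcome is 'cheater' or 'legit')."""
    if verdict == outcome:
        delta = k              # matched the final outcome: gain JR
    elif verdict == "not_sure":
        delta = -k // 4        # abstained: much smaller loss, per the damping rule
    else:
        delta = -k             # contradicted the outcome: full loss
    return jr + delta

def still_eligible(jr: int) -> bool:
    return jr >= JR_FLOOR
```

Note how "Not sure" is cheap but never free, which discourages judges from abstaining on every case just to protect their JR.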
6. Integration with reporting system
The Judge Rating can also be used outside contests:
- Reports or flags submitted by high-JR users are prioritized for moderator review.
- Their feedback carries more credibility in the queue.
- Over time, this could power a trustworthy, semi-automated reputation-based moderation system.
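The queue prioritization is simple to sketch, assuming reports arrive as (reporter JR, report ID) pairs; this tuple shape is my assumption:

```python
def prioritized_reports(reports):
    """Return report IDs ordered so high-JR reporters surface first."""
    return [rid for jr, rid in sorted(reports, key=lambda r: -r[0])]
```

A report from a 400-JR judge would then reach moderators ahead of one from a 50-JR user, even if it was filed later.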
Why this helps
- Leverages expertise: High-rated players can intuitively detect unrealistic behavior that automated systems might miss.
- Distributed verification: Reduces the workload on admins and ensures scalability as contest participation grows.
- Reputation-based fairness: Judges earn or lose trust based on past accuracy, discouraging careless or biased voting.
- Transparency & engagement: Experienced players get to contribute constructively to maintaining contest integrity.
Safeguards and fairness
- Anonymization: Judges never see usernames or personal data — only behavioral statistics and submissions.
- Multiple reviewers per case: No single judge's bias or mistake can decide an outcome.
- Cross-validation with AI: Final outcomes depend on both human and automated inputs.
- Rate limits: Judges can handle only a few cases per round to prevent burnout or rushed decisions.
- Appeal process: As with JudgeBan, users can appeal outcomes with transparent evidence review.
Risks & mitigations
| Risk | Mitigation |
|---|---|
| Biased judging | Anonymous data + multi-judge averaging |
| False consensus | Blend with AI confidence scores |
| Low participation | Offer small JR rewards or community recognition |
| System gaming | JR decay + accuracy tracking prevent coordinated bias |
| Complexity | Integrate smoothly with JudgeBan pipeline and moderation tools |
Implementation notes
- Extend the JudgeBan pipeline to automatically forward anonymized data on flagged users to available judges.
- Build a Judge Dashboard with per-case voting, evidence summary, and historical accuracy stats.
- Maintain JR scores, decay them over time, and use them in aggregation formulas.
- Log all actions for transparency and periodic audits.
Conclusion
CommunityJudge turns skilled competitors into community moderators — a collaborative defense line against cheating. Combined with JudgeBan, it creates a layered ecosystem:
JudgeBan detects → CommunityJudge verifies → Permanent ban or clearance follows.
This dual system balances automation with human insight, rewards trustworthy judges, and helps keep contests fair without punishing honest players.