Sunb1m's blog

By Sunb1m, 12 months ago, translation, In English

Preface

Recently, numerous methods, techniques, and suggestions have been proposed to combat cheaters, providing short-term positive results. However, in the long run, cheaters become smarter and inevitably find ways to circumvent protections and continue their deceptive activities on the platform.

Why Common Methods Fail in the Long Term

Anti-Plagiarism and Timings

Code rapidly generated by Large Language Models (LLMs) often appears human-like due to standardized coding styles, predefined function templates, abbreviated variable names, etc. Additionally, appropriate delays between submissions can be easily simulated by monitoring the timing of other participants.

SMS Verification

Online services offering disposable phone numbers or additional virtual numbers provided by telecom operators render SMS verification largely ineffective. Anyone active on platforms like CodeForces or having even a minimal interest in bypassing verification knows about inexpensive online services providing temporary SMS activation or voice verification. Such services cost mere pennies, allowing number rentals for specific periods or the issuance of additional numbers linked to a primary SIM card (e.g., as offered by some Russian operators like MegaFon for a negligible daily fee). Thus, telephone-based verification is currently insufficient to deter determined cheaters.

Strict Bans

A user losing rating early on can simply create a new account, resetting the cycle. Those with established high ratings, familiar with key detection metrics, will easily evade bans by staying cautious.

Main Idea – Don't Ban, Conceal!

The idea of introducing a reporting system has been suggested previously, but I want to emphasize it further. Let's implement a delayed reporting system and likes from trusted users (elo rating >2000), influencing a suspect’s trust factor. Rather than banning a user for suspicious activity during a round—thus indicating exactly where and how they slipped—we should let them believe they've successfully evaded detection again. Meanwhile, their account receives a hidden mark, and all subsequent competitions involving that user are segregated from the main participant pool. The cheater continues to see their supposed rating and rank, although their scores are excluded from the main leaderboard calculations affecting honest competitors. Suspect participants are shifted into a "hidden pool," competing solely against each other.

Trusted Reporting System

Who can report?

  • Participants with a rating above 2000

  • Users significantly contributing to the platform’s community

  • Trusted participants and coordinators

Report Form:

  1. Suspicion of AI usage? [Yes/No]
  2. Key anomalies (timings, coding style, atypical solutions)
  3. Brief comment

Weighting of Reports:

Reporters are assigned a "weight" based on their current rating, registration date, community contribution, and past report accuracy. The higher these metrics, the greater the credibility assigned. Consequently, reports against highly trusted users don't significantly impact their reputation unless refuted by equally trusted members.

Likes and Reputation System

  1. Likes can be given by any participant registered at least seven days with a valid rating after round completion.
  2. Code evaluation: readability, clarity, presence, and comprehensiveness of comments.

Hidden Pool Mechanics

flowchart LR
  A[Participant] --> B[Competition]
  B --> C{Data Collection}
  C --> D[Trusted Reports]
  C --> E[Likes/Dislikes]
  C --> F[Telemetry]
  D & E & F --> G[Bayesian Model]
  G --> H{TF ≥ 0.5?}
  H -- Да --> I[Hidden Pool]
  H -- Нет --> J[Main Pool]
  I --> K[Visualization (visible only to cheater)]
  J --> L[Real Results]
  1. Reports and historical data collected before the contest.
  2. Telemetry gathered during the contest: coding time, submission frequency, page navigation.
  3. Post-contest Bayesian probabilistic model makes the final decision regarding pool assignment.

UI/UX for "Hidden" vs. "Honest" Participants

  • Honest participants see their usual rankings without awareness of hidden "attributes" participants.

  • Hidden participants see fake ratings and rankings in the general leaderboard.

  • Moderators have access to dynamic trust-factor graphs, detailed telemetry logs and suspect profiles.

Automated Trust Factor Calculation

The trust factor increases or decreases after each contest based on reputation over a specified interval or is adjusted following manual moderator review.

What are your thoughts on this idea? Share your comments below.

Full text and comments »

  • Vote: I like it
  • +201
  • Vote: I do not like it

By Sunb1m, history, 13 months ago, translation, In English

Theoretical basis

If you have been solving Div1-Div2 problems on Codeforces for a long time, you have probably heard about segment tree. It may sound scary and confusing to some people, but in fact it is a very useful data structure. To put it simply, a segment tree — is a binary tree whose nodes store aggregated data about some segment of an array. Due to this tree structure we can quickly respond to queries about arbitrary array segments, and not only respond, but also update elements.

Imagine that we have an array and we need to repeatedly learn something about the array's sub segments, for example, the sum of elements, minimum, maximum, number of some values, etc. A segment tree allows us to do such queries very quickly — usually in O(log n) per query instead of a trivial O(n) enumeration. At the same time, it is quite flexible, supports changing individual elements, and with additional tricks (lazy propagations) can even mass update entire ranges in the same logarithmic time. In general, a segment tree can optimize a program algorithm quite well when conventional arrays and sorts are no longer sufficient.

Example of a segment tree constructed for the array arr = {1, 3, 5, 7, 9, 11}. Each node stores the sum on the corresponding segment: the root contains 36 (the sum on the whole segment [0;5]), its children are the sums on [0;2] and [3;5] (9 and 27), and so on.

How does it work?

We recursively split the array in half until we get to the individual elements. These elements are — leaves of the tree. Then we start “gluing” the information from bottom to top: each internal node is calculated from the two children. For example, if we need sums, then the value of the node = sum of the left son + sum of the right son. If we need minima — we take the minimum of the two children, etc. As a result, the root stores the aggregate (sum/min/max, etc.) over the entire array, and the path from the root to the leaf corresponds to dividing the segment into sub segments.

When is it necessary?

A segment tree can be used when: 1. You need to efficiently answer queries about subsets of array elements (sum on a segment, maximum on a segment, GCD on a segment, etc.) and individual elements or entire ranges may change between queries, i.e. the data is dynamic. In this case, simple pre-calculation of all answers will not help you 2. There are many queries and updates (tens and hundreds of thousands) and doing them in linear time is too slow

Segment tree provides (log n) per query or update, which is usually within the limits for n and number of queries of about 10^5. At the same time, segment tree memory of ~4*n elements for an array of size n — is also quite ok.

Examples of problems solved by a segment tree

Range sums, finding the minimum/maximum on a segment, the number of certain elements on a segment, checking the ordering of a segment, even finding the k-th element. For example, with the help of a segment tree you can easily find out how many zeros there are in a given range or find the index of the k-th zero in an array. By the way, this is exactly the task we are about to consider in practice.

Practical part

Let's take the problem 2075F - Beautiful Sequence Returns from the recent contest Educational Codeforces Round 176 (Rated for Div. 2) as an example, solving it using a segment tree and drawing an analogy in comparison with a head-on solution.

Let's consider the solution to this problem from wishgoodluck311252841. The idea of this algorithm is based on an “event-driven approach”. We precompute two arrays (let's call them l and r) that capture important edges in the original sequence. These arrays help to determine for which segments there is a “change” (an increase or decrease in the contribution to the final answer). Then an event vector is formed — each element of this vector specifies that some value should be added or removed at a certain segment. Finally, the final length of the desired sequence is obtained as the maximum total change.

The key challenge is to quickly and efficiently handle the multitude of events that affect the index ranges. This is exactly what a segment tree is used for.

In this solution, the segment tree is used for two operations:

  • Range Update. Each event from vector v specifies that either +1 or -1 should be added on the segment [l,r]. A segment tree with lazy updates allows for O(log n) per event to update the entire segment at once, without going through each element separately.
  • Range Query. After each update we need to quickly find out what is the maximum value on the whole range. The root of the tree (Tree[1].v) always stores the maximum among all elements. This allows us to know instantly (in O(1) after O(log n) updates) the current optimal value, which is then used to compute the answer.

Thus, the segment tree here acts as an “aggregator” of all events, accumulating their influence on the given segments and allowing us to quickly find the maximum.

Imagine a problem where we need to update all elements of a segment at each step for each event. If the range has length m, then the naive update takes O(m) operations. Given m∼n and O(n) events, we get O(n^2) operations. This solution will quickly become time unacceptable even for medium-sized input data.

In contrast, the segmental tree allows whole ranges to be updated in O(log n) due to lazy propagation. Thus, even if there are many events, the overall complexity becomes O(n*log n).

Pitfalls and tips

Segment tree — thing is powerful, but you can make some typical mistakes when implementing it:

  • Incorrect merging of results. Forget that you need to summarize children, or confuse left/right and the whole logic will be destroyed. Carefully define what the node stores and how it is derived from the children's information.
  • Segment Boundaries. It's easy to make mistakes with the indices left, right, and middle. Assume that the segment [left..right] is inclusive. Then middle = (left+right)/2 — left part [left..middle], right part [middle+1..right]. If you forget +1 where you need it, you will get overlapping or missing indexes. Tip: debug on a small array, for example, n=4 and write out the segments of each node on a piece of paper, check that they cover the array without gaps and overlaps.
  • Recursion vs iteration. We considered a recursive solution for ease of understanding, but when n ~ 10^5 the depth of recursion is ~17 is ok, but if n ~ 10^7, the depth is ~24, it's still ok. When n is very large, and especially when the number of operations is large, iterative implementations are sometimes preferred to avoid overhead and risk of stack overflow
  • Lazy updates. In our example we just used them, updating a whole range at once. Note that implementing lazy tags is more complicated — it's easy to mess with propagating changes to children. If the task requires a segmenter with lazy, test on simple examples hard. Typical mistakes: forgot to run a lazy tag before further descent or before counting in a node — you will get incorrect data. Tip: write a separate push(v) function to propagate the tag from node v to children, and call it where needed, e.g. before moving to subtrees.
  • Choosing between Fenwick (BIT) and segment tree. They are often interchangeable for simple problems on sums, ordinal statistics. Fenwick is easier to code and requires memory, but the segment tree is more versatile, supporting complex merge operations. If the problem allows Fenwick and you don't want to spend time implementing the tree — go ahead with Fenwick, but segment tree — must know for Div1-Div2 level.

Finally, I'll give you a little hint — if the problem simply requires multiple sum queries on a segment without updates, you don't need a segment tree, it will be enough to calculate prefix sums for O(n) in advance and then answer each query for O(1).

Thanks to those who made it to the end, have a great programming :)

Suggest in the comments what else I should write about in my next post and what topics are still worth exploring.

Full text and comments »

  • Vote: I like it
  • +2
  • Vote: I do not like it

By Sunb1m, history, 13 months ago, translation, In English

Anecdote 1^n

In a Codeforces competition, a contestant submits every problem on the first try without even reading the terms and conditions. He is asked:

  • How do you do that?! You don't even read the terms!

He replies:

  • I haven't read the terms for a long time, I just send them to ChatGPT.

  • And if GPT makes a mistake?

  • Well, then there will be two guilty parties: me and the neural network.

Anecdote sqrt(4)

A sick person comes to the doctor after participating in a round of Codeforces.

Doctor:

  • Hello, what are you complaining about?

The patient is silent and typing something quickly on his phone.

Doctor (repeats surprised):

  • Young man, so what are you complaining about?

Participant (reading from the phone):

  • Hello, doctor. I have increased response latency, difficulties with cognitive function, and periodically need to clarify details of input data....

The doctor is perplexed:

  • Can you explain in a human way?

The participant types again and reads out:

  • I'm sorry, I didn't quite understand you, could you rephrase your question?

Physician:

  • Can't you speak without that ChatGPT at all!

Participant looks at the phone and reads:

  • Unfortunately, I am a language model and cannot visit doctors in person.

Physician:

  • I see, you have a serious case....

He writes in the diagnosis: “Chronic neural network addiction. “Recommend immediate lockdown of your Codeforces account.

Task B. “GPT Helper”

Time limit: 1 second

Memory limit: 256 megabytes

Input: standard input

Output: standard output

Vasya actively uses ChatGPT during Codeforces competitions and it has become his habit. Before each task he makes sure to send its conditions to the neural network. If ChatGPT fails, Vasya sends a new request: “What to do if ChatGPT doesn't know the solution to the problem?”. Usually after that the neural network gives him advice on how to find a solution on his own.

However, today Vasya was banned and he can no longer participate in contests. Vasya turned to ChatGPT for help again:

Vasya: “I got banned on Codeforces for using AI, what should I do?”.
ChatGPT: “I can't help you with this problem, please contact Codeforces administration.”
Vasya felt sad that for the first time AI didn't help him with a task.

Task: Determine how many tasks Vasya had time to submit if we know that the neural network refused to help him exactly after he submitted his solution to the fifth task of the contest.

Input data

There are no input data.

Output data

Print a single integer — the number of the task on which Vasya was let down by ChatGPT.

Example

Input data

Output data

5

Note

Please do not use neural networks during the competition!

Full text and comments »

  • Vote: I like it
  • 0
  • Vote: I do not like it

By Sunb1m, history, 13 months ago, In English

Hello dear users and MikeMirzayanov!

Lately I've often come across discussions about the use of AI in competitions. I decided to share my thoughts and personal experience on this issue, as well as make some suggestions that could help to fight the problem of foul play more effectively.

As an experiment, I tested several ChatGPT models on a paid basis to see how well they are able to efficiently solve contests on different algorithms in terms of correct solutions and optimization.

  • o3-mini copes well with div2 A-C level problems, but starting from div2 D the model without an explicitly defined solution idea often generates suboptimal code, incorrectly defining the optimal asymptotics of the algorithm. Problems are especially noticeable in dynamic programming problems when composing the recurrence formula and in interactive problems.
  • o3-mini-high shows better understanding of the problems, deeper reasoning, but also makes mistakes on complex div2 E-F problems, especially those related to dynamic programming. Often, to get the correct and optimal solution, we have to manually describe the model algorithm with the exact formula and asymptotics.

In my experience, of course, for purely experimental and scientific purposes, I noticed one interesting thing — ChatGPT in standard conditions without detailed explanations of the algorithm usually generates the same code pattern, containing approximately the same structure of functions and their names, similar style of variables and repeated comments, frequent use of the same blanks and templates, especially for typical algorithms. Yes, of course an ordinary user can write a similar algorithm, but there will be a clear difference from AI in code layout and code structure.

This, in my opinion, can be used as one of the layers of protection for automatic moderation of solutions on the platform. Already now, moderation on Codeforces sometimes successfully detects suspiciously similar solutions on this very basis. But it is actually very easy to circumvent this now by obfuscating the code, adding “garbage” and unnecessary functions, because of which the code will be difficult to read in the end. You can think of several more layers of protection that will work in parallel with the main one.

The basic idea is behavioral analysis of user actions in real-time. The following functionality can be implemented:

  1. Real-time analysis of the difference between the moment of opening the task condition and the first sending of the solution
  2. Tracking sudden changes in the speed of problem solving, for example, solving a complex problem too quickly after a long idle time without activity.
  3. Fixing absolutely all user actions on the page of the contest with batch sending of encrypted data to the server for the anti-fraud module, without which further action on the site will be impossible, we will take into account especially carefully the events of copying the task condition from the page.

A separate and key solution will be the implementation of a client application Client <-> Server. Its main task will be monitoring of suspicious actions. In some ways its logic will be similar to the work of a proctor when conducting, for example, an online screener.

  1. Analyzing processes running during the contest
  2. Analyzing incoming/outgoing traffic to control requests to AI services
  3. Creating and sending random screenshots of the user's screen to the server only when there is suspicious activity

Of course, from a privacy point of view, this solution would not be a good one, so it's worth thinking more about how to implement it properly.

Write your thoughts and ideas in comments, it will be interesting to hear!

Full text and comments »

  • Vote: I like it
  • -22
  • Vote: I do not like it