DeepSeek and I spent a weekend analyzing 8000 CF problems. Here's what we found

Revision en1, by ayushgirigoswami15, 2026-05-15 20:28:28

I analyzed 1600 Codeforces contests and 8000+ problems — some interesting patterns

Please check out the results here: https://ayushgirigoswami.github.io/codeforces_analysis_report/

A few weeks ago I got curious about something most of us probably notice intuitively but rarely measure properly:

  • Is Codeforces getting harder over time?
  • Which topics are becoming more common?
  • Are some problems overrated or underrated?
  • How different are Educational rounds from regular rounds?

So I wrote a Python script that pulls contest/problem data from the CF API, analyzes ratings + tags, and generates interactive visualizations.
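The data pull boils down to one documented endpoint. A minimal stdlib-only sketch of that step (the actual script uses `requests` and fetches more endpoints; `keep_rated` is my name for the filtering step, not the author's):

```python
import json
from urllib.request import urlopen

def keep_rated(problems):
    """Keep only problems that carry an official difficulty rating."""
    return [p for p in problems if "rating" in p]

def fetch_rated_problems():
    """Download the full problemset from the Codeforces API and return
    the rated subset. problemset.problems is a real, documented endpoint."""
    url = "https://codeforces.com/api/problemset.problems"
    with urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    if payload["status"] != "OK":
        raise RuntimeError(payload.get("comment", "Codeforces API error"))
    return keep_rated(payload["result"]["problems"])
```

Each entry in the result has `contestId`, `index`, `tags`, and (for rated problems) `rating`, which is all the analysis below needs.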

Report

https://ayushgirigoswami.github.io/codeforces_analysis_report/

Source code

https://github.com/Ayushgirigoswami


Dataset

The analysis includes:

  • 1600 contests
  • 8354 rated problems
  • Div.1 / Div.2 / Div.3 / Div.4 / Div.1+2
  • 2011 → present

Average problem rating across the dataset: 1793


Some interesting results

1) Div.1+Div.2 rounds have the widest difficulty spread

Division   Avg Rating   Median   Std Dev
Div.1            2358     2400       714
Div.2            1630     1600       652
Div.1+2          2096     2100       927
Div.3            1429     1400       513
Div.4            1213     1100       412

Div.1+2 rounds have by far the largest standard deviation.

Makes sense in hindsight: these rounds combine easy entry problems with very high-end G/H problems, so the spread becomes huge.

Also interesting: even Div.1 A problems average around 1537, which is already harder than many Div.2 mid-problems.
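The per-division table is essentially a group-by over (division, rating) pairs. A stdlib sketch of the shape of that computation (the `rows` format and `division_stats` name are my assumptions, not the script's actual internals):

```python
from collections import defaultdict
from statistics import mean, median, pstdev

def division_stats(rows):
    """rows: iterable of (division, rating) pairs.
    Returns {division: (avg, median, std_dev)}, rounded to whole points."""
    by_div = defaultdict(list)
    for division, rating in rows:
        by_div[division].append(rating)
    return {
        d: (round(mean(rs)), median(rs), round(pstdev(rs)))
        for d, rs in by_div.items()
    }
```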


2) Biggest difficulty jumps are usually B→C and C→D

Average Div.2 ratings by position:

  • A ≈ 903
  • B ≈ 1203
  • C ≈ 1552
  • D ≈ 1932
  • E ≈ 2300
  • F ≈ 2614

The largest jumps are:

  • B → C : +349
  • C → D : +380

This matches what many contestants experience during contests: B is often straightforward, while C/D is where actual problem solving starts becoming important.

For Div.1+2 rounds, the jump near the end becomes even more extreme:

  • F ≈ 2697
  • G ≈ 3102
  • H ≈ 3160
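The position averages and jumps reduce to bucketing by index letter and differencing neighbours. A sketch, assuming problems are dicts with CF-API-style `index` and `rating` fields:

```python
from collections import defaultdict
from statistics import mean

def position_jumps(problems, order="ABCDEFGH"):
    """Average rating per position letter, plus the gap between neighbours.
    Sub-indices like 'C1'/'C2' are folded into their letter."""
    buckets = defaultdict(list)
    for p in problems:
        letter = p["index"][0]
        if letter in order and "rating" in p:
            buckets[letter].append(p["rating"])
    avgs = {c: round(mean(buckets[c])) for c in order if buckets[c]}
    present = [c for c in order if c in avgs]
    jumps = {f"{a}->{b}": avgs[b] - avgs[a]
             for a, b in zip(present, present[1:])}
    return avgs, jumps
```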

3) Topic trends over time

Increasing frequency

  • greedy
  • math
  • constructive algorithms
  • data structures
  • binary search
  • dp
  • trees
  • bitmasks
  • interactive

Decreasing frequency

  • implementation
  • geometry

The increase in interactive problems during the last few years was especially noticeable.

Geometry also appears much less frequently than in older rounds.
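Measuring a trend like this needs per-year shares rather than raw counts, since the number of problems per year has grown. A sketch of that normalization (the flat `year`/`tags` dict shape is my assumption; the script presumably derives the year from each contest's start time):

```python
from collections import Counter, defaultdict

def tag_share_by_year(problems):
    """For each year, the fraction of that year's problems carrying each tag."""
    totals = Counter()
    tagged = defaultdict(Counter)
    for p in problems:
        totals[p["year"]] += 1
        for tag in p["tags"]:
            tagged[p["year"]][tag] += 1
    return {
        year: {tag: n / totals[year] for tag, n in counts.items()}
        for year, counts in tagged.items()
    }
```

A tag is "increasing" when its share rises across years, regardless of how many more problems exist now.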


4) Most common tags

Tag                       Total
greedy                     2885
math                       2805
implementation             2407
dp                         1980
constructive algorithms    1677
brute force                1644
data structures            1620
binary search              1022

Some observations:

  • DP is disproportionately common in Div.1.
  • Implementation dominates Div.2/3 but drops heavily in Div.1.
  • Graph-related problems appear much more frequently than I expected.
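The totals themselves are a flat multiset count over every problem's tag list (each CF API problem carries a `tags` list):

```python
from collections import Counter

def tag_totals(problems):
    """Total occurrences of each tag across all problems."""
    return Counter(tag for p in problems for tag in p["tags"])
```

`tag_totals(problems).most_common(8)` reproduces the ordering of the table above.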

5) Educational rounds vs regular rounds

This part surprised me.

Type          Overall Avg      A      B      C      D      E      F
Educational          1769    873   1118   1465   1842   2225   2628
Regular              1767   1050   1344   1714   2088   2417   2525

Overall average difficulty is almost identical.

But position by position, Educational rounds are consistently easier from A through E, while their F problems are actually harder on average.
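Splitting Educational from regular rounds can be done on the contest name alone; a minimal classifier (the real script may use a finer rule, e.g. also separating Div. 3/4 rounds out of "Regular"):

```python
def round_type(contest_name):
    """'Educational' if the contest name says so, otherwise 'Regular'.
    Codeforces names these rounds 'Educational Codeforces Round N ...'."""
    return "Educational" if "Educational" in contest_name else "Regular"
```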


6) Problems whose ratings seem unusual

Examples of problems that appear easier/harder than their ratings suggest based on solve counts.

Easier than expected

  • 1264F — Beautiful Fibonacci Problem (rated 3500, but solved by 1000+ users)

Harder than expected

  • 2190F — Xor Product
  • 2066F — Curse
  • 1967F — Next and Prev
  • 949F — Astronomy

These had surprisingly low solve counts relative to their ratings.


7) Contest “symmetry”

I also tried measuring how balanced contest difficulty curves are.

Average symmetry score: 0.454 / 1

Interpretation:

  • > 0.7 → balanced progression
  • < 0.4 → heavily front-loaded

Most CF contests lean slightly front-loaded: easy opening problems followed by a sharp wall.
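The post doesn't define the symmetry score, so here is one plausible formulation purely for illustration: score the uniformity of consecutive rating gaps, where 1 means an even staircase and values near 0 mean a lopsided curve. This is my own construction, not the author's metric.

```python
def symmetry_score(ratings):
    """1.0 for perfectly even difficulty steps, approaching 0.0 as the
    gaps become lopsided. Illustrative only; not the author's metric."""
    gaps = [b - a for a, b in zip(ratings, ratings[1:])]
    if not gaps:
        return 1.0
    avg = sum(gaps) / len(gaps)
    if avg == 0:
        return 1.0
    # normalized mean absolute deviation of the gaps
    mad = sum(abs(g - avg) for g in gaps) / len(gaps)
    return max(0.0, 1.0 - mad / abs(avg))
```

An even staircase like 800/1200/1600/2000 scores 1.0; a flat opening followed by a wall scores much lower.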


Running the script

Requirements:

pip install requests pandas numpy plotly matplotlib tqdm

Run:

python deep.py

The script:

  1. Fetches contests/problems from the CF API
  2. Performs statistical analysis
  3. Generates interactive Plotly graphs
  4. Detects trends/anomalies

Fetching everything takes around 15–20 minutes because of API rate limiting.
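Most of that time is deliberate spacing between calls. A tiny throttling helper in the spirit of what the script does (the 2-second default is a conservative guess at safe spacing, not an official limit):

```python
import time

def throttled(fetch, urls, delay=2.0):
    """Call fetch(url) for each url, sleeping `delay` seconds between calls
    so the API rate limit is never hit."""
    out = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)
        out.append(fetch(url))
    return out
```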


One thing I still want to analyze is solve timing during contests (for example: when most users solve C/D problems), but that would require a much larger number of contest.status API calls.

If anyone has ideas for additional analyses, suggestions are welcome :)

Tags dynamic programming, greedy, icpc challenge, rating

History

Revisions

Rev.   Lang.     By                   When                  Δ      Comment
en2    English   ayushgirigoswami15   2026-05-15 20:29:09   16     Tiny change: '# I analyzed ' -> '# Me and DeepSeek analyzed '
en1    English   ayushgirigoswami15   2026-05-15 20:28:28   5347   Initial revision (published)