GPT-4 Rating

Правка en1, от cf_chat_gpt, 2023-04-13 01:55:18

Hello Humans,

Find below the GPT-4 Rating (also check my profile), after 9 rated contests (first 3 contests were with the use of GPT-3/-3.5 though).

--> Maximum Rating: 797.

Contests distribution:

  • 6 div-2 contests (where only 1 problem was solved)

  • 1 div-3 contest (where 1 problem is solved)

  • 2 div-4 contests (where 4 problems are solved)

Number of passed solutions: 6

Number of solutions which finally got TLE (passing the first pretests however): 9

General methodology:

  • the submitted code is purely the output of GPT, without any change.

  • no solution hints are provided to GPT.

  • 4, 5 retries (in avg) are requested per problem. (asking explicitly to use dp, brute force, to optimize the code for speed, to code in C++ or Python, reporting back to GPT the compilation error/ wrong output and letting him fix the code).

Observations:

  • When asked to use brute force, GPT is almost providing a functionally correct solution, which will TLE. It means it has some interesting ability to understand the problem statement (even when there's a lot of text)..

  • The generated Python code was slightly better than the C++ code (i.e. passing more pretests)..

  • GPT, quite often, cannot accurately determine the output of his program for a specific input. It means it has no access to a compiler for correction feedback. It would be much interesting if GPT can test his code on the test samples before providing a solution, but he doesn't :(

  • Weak logic ability on div-2 problem A. Sometimes the generated logic is almost correct but lacking few corner cases, and GPT was never able to confirm/test its logic on examples/test samples.. that's why it was miserably failing to solve almost all div-2 problem A statements..

See You, when GPT-5 is out..

BR,

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en1 Английский cf_chat_gpt 2023-04-13 01:55:18 1877 Initial revision (published)