Discussion of a mojibake bug (especially affecting Chinese) when uploading Ghosts in gym

Revision en5, by wmxwmx, 2021-11-16 13:25:12

Hello guys, I am the uploader of the 2021 Jiangxi Provincial Collegiate Programming Contest. While uploading it, I ran into a serious mojibake bug with Ghosts containing Chinese characters. Since we are in the middle of the Chinese algorithm contest season, I think it is worth discussing this bug in public and sharing a temporary workaround for Chinese uploaders or anyone else suffering from it.

Discovery of the bug

On October $$$25^{th}$$$, after I uploaded the Ghosts for the 2021 Jiangxi Provincial Collegiate Programming Contest via the FTP server, all the Chinese characters in the team names turned into mojibake, as shown in the screenshot below.

As I was convinced that gym supports Unicode team names in Ghosts, I consulted another uploader who had successfully uploaded team names with Chinese characters, only to find that the bug had not shown up until recently. Then I tried different upload methods, including pasting the content directly into the text box and using other Unicode encodings, and they all failed.

However, I accidentally got it to work by turning off the proxy server on my computer, after someone suggested that the data might have been corrupted in transmission. (This does not actually make sense, since the file size never changed when I checked it over FTP.) The weirdest part is that I then tried multiple times with both FTP and pasting, with the proxy off, and it worked every time! So I stuck with the proxy server explanation.

Reappearance of the bug

On November $$$1^{st}$$$, when 2021年中国大学生程序设计竞赛女生专场 (China Collegiate Programming Contest for Girls) was uploaded to gym, the same problem appeared again.

At that time I simply assumed they had run into the same data transmission bug. That idea was invalidated when The 2021 CCPC Guilin Onsite (Grand Prix of EDG) hit the same bug during upload. Apparently it was not as simple as I had thought: both uploaders, chenjb and Claris, are among the most prestigious contestants in China, and unlike a newbie like me, this was definitely not their first time uploading Ghosts. So I tried to fix the issue with my same old solution, and this time it failed.

Temporary solution

As the problem was more likely a decoding error, I set out to find which encoding was being used to decode the Chinese text. The result is ISO 8859-2, an encoding used for Central European languages. The most counterintuitive part is that it does not support Russian characters, which suggests that Russian text is being decoded as Unicode. I scanned through the contests in gym and re-uploaded the Ghosts file of a Russian contest to my test contest, which confirmed my conjecture.
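To make the misdecoding concrete, here is a minimal Python sketch of what probably happens. It assumes the Ghosts file is sent as UTF-8 and then read back as ISO 8859-2; the team name is a made-up example, and `can_encode` is just a helper for this illustration, not anything from Codeforces.

```python
# Illustration only: UTF-8 bytes read with the wrong codec become mojibake.
team_name = "江西师范大学"  # hypothetical team name

utf8_bytes = team_name.encode("utf-8")

# Assumption: the server decodes the uploaded bytes as ISO 8859-2.
garbled = utf8_bytes.decode("iso8859_2")

# Python's ISO 8859-2 codec maps every byte to exactly one character,
# so the damage can be undone by reversing the two steps.
recovered = garbled.encode("iso8859_2").decode("utf-8")


def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Central European letters fit in ISO 8859-2, Cyrillic letters do not.
polish_ok = can_encode("Łódź", "iso8859_2")
russian_ok = can_encode("Иванов", "iso8859_2")

print(repr(garbled))
print(recovered == team_name, polish_ok, russian_ok)
```

This only shows why Cyrillic text cannot be ISO 8859-2. How Codeforces actually picks the encoding is still my guess.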

So the temporary solution for now is to add some fake contestants (fewer than 30 is enough) with Russian names at the bottom of the contestant list. As they have no submissions, they will not show up on the standings page, but they do help Codeforces recognize the file as Unicode. The only drawback is that if you check the contest carefully, you will find that the actual number of Ghosts does not match the standings.
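The workaround can be scripted. Below is a rough sketch, not an official tool. It assumes team lines in the Ghosts file look like `@t <id>,0,1,"<name>"` and that a `@teams <n>` header holds the team count; please check this against your own file before using it. The function name `add_placeholder_teams` and the prefix "Команда" are made up for this example.

```python
import re

# Assumed shape of a team line: @t <id>,0,1,"<name>"
TEAM_LINE = re.compile(r"^@t\s+(\d+),")
# Assumed shape of the team-count header: @teams <n>
TEAMS_HEADER = re.compile(r"^@teams\s+(\d+)\s*$")


def add_placeholder_teams(ghosts_text, count=30, name_prefix="Команда"):
    """Insert `count` teams with Russian names after the last team line."""
    lines = ghosts_text.splitlines()

    last_team_index = -1
    max_team_id = 0
    for i, line in enumerate(lines):
        match = TEAM_LINE.match(line)
        if match:
            last_team_index = i
            max_team_id = max(max_team_id, int(match.group(1)))

    # New teams get fresh ids and no submissions, so they stay off the standings.
    placeholders = [
        f'@t {max_team_id + k},0,1,"{name_prefix} {k}"'
        for k in range(1, count + 1)
    ]
    insert_at = last_team_index + 1 if last_team_index >= 0 else len(lines)
    lines[insert_at:insert_at] = placeholders

    # Keep the declared team count consistent, if the header is present.
    for i, line in enumerate(lines):
        match = TEAMS_HEADER.match(line)
        if match:
            lines[i] = f"@teams {int(match.group(1)) + count}"

    return "\n".join(lines) + "\n"
```

When saving the result, write the file explicitly as UTF-8, for example with `open(path, "w", encoding="utf-8")`, so that the upload does not depend on the system default encoding.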

Final words

The standings of those contests are now fixed, although the bug itself is not. While working on this issue, I found another rather annoying bug/feature: each time I delete the Ghosts, the overall solved counts for the problems remain.

I am more than grateful to MikeMirzayanov for this amazing platform, which has indeed helped me a lot. I hope the bug will be fixed soon and that Codeforces keeps getting better!

Tags gym, bug, chinese, unicode, mojibake
