Diego's blog

By Diego, history, 2 years ago, translation, In English

I received the following message today:

I only received the Russian version, the translation is below:

begin of translation

Warning.

Your solution 187800326 for Problem 1779D is significantly similar to the solutions of other participants and is in the group of the same solutions of tatyam/187759083, TranGiaHuy/187764159, StaneLegend/187768377, Kite_kuma/187775255, t4m0fey/187777461, foreverlastnig1202/187782300, x5f3759df/187783962, HollwoQ_Pelw/187784007, dimss/187785321, matuy/187787723, xinyster/187791090, Diego/187800326, siganai/187800341, Mr_yqz/187803999, Coki628/187805330, IsCarryNotKarry/187807688, AlphaGaurav/187808726, Muhammad-Zendy/187812255, LeVanThuc/187817663, aadrito/187820244, nidaboge/187821068, Husanboy/187822585, 160cm/187823271, Peachrain/187827607, LilyYSC/187829324, purupuddu/187829779, alex09/187830914. Such a match is a clear violation of the rules. Note that unintentional leakage is also a violation. For example, you should not use ideone.com with the default settings (public access to your code). If you have indisputable evidence that the match was due to using a common source published before the contest, write a comment on the roundup post with all the details. Read more at http://mirror.codeforces.com/blog/entry/8790. Such violation of the rules may be grounds for blocking your account or other penalties. In case of repeated violations, your account may be blocked.

end of translation

I just looked at most of the solutions mentioned. None of them use DSU like mine. Perhaps the overlap is due to my template, which contains SegTree in particular and some other useful features. However, this template was posted by me on github about a year ago: https://github.com/eropsergeev/cf-tempalte and the use of some parts from it by others is legal. Also, I was not able to recognize parts of my template in the solutions presented. Perhaps the system reacted to the large number of overlapping names like segtree?

In any case, the accusation of cheating is clearly unfounded.

MikeMirzayanov, can you please comment on this?

Full text and comments »

  • Vote: I like it
  • +48
  • Vote: I do not like it

By Diego, 3 years ago, translation, In English

TL;DR

If you code on python, use set or dict, and want to avoid hacks, you can read only the last section.

Introduction

Most people interested in hacking know about the method of creating tests targeting unordered containers in C++. This method is described in more detail in post. However, creation of similar tests for set and dict in python is covered much less well, which I decided to fix with this post.

A little theory about hash tables

I will not go into detailed descriptions and proofs, because there are plenty of them in the open sources. Let us consider only what interests us in this case.

An open-addressing hash table is a data structure that stores a set of elements in an array of knownly larger size, defining the position of the element as its hash taken modulo the size of the array. If hashes of several elements give the same cell in the array, the hash table tries to take another cell according to some rule, which depends on the particular implementation of the table. Usually this rule boils down to checking the cells $$$f(x), f(f(x)), f(f(f(x)))$$$ and so on, until an empty one is found. The function $$$f$$$ is often a linear transformation $$$(a * x + b) \% size$$$, where $$$size$$$ is the size of the array, and $$$a$$$ and $$$b$$$ are relatively prime with it.

Python

Python uses a hash table with an array size equal to a power of two to implement dict, and the transformation is slightly more complex than a simple linear one -- $$$f(x) = (5x + 1 + pertrub) \% size$$$, where $$$pertrub$$$ is initially equal to hash, but is devided by 32 in each step. For a detailed implementation, see repository.

Also, the hash function for numbers in python is very predictable, it's just the number itself.

Thus, to make a countertest it is enough to find a sequence of indexes that will be searched for a particular number and preoccupy them, and then provoke a search for that number in the dictionary, which is quite easy to do for most problems.

The set implementation in Python3 is a bit more complicated, it does not test a single cell, but 10 consecutive cells, which can be observed here. However, building a test in this case is not very difficult either, just add $$$x, x + 1, x + 2, ..., x + 9$$$ instead of one number $$$x$$$.

Tests

I used 153032429 (Thanks to turkids for it) and 153408991 from Codeforces Round 781 (Div. 2) to check. The result can be found as hacks number 796358 and 796362. Judging by the "unknown verdict" result, it went too well and I broke the author's solution at the same time. More details may know shishyando and Kirill22.

How to protect yourself?

Unlike C++, Python does not provide a way to define its own hash function for an existing type (or I just don't know about it), but nobody prevents you from defining your type with a different hash function:

from random import getrandbits

RANDOM = getrandbits(32)

class Wrapper(int):
    def __init__(self, x):
        int.__init__(x)
    def __hash__(self):
        return super(Wrapper, self).__hash__() ^ RANDOM

An example of using this type can be found in 153409562. Unfortunately, while the author's solution breaks on my tests I will not be able to test the robustness of my solution with the Wrapper class. However, locally it works fast enough.

Full text and comments »

  • Vote: I like it
  • +178
  • Vote: I do not like it