Блог пользователя Arpa

Автор Arpa, 9 лет назад, По-английски

Hi!

After my previous post about unordered_map now I want to explain hash functions.

std::hash.

C++ STL has one hash function in library <functional>. You can use it for this data types:

template<> struct hash<bool>;
template<> struct hash<char>;
template<> struct hash<signed char>;
template<> struct hash<unsigned char>;
template<> struct hash<char16_t>;
template<> struct hash<char32_t>;
template<> struct hash<wchar_t>;
template<> struct hash<short>;
template<> struct hash<unsigned short>;
template<> struct hash<int>;
template<> struct hash<unsigned int>;
template<> struct hash<long>;
template<> struct hash<long long>;
template<> struct hash<unsigned long>;
template<> struct hash<unsigned long long>;
template<> struct hash<float>;
template<> struct hash<double>;
template<> struct hash<long double>;

For example:

hash<int>hi;
int x=69;
cout<<hf(x)<<endl;
hash<long double>hld;
long double y=69.6969696969;
cout<<hld(y)<<endl;
hash<vector<int> >hv;
vector<int>v({69,69,69,69});//v→ 69,69,69,69
cout<<hv(v)<<endl;
Hand-made hash functions.

There are many ways to create hash function for your struct. For example:

struct reval{
  vector<int>v;
  int n;
  string s;
  size_t hash(){//size_t is alias of unsigned int
    hash<string>hs;
    hash<long long>hll;
    hash<int>hi;
    hash<vector>hv;
    long long ans=hs(s);
    ans<<=32;
    ans|=hi(n);
    ans=hll(ans);
    ans<<=32;
    ans|=hv(v);
    ans=hll(ans);
    return ans;
  }
}

A simple example:(be careful;it is only a sample and it isn't good hash function of course)

struct reval{
  vector<int>v;
  int n;
  string s;
  size_t hash(){//size_t is alias of unsigned int
    hash<string>hs;
    hash<int>hi;
    hash<vector>hv;
    return hs(s)+hv(v)+hi(n);
  }
}

A trick:

struct S{
  string first_name;
  string last_name;
};
namespace std{
    template<>struct hash<S>{
        size_t operator()(S const& s) const{
            size_t h1=hash<string>()(s.first_name);
            size_t h1=hash<string>()(s.last_name);
            return hash<long long>()( (h1<<32)^h2 );
        }
    };
}
int main(){
    S s;
    s.first_name="MohammadSina";
    s.last_name="PakSeresht";
    cout <<"hash(s) = "<<hash<S>()(s)<<endl;
}

Note1: You can use hash<type>()(variable), like this:

int x=69;
cout<<hash<int>()(x)<<endl;

Instead of:

int x=69;
hash<int>hasher;
cout<<hasher(x)<<endl;

Note2: please be careful about combining two hashes. There are two good ways:

1-x*P+y mod M(when P is prime number (like 43) and M is less than 2^32 and not a power of 2 (like 10^9+7)).

2-hash<long long>()((x<<32)^y) or hash<long long>()(x*1000000007LL+y).

  • Проголосовать: нравится
  • +5
  • Проголосовать: не нравится

»
9 лет назад, # |
  Проголосовать: нравится +17 Проголосовать: не нравится

Both x·232 + y and x + y are bad ways to combine hashes.

If you use x·232 + y, the (32-bit) hashes of (x, y) pairs such as (1, 1), (2, 1), ..., (k, 1) will all be equal. As already discussed in your previous post.

If you use x + y, the hashes of (x, y) pairs such as (1, 109), (2, 109 - 1), ..., (109, 1) will all be equal.

Both are rather common cases which can, and will, make your program slow.

For programming contests, when you combine hashes of x and y, some kind of polynomial hashing like for relatively prime P and Q is usually sufficient. Still, take care when making Q a power of two.

  • »
    »
    9 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится -9 Проголосовать: не нравится

    I wrote them in that way for simplify.

    Another way is to use hash<long long>()( (x<<32)^y ).

    +post has edited.

    • »
      »
      »
      9 лет назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Sorry, but some examples are so particularly bad that they are worse than having no example at all.

      The hash<long long>() of some mix of x and y approach looks like it might actually be a good one. Still, it surely behaves differently in 32-bit and 64-bit mode, even when there is no undefined behaviour in the mix itself. So, it might be difficult to debug when you compile to 64-bit and the judges test a 32-bit version.

»
9 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

template<> struct hash;

o0 That looks rather dangerous :P.

  • »
    »
    9 лет назад, # ^ |
      Проголосовать: нравится +1 Проголосовать: не нравится

    Can you explain it more please?

    • »
      »
      »
      9 лет назад, # ^ |
        Проголосовать: нравится +3 Проголосовать: не нравится

      Zonk. < and > are some HTML thingies and I wrote

      <double>
      

      but it disappeared xd. From that it is clear, why you wanted me to explain :D.

      By previous post I meant that we should never expect doubles to be equal (even if they were initialized by the same hardcoded formula). Hence, we should never expect hashes of doubles to be equal, so it is probably useless in that case. Am I right?

»
9 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

define 69 49

»
5 лет назад, # |
Rev. 5   Проголосовать: нравится +6 Проголосовать: не нравится
C++ STL has one hash function in library <functional>. You can use it for this data types:

template<> struct hash<bool>;
template<> struct hash<char>;
template<> struct hash<signed char>;
template<> struct hash<unsigned char>;
template<> struct hash<char16_t>;
template<> struct hash<char32_t>;
template<> struct hash<wchar_t>;
template<> struct hash<short>;
template<> struct hash<unsigned short>;
template<> struct hash<int>;
template<> struct hash<unsigned int>;
template<> struct hash<long>;
template<> struct hash<long long>;
template<> struct hash<unsigned long>;
template<> struct hash<unsigned long long>;
template<> struct hash<float>;
template<> struct hash<double>;
template<> struct hash<long double>;

But you yourself showed hashing of vector and string using hash<vector<int>> and hash<string>.

Edit:

I found out that more template specializations of hash class exist in the namespaces <string>, <memory>, <vector>, <bitset>, <system_error>, <type_index> and <thread> as well. So hash<vector<int>> and hash<string> are defined in the <vector> and <string> namespaces respectively, not in <functional> namespace.

»
5 лет назад, # |
Rev. 8138   Проголосовать: нравится -26 Проголосовать: не нравится

It seems like you are watching POR**UB(because you wrote 69 too many times).

»
5 лет назад, # |
Rev. 3   Проголосовать: нравится -12 Проголосовать: не нравится

 Seems like it is not working.

  • »
    »
    5 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Hash doesn't directly support vector of any data type other than bool. If it does then unordered_map would be supporting a vector as a key (which unfortunately doesn't).

  • »
    »
    5 лет назад, # ^ |
    Rev. 3   Проголосовать: нравится +1 Проголосовать: не нравится

    This is why it is important to read documentation on "something" before jumping in to use that "something". Your code is invalid, not that you can't use hash for vector.

    It should look something like this: cout << hash <vector <bool>>()(vector <bool>(10, false));

    Firstly, hash does not take a vector in constructor. So, you need to declare a temporary object and then pass a to use the hash operator()(vector<bool>) overload.

    Secondly, I get errors for trying to use vector <int> instead. The error:

    Error

    This results into two cases: either the author gave a wrong example or I haven't read more documentation myself. The latter seems to be the issue according to me as I've not read much myself. But, the former may be the case too. Can't tell much unless I read further.