简体   繁体   中英

Why and how are Python functions hashable?

I recently tried the following commands in Python:

>>> {lambda x: 1: 'a'}
{<function __main__.<lambda>>: 'a'}

>>> def p(x): return 1
>>> {p: 'a'}
{<function __main__.p>: 'a'}

The success of both dict creations indicates that both lambda and regular functions are hashable. (Something like {[]: 'a'} fails with TypeError: unhashable type: 'list' ).

The hash is apparently not necessarily the ID of the function:

>>> m = lambda x: 1
>>> id(m)
140643045241584
>>> hash(m)
8790190327599
>>> m.__hash__()
8790190327599

The last command shows that the __hash__ method is explicitly defined for lambda s, ie, this is not some automagical thing Python computes based on the type.

What is the motivation behind making functions hashable? For a bonus, what is the hash of a function?

It's nothing special. As you can see if you examine the unbound __hash__ method of the function type:

>>> def f(): pass
...
>>> type(f).__hash__
<slot wrapper '__hash__' of 'object' objects>

the of 'object' objects part means it just inherits the default identity-based __hash__ from object . Function == and hash work by identity. The difference between id and hash is normal for any type that inherits object.__hash__ :

>>> x = object()
>>> id(x)
40145072L
>>> hash(x)
2509067

You might think __hash__ is only supposed to be defined for immutable objects, and you'd be almost right, but that's missing a key detail. __hash__ should only be defined for objects where everything involved in == comparisons is immutable. For objects whose == is based on identity, it's completely standard to base hash on identity as well, since even if the objects are mutable, they can't possibly be mutable in a way that would change their identity. Files, modules, and other mutable objects with identity-based == all behave this way.

It can be useful, eg, to create sets of function objects, or to index a dict by functions. Immutable objects normally support __hash__ . In any case, there's no internal difference between a function defined by a def or by a lambda - that's purely syntactic.

The algorithm used depends on the version of Python. It looks like you're using a recent version of Python on a 64-bit box. In that case, the hash of a function object is the right rotation of its id() by 4 bits, with the result viewed as a signed 64-bit integer. The right shift is done because object addresses ( id() results) are typically aligned so that their last 3 or 4 bits are always 0, and that's a mildly annoying property for a hash function.

In your specific example,

>>> i = 140643045241584 # your id() result
>>> (i >> 4) | ((i << 60) & 0xffffffffffffffff) # rotate right 4 bits
8790190327599  # == your hash() result

A function is hashable because it is a normal, builtin, non mutable object.

From the Python Manual :

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python's immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id() .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM