I'm trying to create a class that acts like a dictionary whose keys are tuples, but I don't want them to be "truly" tuples, because I'll use this dictionary to create Pandas dataframes, and Pandas assumes that tuples as keys mean a multi-index (which is not correct in this case).
For single-element tuples, this produces a bug, for example:
>>> a = {(1,): 1 }
>>> pd.Series(a)
1 NaN
dtype: float64
What happens is that Pandas sees that the key of the dictionary is a tuple, so it assumes a multi-index. Then it sees that the len of the tuple is 1, so it decides to create a plain index after all. But it fails to store the value, because the dictionary does not have the key 1, but the key (1,) instead, hence the NaN.
Leaving this bug aside, with "normal" tuples of several elements Pandas works fine, but it assumes a multi-level index, which I don't want:
>>> a = {(1,2): 1 }
>>> pd.Series(a)
1 2 1
dtype: int64
What I want instead is to use the tuple (1,2) itself as the index.
I decided to implement my own Tuple class, like this (imitating the implementation of UserList in the collections standard library, but keeping it to a minimum):
from collections.abc import Sequence

class Tuple(Sequence):
    def __init__(self, initlist=None):
        self.data = ()
        if initlist is not None:
            if type(initlist) == type(self.data):
                self.data = initlist
            elif isinstance(initlist, Tuple):
                self.data = initlist.data
            else:
                self.data = tuple(initlist)
    def __getitem__(self, i): return self.data[i]
    def __len__(self): return len(self.data)
    def __hash__(self): return hash(self.data)
    def __repr__(self): return repr(self.data)

Sequence.register(Tuple)
If I use this kind of object as keys in my dictionary, Pandas is forced to use the object as the index, which stops it from generating a multi-index:
>>> a = {Tuple((1,2)): 1}
>>> pd.Series(a)
(1, 2) 1
dtype: int64
The dictionary looks as if the keys were tuples:
>>> a
{(1, 2): 1}
So far, so good. However, something strange happens:
>>> a[Tuple((1,2))]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-169-9641d6999f03> in <module>()
----> 1 a[Tuple((1,2))]
KeyError: (1, 2)
Why is this? As far as I understand, Python dictionaries should locate the value by computing the hash of the given key, which my Tuple.__hash__() does consistently, by hashing its inner data. Then why is the key not found?
I guess that I must implement some other method in my Tuple class, but I cannot see which one, or why.
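You also need to implement __eq__ (or __cmp__ in Python 2) so that the dictionary can compare keys. A matching hash only selects a candidate entry; the dictionary then checks equality. Without __eq__, the class falls back to the default identity comparison, so two separate Tuple((1,2)) objects have the same hash but never compare equal, and the lookup fails. The documentation defines hashable as follows: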
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.
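As a minimal sketch of what that could look like for the Tuple class from the question (Python 3 assumed, so only __eq__ is added; the constructor is slightly condensed). The decision that a Tuple compares equal only to another Tuple, and not to a plain tuple with the same contents, is my assumption and not something the question specifies:

```python
from collections.abc import Sequence


class Tuple(Sequence):
    """Tuple-like wrapper usable as a dictionary key (illustrative sketch)."""

    def __init__(self, initlist=None):
        if initlist is None:
            self.data = ()
        elif isinstance(initlist, Tuple):
            self.data = initlist.data
        else:
            self.data = tuple(initlist)

    def __getitem__(self, i):
        return self.data[i]

    def __len__(self):
        return len(self.data)

    def __hash__(self):
        # Equal objects must hash equally, so hash the wrapped data.
        return hash(self.data)

    def __eq__(self, other):
        # Assumption: only another Tuple with the same contents is equal.
        if isinstance(other, Tuple):
            return self.data == other.data
        return NotImplemented

    def __repr__(self):
        return repr(self.data)


a = {Tuple((1, 2)): 1}
print(a[Tuple((1, 2))])  # prints 1: the lookup now finds the key
```

Note that __hash__ and __eq__ are defined in the same class on purpose: in Python 3, a class that defines __eq__ without also defining __hash__ gets its __hash__ set to None and becomes unhashable.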