Using the apply method on a pandas Series raises TypeError: 'Series' objects are mutable, thus they cannot be hashed

Question

I have two data frames, D1 and D2. For every pair of columns, one from D1 and one from D2, that are both of non-int and non-float type, I want to compute a distance metric using the formula

 |A intersect B|/ |A union B|

I first defined a function:

import numpy as np
import pandas as pd

def jaccard_d(series1, series2):
    if (series1.dtype is not (np.dtype(int) or np.dtype(float))) and \
       (series2.dtype is not (np.dtype(int) or np.dtype(float))):
        series1 = series1.drop_duplicates()
        series2 = series2.drop_duplicates()
        return len(set(series1).intersection(set(series2))) \
            / len(set(series1).union(set(series2)))
    else:
        return np.nan

Then I loop over all columns in D1, and for each fixed column of D1 I use apply with my jaccard_d function. I am trying to avoid writing a two-level loop. Maybe there is a better way that needs no explicit loops at all?

DC = dict.fromkeys(list(D1.columns))
INN = list(D2.columns)
for col in D1:
    DC[col] = dict(zip(INN, D2.apply(jaccard_d,D1[col])))

First, I am not sure whether I am using apply correctly: my jaccard_d function takes two Series as input, but in each iteration I have D1[col] as one Series, and I want apply to pair D1[col] with every column of D2.

Second, I get the error "'Series' objects are mutable, thus they cannot be hashed", which I don't quite understand. Any comments are appreciated.

I also tried writing a two-level loop that calls jaccard_d directly, and that works. But I want to write more efficient code.

Answer 1

So after floundering around, finding exactly where the error occurs, and checking the apply docs, I've deduced that you need to call apply like this:

 D2.apply(jaccard_d, args=(D1[col],))

Instead, your call was equivalent to

 D2.apply(jaccard_d, axis=D1[col])

==================

I can reproduce your error message with a simple dataframe:

In [589]: df=pd.DataFrame(np.arange(12).reshape(6,2))
In [590]: df
Out[590]: 
    0   1
0   0   1
1   2   3
2   4   5
3   6   7
4   8   9
5  10  11

Passing a Series to set works, just as if we'd passed a list:

In [591]: set(df[0]).union(set(df[1]))
Out[591]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

But if I try to build a set from a list that contains a Series, I get your error.

In [592]: set([df[0]])
....
TypeError: 'Series' objects are mutable, thus they cannot be hashed

If the problem isn't with the set expressions, then it occurs in the dict() one.

You did not specify where the error occurs, nor have you given an MCVE.

(But as it turns out, this is a dead end.)

========================

OK, simulating your code:

In [606]: DC=dict.fromkeys(list(df.columns))
In [607]: DC
Out[607]: {0: None, 1: None}
In [608]: INN=list(df.columns)
In [609]: INN
Out[609]: [0, 1]
In [610]: for col in df:
     ...:     dict(zip(INN, df.apply(jaccard_d, df[col])))
    ....
----> 2     dict(zip(INN, df.apply(jaccard_d, df[col])))


/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   ...
-> 4125         axis = self._get_axis_number(axis)

/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py in _get_axis_number(self, axis)
    326 
    327     def _get_axis_number(self, axis):
--> 328         axis = self._AXIS_ALIASES.get(axis, axis)
    ....        

TypeError: 'Series' objects are mutable, thus they cannot be hashed

So the problem is in

df.apply(jaccard_d, df[0])

The problem has nothing to do with jaccard_d. It also occurs if I replace it with a simple

def foo(series1, series2):
    print(series1)
    print(series2)
    return 1

======================

But look at the docs for apply:

df.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

The second argument, if not passed by keyword, is the axis number. So we have been trying to use a Series as the axis number! No wonder it objects! That should have been obvious if I'd read the error trace more carefully.
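To make the positional-argument mix-up concrete, here is a minimal sketch, assuming a pandas version in which the second positional parameter of DataFrame.apply is axis. The frame, the Series other, and the helper same_length are hypothetical stand-ins for D2, D1[col], and jaccard_d; the exact wording of the TypeError may differ between pandas versions.

```python
import pandas as pd

# Hypothetical stand-ins for D2 and D1[col]
df = pd.DataFrame({"a": ["x", "y", "z"], "b": ["x", "y", "w"]})
other = pd.Series(["x", "y", "q"])

def same_length(series1, series2):
    # Trivial two-argument function; apply supplies series1, one column at a time
    return len(series1) == len(series2)

# Positional: the Series lands in the `axis` slot and pandas fails while
# trying to resolve it as an axis name or number
try:
    df.apply(same_length, other)
    positional_error = None
except TypeError as exc:
    positional_error = exc

# Keyword: extra positional arguments for the function go in `args`
result = df.apply(same_length, args=(other,))

print(type(positional_error).__name__)
print(result)
```

The only difference between the two calls is whether the extra Series is passed positionally or wrapped in the args tuple.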

Leaving the default axis=0, let's pass the other Series via args:

In [632]: df.apply(jaccard_d,args=(df[1],))
Out[632]: 
0    0.0
1    1.0
dtype: float64

or in your loop:

In [643]: for col in df:
     ...:     DC[col] = dict(zip(INN, df.apply(jaccard_d,args=(df[col],))))  
In [644]: DC
Out[644]: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
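On the question of avoiding explicit loops, one option is to nest two apply calls so that the result comes back as a DataFrame of pairwise scores instead of a dict of dicts. The following is a sketch under assumptions, not the asker's code: the frames D1 and D2 hold made-up data, and jaccard is a hypothetical variant of jaccard_d. It swaps the original dtype test for pandas.api.types.is_numeric_dtype, because an expression of the form `x is not (a or b)` compares x against only one of the two dtypes. With this stricter check, an all-integer test frame like the one used above would yield NaN everywhere. apply still iterates over columns internally, so this is tidier rather than faster.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def jaccard(series1, series2):
    # Hypothetical variant of jaccard_d: skip numeric columns,
    # otherwise return |A intersect B| / |A union B|
    if is_numeric_dtype(series1) or is_numeric_dtype(series2):
        return np.nan
    a, b = set(series1), set(series2)
    union = a | b
    return len(a & b) / len(union) if union else np.nan

# Made-up example frames
D1 = pd.DataFrame({"city": ["NY", "LA", "SF"], "n": [1, 2, 3]})
D2 = pd.DataFrame({"town": ["NY", "LA", "DC"], "code": ["SF", "SF", "SF"]})

# Outer apply walks D1's columns; inner apply walks D2's columns
scores = D1.apply(lambda c1: D2.apply(jaccard, args=(c1,)))
print(scores)
```

The rows of scores are labelled by D2's columns and the columns by D1's, so scores.loc["town", "city"] is the score for that pair.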

Answer 1: 2 votes, accepted, 2017-01-30 02:05:27
