Using apply method on pandas series getting TypeError 'Series' objects are mutable, thus they cannot be hashed
I have two data frames, D1 and D2. For every pair of columns, one from D1 and one from D2, that are both of non-int and non-float type, I want to compute a distance metric using the formula
|A intersect B| / |A union B|
I first defined a function:
def jaccard_d(series1, series2):
    if (series1.dtype is not (pd.np.dtype(int) or pd.np.dtype(float))) and (series2.dtype is not (pd.np.dtype(int) or pd.np.dtype(float))):
        series1 = series1.drop_duplicates()
        series2 = series2.drop_duplicates()
        return len(set(series1).intersection(set(series2))) / len(set(series1).union(set(series2)))
    else:
        return np.nan
Then I loop over all columns in D1 and, for each fixed column of D1, use apply with my jaccard_d function. I am trying to avoid writing a two-level loop. Maybe there is a better way that avoids writing any loops?
DC = dict.fromkeys(list(D1.columns))
INN = list(D2.columns)
for col in D1:
    DC[col] = dict(zip(INN, D2.apply(jaccard_d,D1[col])))
First, I am not sure whether I am using the apply function correctly. My jaccard_d function takes two series as input, but in each iteration I have D1[col] as one series, and I want to use apply to pair D1[col] with every column of D2.
Second, I get the error "'Series' objects are mutable, thus they cannot be hashed", which I don't quite understand. Any comments are appreciated.
I also tried simply writing a two-level loop that calls my function jaccard_d, and that works. But I want to write more efficient code.
So after floundering around, finding exactly where the error occurs, and checking the apply docs, I've deduced that you need to call apply like this:
D2.apply(jaccard_d, args=(D1[col],))
Instead, you were effectively calling
D2.apply(jaccard_d, axis=D1[col])
==================
I can reproduce your error message with a simple dataframe:
In [589]: df=pd.DataFrame(np.arange(12).reshape(6,2))
In [590]: df
Out[590]:
    0   1
0   0   1
1   2   3
2   4   5
3   6   7
4   8   9
5  10  11
Passing a Series to set works, just as if we'd passed a list to set:
In [591]: set(df[0]).union(set(df[1]))
Out[591]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
But if I try to put a list containing a Series in the set, I get your error.
In [592]: set([df[0]])
....
TypeError: 'Series' objects are mutable, thus they cannot be hashed
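The distinction can be checked directly with a minimal sketch. The Series `s` below is made up for illustration, and the exact wording of the exception message depends on the pandas version installed, so it is captured rather than assumed:

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
s = pd.Series([0, 2, 4])

# Iterating a Series yields its values, which are hashable, so set() accepts it.
values = set(s)

# The Series object itself is unhashable, so it cannot be a set member or a dict key.
try:
    hash(s)
    message = None
except TypeError as exc:
    message = str(exc)

print(values)
print(message)
```

So the error appears whenever a Series ends up somewhere that needs a hashable value, such as a dictionary lookup.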
If the problem isn't with the set expressions, then it occurs in the dict() one.
You did not specify where the error occurs, nor have you given an MCVE.
(But as it turns out, this is a dead end.)
========================
OK, simulating your code:
In [606]: DC=dict.fromkeys(list(df.columns))
In [607]: DC
Out[607]: {0: None, 1: None}
In [608]: INN=list(df.columns)
In [609]: INN
Out[609]: [0, 1]
In [610]: for col in df:
     ...:     dict(zip(INN, df.apply(jaccard_d, df[col])))
....
----> 2 dict(zip(INN, df.apply(jaccard_d, df[col])))
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
...
-> 4125 axis = self._get_axis_number(axis)
/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py in _get_axis_number(self, axis)
326
327 def _get_axis_number(self, axis):
--> 328 axis = self._AXIS_ALIASES.get(axis, axis)
....
TypeError: 'Series' objects are mutable, thus they cannot be hashed
So the problem is in
df.apply(jaccard_d, df[0])
The problem has nothing to do with jaccard_d. It also occurs if I replace it with a simple
def foo(series1, series2):
    print(series1)
    print(series2)
    return 1
======================
But look at the docs for apply:
df.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
The 2nd argument, if not passed by keyword, is the axis number. So we have been trying to use a Series as the axis number! No wonder it objects! That should have been obvious if I'd read the error trace more carefully.
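As a quick check of that reading, here is a small sketch that inspects the parameter order of DataFrame.apply and then forwards an extra Series through args. The frame `df` and the helper `add` are hypothetical names made up for this example, not part of the question:

```python
import inspect

import pandas as pd

# Parameter names of DataFrame.apply in positional order; the first is self.
params = list(inspect.signature(pd.DataFrame.apply).parameters)
print(params[:3])

# Hypothetical sample frame, invented for this illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})


def add(col, other):
    # 'col' is the column supplied by apply; 'other' arrives through args=(...).
    return (col + other).sum()


# Extra positional arguments for the function go in the args tuple,
# so nothing is mistaken for the axis.
totals = df.apply(add, args=(df["a"],))
print(totals)
```

The trailing comma in `args=(df["a"],)` matters: without it the parentheses do not form a tuple.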
Leaving the default axis=0, let's pass the other Series as args:
In [632]: df.apply(jaccard_d,args=(df[1],))
Out[632]:
0 0.0
1 1.0
dtype: float64
or in your loop:
In [643]: for col in df:
     ...:     DC[col] = dict(zip(INN, df.apply(jaccard_d,args=(df[col],))))
In [644]: DC
Out[644]: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
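On the question of avoiding the explicit loop, one possible sketch is to nest two apply calls so the result comes back as a DataFrame instead of a dict of dicts. This is an illustration under assumptions rather than a tested solution for your data: the sample frames are invented, `is_int_or_float` is a hypothetical helper, and the dtype test uses pandas.api.types in place of the `is not (... or ...)` comparison, which only ever compares against the int dtype because `A or B` evaluates to `A`. The nested apply still iterates internally, so it is tidier rather than necessarily faster.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def is_int_or_float(series):
    # Hypothetical helper: True for any integer or floating-point dtype.
    return is_integer_dtype(series) or is_float_dtype(series)


def jaccard_d(series1, series2):
    # Return NaN if either column is int or float, as the question intends.
    if is_int_or_float(series1) or is_int_or_float(series2):
        return np.nan
    a, b = set(series1), set(series2)
    union = a | b
    # Guard against two empty columns, where the ratio is undefined.
    return len(a & b) / len(union) if union else np.nan


# Hypothetical sample frames, invented for this illustration.
D1 = pd.DataFrame({"x": ["a", "b", "c"], "n": [1, 2, 3]})
D2 = pd.DataFrame({"y": ["b", "c", "d"], "z": ["a", "a", "a"]})

# The outer apply walks the columns of D1; the inner apply pairs each one
# with every column of D2. Rows are D2 columns, columns are D1 columns.
result = D1.apply(lambda col: D2.apply(jaccard_d, args=(col,)))
print(result)
```

If the nested-dict layout is still needed, `result.to_dict()` gives a dictionary keyed by D1 column and then by D2 column.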