

Using the apply method on a pandas Series: getting TypeError 'Series' objects are mutable, thus they cannot be hashed

I have two data frames, D1 and D2. For every pair of columns, one from D1 and one from D2, where neither column is of int or float type, I want to compute a distance metric using the formula

 |A ∩ B| / |A ∪ B|
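For reference, this ratio is usually called the Jaccard index (or Jaccard similarity); the conventional Jaccard distance is 1 minus it. A plain-Python sketch of the formula on two small collections; the helper name jaccard_index and the choice to return NaN when both inputs are empty are assumptions added for illustration:

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables of hashable values."""
    a, b = set(a), set(b)  # converting to sets discards duplicates
    union = a | b
    if not union:  # both inputs empty: the ratio is undefined (assumption: NaN)
        return float("nan")
    return len(a & b) / len(union)


# A = {x, y, z}, B = {y, z, w}: 2 shared elements out of 4 distinct ones
print(jaccard_index(["x", "y", "z"], ["y", "z", "w"]))  # 0.5
print(jaccard_index(["x", "x", "y"], ["y", "x"]))       # 1.0
```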

I first defined a function:

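import numpy as np
import pandas as pd

def jaccard_d(series1, series2):
    # `(a or b)` evaluates to a single dtype and `is not` tests identity,
    # so this check cannot exclude both int and float columns
    if (series1.dtype is not (np.dtype(int) or np.dtype(float))) and (series2.dtype is not (np.dtype(int) or np.dtype(float))):
        series1 = series1.drop_duplicates()
        series2 = series2.drop_duplicates()
        # |A intersect B| / |A union B|
        return len(set(series1).intersection(set(series2))) / len(set(series1).union(set(series2)))
    else:
        return np.nan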
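As written, the dtype test in jaccard_d is fragile: `(x or y)` evaluates to just one of the two dtype objects, and `is not` compares object identity, so the condition checks against a single dtype rather than excluding both int and float. A sketch of a stricter check, assuming the intent is to skip any pair in which either column is integer or float, using the public predicates `pandas.api.types.is_integer_dtype` and `is_float_dtype`. The names is_int_or_float and jaccard_d_checked are hypothetical:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def is_int_or_float(series):
    # pandas' public dtype predicates instead of identity tests on np.dtype
    return is_integer_dtype(series.dtype) or is_float_dtype(series.dtype)


def jaccard_d_checked(series1, series2):
    # Skip the pair if either column is integer or float
    if is_int_or_float(series1) or is_int_or_float(series2):
        return np.nan
    a, b = set(series1), set(series2)  # set() already drops duplicates
    union = a | b
    return len(a & b) / len(union) if union else np.nan


s_str1 = pd.Series(["a", "b", "c"])
s_str2 = pd.Series(["b", "c", "d", "d"])
s_int = pd.Series([1, 2, 3])
print(jaccard_d_checked(s_str1, s_str2))  # 0.5
print(jaccard_d_checked(s_str1, s_int))   # nan
```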

Then I loop over all the columns in D1, and for each fixed column of D1 I use apply with my jaccard_d function. I am trying to avoid writing a two-level loop. Maybe there is a better way that avoids loops entirely?

DC = dict.fromkeys(list(D1.columns))
INN = list(D2.columns)
for col in D1:
    DC[col] = dict(zip(INN, D2.apply(jaccard_d,D1[col])))

First, I am not sure whether I am using the apply function correctly: my jaccard_d function takes two Series as input, but in each iteration I have D1[col] as one Series, and I want to use apply to pair D1[col] with every column of D2.

Second, I get the error "'Series' objects are mutable, thus they cannot be hashed", which I don't quite understand. Any comments are appreciated.

I tried just writing a two-level loop that calls jaccard_d, and it works. But I want to write more efficient code.
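On avoiding explicit loops: one option, sketched here under the assumption that a labelled table of scores is what you are after, is to nest two apply calls. apply still iterates over columns internally, so this should not be expected to be faster than the two-level loop; it mainly removes the bookkeeping. The frames D1/D2 below are toy data, and jaccard is a hypothetical helper that omits the dtype check:

```python
import pandas as pd


def jaccard(s1, s2):
    a, b = set(s1), set(s2)
    union = a | b
    return len(a & b) / len(union) if union else float("nan")


D1 = pd.DataFrame({"x": ["a", "b", "c"], "y": ["p", "q", "r"]})
D2 = pd.DataFrame({"u": ["b", "c", "d"], "v": ["p", "q", "q"]})

# The outer apply walks the columns of D2; for each one, the inner apply walks
# the columns of D1. Rows are labelled by D1's columns, columns by D2's.
result = D2.apply(lambda col2: D1.apply(lambda col1: jaccard(col1, col2)))
print(result)
```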

So after floundering around, finding exactly where the error occurs, and checking the apply docs, I've deduced that you need to call apply like this:

 D2.apply(jaccard_d, args=(D1[col],))

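Instead, you were passing D1[col] as the second positional argument, which apply treats as the axis, so you were effectively calling: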

 D2.apply(jaccard_d, axis=D1[col])
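apply passes each column of D2 as the first argument and appends the contents of args after it, so every call is jaccard_d(D2[c], D1[col]). A toy check that this matches closing over the extra Series with a lambda; the frames and the overlap helper are illustrative only:

```python
import pandas as pd


def overlap(s1, s2):
    # |A intersect B| / |A union B|, same formula as in the question
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b)


D1 = pd.DataFrame({"c": ["a", "b"]})
D2 = pd.DataFrame({"u": ["a", "b"], "v": ["b", "z"]})

# args=(...) is appended after each column: overlap(D2[col], D1["c"])
via_args = D2.apply(overlap, args=(D1["c"],))
# the same computation written with a lambda
via_lambda = D2.apply(lambda s: overlap(s, D1["c"]))
print(via_args.equals(via_lambda))  # True
```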

==================

I can reproduce your error message with a simple dataframe:

In [589]: df=pd.DataFrame(np.arange(12).reshape(6,2))
In [590]: df
Out[590]: 
    0   1
0   0   1
1   2   3
2   4   5
3   6   7
4   8   9
5  10  11

Passing a Series to set works, just as passing a list would:

In [591]: set(df[0]).union(set(df[1]))
Out[591]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

But if I try to put a list containing a Series into a set, I get your error:

In [592]: set([df[0]])
....
TypeError: 'Series' objects are mutable, thus they cannot be hashed
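The difference is what gets hashed: set() hashes each element of whatever it is given. Passing a Series hands it the values, but passing a list whose element is a Series makes it hash the Series itself, and pandas deliberately makes Series unhashable because they are mutable. A minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# set(s) iterates over the Series and hashes each value: fine
values = set(s)
print(values)  # {1, 2, 3}

# set([s]) has to hash the Series object itself, which pandas refuses
caught = None
try:
    set([s])
except TypeError as exc:
    caught = exc
print(caught)
```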

If the problem isn't with the set expressions, then it occurs in the dict() one.

You did not specify where the error occurs, nor have you given an MCVE (minimal, complete, and verifiable example).

(But as it turns out, this is a dead end.)

========================

OK, simulating your code:

In [606]: DC=dict.fromkeys(list(df.columns))
In [607]: DC
Out[607]: {0: None, 1: None}
In [608]: INN=list(df.columns)
In [609]: INN
Out[609]: [0, 1]
In [610]: for col in df:
     ...:     dict(zip(INN, df.apply(jaccard_d, df[col])))
    ....
----> 2     dict(zip(INN, df.apply(jaccard_d, df[col])))


/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   ...
-> 4125         axis = self._get_axis_number(axis)

/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py in _get_axis_number(self, axis)
    326 
    327     def _get_axis_number(self, axis):
--> 328         axis = self._AXIS_ALIASES.get(axis, axis)
    ....        

TypeError: 'Series' objects are mutable, thus they cannot be hashed
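The last frame shows why: `self._AXIS_ALIASES.get(axis, axis)` is an ordinary dict lookup, and a dict has to hash the key before it can fall back to the default, so an unhashable Series fails right there. A plain-dict illustration; `aliases` is an illustrative stand-in rather than pandas' real table, and these internals may differ between pandas versions:

```python
import pandas as pd

# Illustrative stand-in for an axis-alias table
aliases = {"rows": 0, "columns": 1}

print(aliases.get("rows", "rows"))  # 0
print(aliases.get(1, 1))            # 1 (unknown key, so the default is returned)

caught = None
try:
    # the key is hashed first, even though a default is supplied
    aliases.get(pd.Series([0]), None)
except TypeError as exc:
    caught = exc
print(caught)
```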

So the problem is in

df.apply(jaccard_d, df[0])

The problem has nothing to do with jaccard_d. It also occurs if I replace it with something as simple as

def foo(series1, series2):
    print(series1)
    print(series2)
    return 1
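A self-contained version of that check on the same 6 x 2 frame, with foo reduced to just returning 1 (the printing is dropped). The call fails with the same TypeError whatever the function body does:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(6, 2))


def foo(series1, series2):
    return 1  # trivial body: the function itself is irrelevant


caught = None
try:
    df.apply(foo, df[0])  # df[0] fills apply's second positional parameter
except TypeError as exc:
    caught = exc
print(type(caught).__name__)  # TypeError
```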

======================

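But look at the docs for apply (this is the signature in the pandas version used here; later versions replaced broadcast and reduce with result_type):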

df.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

The 2nd argument, if not given as a keyword, is the axis number. So we have been trying to use a Series as the axis number! No wonder it objects! That should have been obvious if I'd read the error trace more carefully.
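A small sketch confirming that a positional second argument is taken as the axis, on the same 6 x 2 frame; the helper total is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(6, 2))


def total(s):
    return s.sum()


by_col_positional = df.apply(total, 0)   # 0 lands in the axis slot
by_col_keyword = df.apply(total, axis=0)
by_row = df.apply(total, 1)              # axis=1: total receives each row

print(by_col_positional.equals(by_col_keyword))  # True
print(by_col_positional.tolist())                # [30, 36]
print(by_row.tolist())                           # [1, 5, 9, 13, 17, 21]
```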

Leaving the default axis=0, let's pass the other Series via args (note the trailing comma, which makes args a one-element tuple):

In [632]: df.apply(jaccard_d,args=(df[1],))
Out[632]: 
0    0.0
1    1.0
dtype: float64

or, in your loop:

In [643]: for col in df:
     ...:     DC[col] = dict(zip(INN, df.apply(jaccard_d,args=(df[col],))))  
In [644]: DC
Out[644]: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
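If a labelled table is handier than the nested dict, the DataFrame constructor accepts a dict of dicts directly: outer keys (here, the columns looped over) become columns and inner keys become the index. A sketch using the DC value printed above:

```python
import pandas as pd

# DC as shown in Out[644]: {outer column: {inner column: score}}
DC = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}

table = pd.DataFrame(DC)  # columns from the outer keys, index from the inner keys
print(table)
```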


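Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.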

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM