Pandas aggregation error (an integer is required)

import pandas as pd

df_act = pd.DataFrame({'A': {0: 'CHEMBL264', 1: 'CHEMBL4124', 2: 'CHEMBL264', 3: 'CHEMBL233', 4: 'CHEMBL233', 5: 'CHEMBL237', 6: 'CHEMBL236', 7: 'CHEMBL312', 8: 'CHEMBL3820', 9: 'CHEMBL3820'}, 'B': {0: 8.7, 1: 8.16, 2: 8.3, 3: 7.24, 4: 8.0, 5: 6.16, 6: 6.44, 7: 4.82, 8: 7.59, 9: 7.43}})

Doing this works:

df_act.groupby(['A'])['B'].median()

However, applying a custom function to the groupby object fails:

def fun(x):
    name = {'B_median': x['B'].median()}
    return pd.Series(name, index=['B_median'])

df_act.groupby(['A'])['B'].apply(fun)

returns:

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:14010)()

TypeError: an integer is required

Both examples use the same DataFrame, so I don't understand the error.

Edit: added the df_act definition.

The issue is that, in this example, you need to change

df_act.groupby(['A'])['B'].apply(fun)

to

df_act.groupby(['A']).apply(fun)

As detailed in "How is pandas groupby method actually working?", the point of .apply is literally to apply a function to each "sub-DataFrame" (group) and then recombine each group's result into your result.

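In your fun, you're already referencing 'B', so indexing on it beforehand is redundant. With ['B'] selected first, fun receives a Series rather than a DataFrame, so x['B'] is treated as a lookup of the label 'B' in the integer index, which is what raises the TypeError.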
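To make the distinction concrete, here is a minimal sketch that checks what kind of object each group is with and without the ['B'] selection. It assumes a small made-up frame, df_demo, rather than the df_act from the question, and it iterates over the groupby object directly instead of calling .apply:

```python
import pandas as pd

# Hypothetical toy data, not the df_act from the question.
df_demo = pd.DataFrame({'A': ['x', 'x', 'y'], 'B': [1.0, 3.0, 5.0]})

# Selecting ['B'] before grouping work is done: each group is a Series.
series_types = {type(g).__name__ for _, g in df_demo.groupby('A')['B']}

# No column selection: each group is a DataFrame that still has a 'B' column.
frame_types = {type(g).__name__ for _, g in df_demo.groupby('A')}

print(series_types)  # {'Series'}
print(frame_types)   # {'DataFrame'}
```

Only in the second case does x['B'] inside fun mean "select column B".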

Also note that you don't really need to wrap the returned object in a Series. It's still a bit contrived, but this would suffice:

def fun(x):
    return x['B'].median()
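As a quick check, the simplified function should give the same per-group values as the built-in median. This sketch again assumes a small made-up frame, df_demo, in place of df_act:

```python
import pandas as pd

# Hypothetical toy data, not the df_act from the question.
df_demo = pd.DataFrame({'A': ['x', 'x', 'y'], 'B': [1.0, 3.0, 5.0]})

def fun(x):
    # x is the sub-DataFrame for one group, so x['B'] selects a column.
    return x['B'].median()

via_apply = df_demo.groupby('A').apply(fun)
via_builtin = df_demo.groupby('A')['B'].median()

print(via_apply.tolist())    # [2.0, 5.0]
print(via_builtin.tolist())  # [2.0, 5.0]
```

When a built-in aggregation such as median already exists, calling it directly is usually the simpler choice. .apply with a custom function is more useful when the per-group computation has no built-in equivalent.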
