Data Error Using a Function and groupby to Union Strings in a pandas DataFrame
I have a dataframe with the following structure:
mydf:
   Entry         Address  ShortOrdDesc
0    988  Fake Address 1    SC_M_W_3_1
1    989  Fake Address 2    SC_M_W_3_3
2    992  Fake Address 3         nan_2
3    992                    SC_M_G_1_1
4    992                    SC_M_O_1_1
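To make the problem reproducible, here is a minimal sketch that rebuilds the sample frame. It assumes the blank Address cells in rows 3 and 4 are NaN; the question does not say whether they are NaN or empty strings. The question also calls the frame both mydf and ordersToprint; the sketch uses mydf.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: the blank Address cells are NaN, not empty strings.
mydf = pd.DataFrame({
    'Entry': [988, 989, 992, 992, 992],
    'Address': ['Fake Address 1', 'Fake Address 2', 'Fake Address 3', np.nan, np.nan],
    'ShortOrdDesc': ['SC_M_W_3_1', 'SC_M_W_3_3', 'nan_2', 'SC_M_G_1_1', 'SC_M_O_1_1'],
})

print(mydf)
# Address and ShortOrdDesc are object columns; Entry is an integer column.
print(mydf.dtypes)
```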
I need to combine rows that share the same Entry. Within each such group, only the first row has an Address. I need to concatenate the ShortOrdDesc values and the Address. I found a very useful link on this:
Pandas groupby: How to get a union of strings
Working from this, I have developed the following function:
def f(x):
    return pd.Series(dict(A = x['Entry'].sum(),
                          B = x['Address'].sum(),
                          C = "%s" % '; '.join(x['ShortOrdDesc'])))
which is applied using:
myobj = ordersToprint.groupby('Entry').apply(f)
This returns the error:
TypeError: must be str, not int
Looking at my data, I don't see what the issue is; I believe running .sum() on the integers in 'Entry' should work.
What is the error in my code or my approach?
I think some column contains numeric values (for example NaN in Address, which is a float) and needs to be converted to string. So use astype, and if you need to remove NaNs, add dropna:
def f(x):
    return pd.Series(dict(A = x['Entry'].sum(),
                          B = ''.join(x['Address'].dropna().astype(str)),
                          C = '; '.join(x['ShortOrdDesc'].astype(str))))
myobj = ordersToprint.groupby('Entry').apply(f)
print (myobj)
          A               B                              C
Entry
988     988  Fake Address 1                     SC_M_W_3_1
989     989  Fake Address 2                     SC_M_W_3_3
992    2976  Fake Address 3  nan_2; SC_M_G_1_1; SC_M_O_1_1
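Why dropna() matters: a sketch, assuming the blank Address cells are NaN. For an object column, Series.sum() skips missing values by internally replacing them with 0 before adding, so for the Entry 992 group the original x['Address'].sum() ends up evaluating 'Fake Address 3' + 0. That raises a TypeError, which Python 3.6 words as "must be str, not int" (newer Python versions phrase it differently). Dropping the NaNs and joining never mixes str and int.

```python
import numpy as np
import pandas as pd

# Address values for the Entry 992 group; the blank cells are assumed to be NaN.
addresses = pd.Series(['Fake Address 3', np.nan, np.nan])

try:
    # Series.sum() on an object column concatenates with '+';
    # the missing values are filled with 0 first, so str + int fails.
    addresses.sum()
    sum_failed = False
except TypeError as exc:
    sum_failed = True
    print('sum() raised TypeError:', exc)

# Dropping the missing values first, then joining, only ever handles strings.
joined = ''.join(addresses.dropna().astype(str))
print(joined)
```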
Another solution uses agg, but then the columns need to be renamed:
f = {'Entry': 'sum',
     'Address': lambda x: ''.join(x.dropna().astype(str)),
     'ShortOrdDesc': lambda x: '; '.join(x.astype(str))}
cols = {'Entry':'A','Address':'B','ShortOrdDesc':'C'}
myobj = ordersToprint.groupby('Entry').agg(f).rename(columns=cols)[['A','B','C']]
print (myobj)
          A               B                              C
Entry
988     988  Fake Address 1                     SC_M_W_3_1
989     989  Fake Address 2                     SC_M_W_3_3
992    2976  Fake Address 3  nan_2; SC_M_G_1_1; SC_M_O_1_1
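On pandas 0.25 or later, named aggregation sets the output column names directly, so the rename and column reordering are not needed. This is a variant sketch, not the answer's code: it swaps in the built-in 'first' aggregation for Address (relying on the question's statement that only the first row of each group has an address, and on 'first' skipping NaN), and it leaves Entry as the index instead of summing it, on the assumption that the group key itself is what is wanted. The data assumes the blank Address cells are NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank Address cells assumed to be NaN.
mydf = pd.DataFrame({
    'Entry': [988, 989, 992, 992, 992],
    'Address': ['Fake Address 1', 'Fake Address 2', 'Fake Address 3', np.nan, np.nan],
    'ShortOrdDesc': ['SC_M_W_3_1', 'SC_M_W_3_3', 'nan_2', 'SC_M_G_1_1', 'SC_M_O_1_1'],
})

# Named aggregation: output_name=(input_column, aggregation).
myobj = mydf.groupby('Entry').agg(
    # 'first' returns the first non-missing Address in each group.
    B=('Address', 'first'),
    # Join every ShortOrdDesc in the group, converting to str defensively.
    C=('ShortOrdDesc', lambda s: '; '.join(s.astype(str))),
)
print(myobj)
```

The index is still Entry, so each group key appears once without being multiplied by the group size.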