
Python - Group-by multiple columns with .mean() and .agg()

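I want to group by three columns and then find the mean of a fourth, numeric column over all rows that share the same values in the first three columns. I can achieve this with the following function: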

df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False)['col4'].mean()

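The problem is that I also want a fifth column that aggregates all the rows grouped together by the groupby function, and I don't know how to do that on top of the previous function. For example: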

df 
index    col1        col2       col3       col4       col5
0        Week_1      James      John       1          when and why?
1        Week_1      James      John       3          How?
2        Week_2      James      John       2          Do you know when?
3        Week_2      Mark       Jim        3          What time?
4        Week_2      Andrew     Simon      1          How far is it?
5        Week_2      Andrew     Simon      2          Are you going?


CURRENT(with above function):
index    col1        col2       col3       col4
0        Week_1      James      John       2
1        Week_2      James      John       2
2        Week_2      Mark       Jim        3
3        Week_2      Andrew     Simon      1.5

DESIRED:
index    col1        col2       col3       col4       col5
0        Week_1      James      John       2          when and why?, How?
2        Week_2      James      John       2          Do you know when?
3        Week_2      Mark       Jim        3          What time?
4        Week_2      Andrew     Simon      1.5        How far is it?, Are you going?

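I tried the approaches here and here, but the .mean() function I am using complicates the process. Any help would be appreciated. (If possible, I would also like to specify a custom delimiter to separate the col5 strings when aggregating.)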

You can define an aggregation function for each column:

df2=df.groupby(['col1','col2','col3'], as_index=False).agg({'col4':'mean', 'col5':','.join})
print (df2)
     col1    col2   col3  col4                           col5
0  Week_1   James   John   2.0             when and why?,How?
1  Week_2  Andrew  Simon   1.5  How far is it?,Are you going?
2  Week_2   James   John   2.0              Do you know when?
3  Week_2    Mark    Jim   3.0                     What time?
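
The question also asks for a custom delimiter. A minimal sketch of one way to do that, assuming a pandas version with named aggregation (0.25 or later); the `SEP` name and the `" | "` value are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["Week_1", "Week_1", "Week_2", "Week_2", "Week_2", "Week_2"],
    "col2": ["James", "James", "James", "Mark", "Andrew", "Andrew"],
    "col3": ["John", "John", "John", "Jim", "Simon", "Simon"],
    "col4": [1, 3, 2, 3, 1, 2],
    "col5": ["when and why?", "How?", "Do you know when?",
             "What time?", "How far is it?", "Are you going?"],
})

SEP = " | "  # hypothetical separator; any string works

# Named aggregation: output_column=(input_column, aggregation function)
df2 = (
    df.groupby(["col1", "col2", "col3"], as_index=False)
      .agg(col4=("col4", "mean"), col5=("col5", SEP.join))
)
print(df2)
```

Because `SEP.join` is just a callable that receives each group's col5 values, changing the delimiter only requires changing the string.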

A more general solution is to aggregate numeric columns with mean and all other columns with join:

import numpy as np

# Mean for numeric columns; all other columns are joined with ', '
f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else ', '.join(x)
df2 = df.groupby(['col1', 'col2', 'col3'], as_index=False).agg(f)
print (df2)

     col1    col2   col3  col4                            col5
0  Week_1   James   John   2.0             when and why?, How?
1  Week_2  Andrew  Simon   1.5  How far is it?, Are you going?
2  Week_2   James   John   2.0               Do you know when?
3  Week_2    Mark    Jim   3.0                      What time?
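
The same idea can be wrapped in a small reusable helper that takes the separator as a parameter. This is only a sketch: `mean_or_join` is a hypothetical name, and it uses `pandas.api.types.is_numeric_dtype` in place of the `np.issubdtype` check above, which is an assumption about what is convenient rather than what the answer intended.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def mean_or_join(sep=", "):
    """Build an aggregator: mean for numeric Series, sep-joined text otherwise."""
    def aggregate(series):
        if is_numeric_dtype(series):
            return series.mean()
        # Convert to str so join does not fail on non-string values
        return sep.join(series.astype(str))
    return aggregate


# A few rows taken from the question's sample data
df = pd.DataFrame({
    "col1": ["Week_1", "Week_1", "Week_2"],
    "col2": ["James", "James", "Mark"],
    "col3": ["John", "John", "Jim"],
    "col4": [1, 3, 3],
    "col5": ["when and why?", "How?", "What time?"],
})

df2 = df.groupby(["col1", "col2", "col3"], as_index=False).agg(mean_or_join("; "))
print(df2)
```

Note that `is_numeric_dtype` also treats boolean columns as numeric, so those would be averaged rather than joined.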

import pandas as pd

df = pd.DataFrame({
        'col1':['a','a','b','b'],
        'col2':[1,2,1,1],
        'col3':['str1','str2','str3','str4']
        })

# Join the col3 strings within each (col1, col2) group
result = df.groupby(['col1','col2'])['col3'].apply(lambda x: ','.join(x))
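
The `apply` call above returns a Series indexed by the group keys rather than a flat DataFrame. A short sketch, assuming you want the keys back as ordinary columns (the `flat` name is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": [1, 2, 1, 1],
    "col3": ["str1", "str2", "str3", "str4"],
})

# apply() on a single selected column yields a Series indexed by (col1, col2)
result = df.groupby(["col1", "col2"])["col3"].apply(lambda x: ",".join(x))

# reset_index() moves the group keys back into regular columns
flat = result.reset_index()
print(flat)
```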
