Pandas Groupby: mean and first of multiple columns

Question

My pandas DataFrame looks like the following. I want to apply groupby and then calculate the mean of some columns and the first value of others.

index col1   col2   col3   col4   col5  col6
  0     a      c      1      2      f    5
  1     a      c      1      2      f    7
  2     a      d      1      2      g    9
  3     b      d      6      2      g    4
  4     b      e      1      2      g    8
  5     b      e      1      2      g    2

I tried something like this:

df.groupby(['col1','col5']).agg({['col6','col3']:'mean',['col4','col2']:'first'})

Expected output:

col1  col5   col6  col3  col4  col2
  a     f     6     1     2     c
  a     g     9     1     2     d
  b     g     4     3     2     e

But it seems a list is not an option here. My real dataset has 100 columns of different kinds, so I can't pass them individually. Any thoughts on passing them as lists?

Answer 1

If you have a list of columns for each aggregation, you can do:

l_mean = ['col6','col3']
l_first = ['col4','col2']
df.groupby(['col1','col5']).agg({**{col:'mean' for col in l_mean},
                                 **{col:'first' for col in l_first}})

The notation **{} unpacks a dictionary. Writing {**{}, **{}} creates one dictionary from two dictionaries (it could be more than two), like a union of dictionaries. And {col:'mean' for col in l_mean} is a dictionary comprehension: it creates a dictionary with each column of the list as a key and 'mean' as the value.
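To make the merged dictionary concrete, here is a minimal sketch that rebuilds the sample frame from the question and prints the mapping handed to agg. The DataFrame construction and the name agg_map are my own assumptions for illustration; only l_mean, l_first and the groupby call come from the answer.

```python
import pandas as pd

# Sample data reconstructed from the question (dtypes are assumed: strings and ints).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 6, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "g", "g", "g", "g"],
    "col6": [5, 7, 9, 4, 8, 2],
})

l_mean = ["col6", "col3"]
l_first = ["col4", "col2"]

# One flat {column: function} mapping built from the two lists.
# agg_map is a hypothetical name, not part of the original answer.
agg_map = {**{col: "mean" for col in l_mean},
           **{col: "first" for col in l_first}}
print(agg_map)
# {'col6': 'mean', 'col3': 'mean', 'col4': 'first', 'col2': 'first'}

result = df.groupby(["col1", "col5"]).agg(agg_map).reset_index()
print(result)
```

A list cannot be used as a dictionary key because lists are unhashable, which is why the attempt in the question fails before pandas even sees it; expanding each list into one key per column avoids that.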

Or using concat:

gr = df.groupby(['col1','col5'])
pd.concat([gr[l_mean].mean(), 
           gr[l_first].first()], 
          axis=1)

Then call reset_index afterwards to get the expected output.
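Put together, the concat variant might look like the sketch below. The sample frame is reconstructed from the question (an assumption about the real data), and the name out is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed dtypes).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 6, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "g", "g", "g", "g"],
    "col6": [5, 7, 9, 4, 8, 2],
})

l_mean = ["col6", "col3"]
l_first = ["col4", "col2"]

gr = df.groupby(["col1", "col5"])

# Aggregate each list of columns separately, then align the two results
# side by side on the shared (col1, col5) group index.
out = pd.concat([gr[l_mean].mean(),
                 gr[l_first].first()],
                axis=1).reset_index()  # turn the group keys back into columns
print(out)
```

Both pieces are indexed by the same group keys, so concat with axis=1 lines the rows up without an explicit merge.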

Answer 2

(
    df.groupby(['col1','col5'])
    .agg(col6=('col6', 'mean'),
        col3=('col3', 'mean'),
        col4=('col4', 'first'),
        col2=('col2', 'first'))
)
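With many columns, the keyword arguments for this named-aggregation form do not have to be typed out by hand. As a sketch (the lists l_mean and l_first are borrowed from Answer 1, and the name named and the sample frame are assumptions for illustration), they can be generated from the lists and unpacked into agg:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed dtypes).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 6, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "g", "g", "g", "g"],
    "col6": [5, 7, 9, 4, 8, 2],
})

l_mean = ["col6", "col3"]
l_first = ["col4", "col2"]

# Build {output_name: (input_column, function)} pairs from the lists.
# "named" is a hypothetical helper name.
named = {col: (col, "mean") for col in l_mean}
named.update({col: (col, "first") for col in l_first})

# Unpacking the dict is equivalent to writing col6=('col6', 'mean'), ... by hand.
out = df.groupby(["col1", "col5"]).agg(**named)
print(out)
```

The output names here simply reuse the input column names; a different key in the dictionary would rename the resulting column.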

Answer 3

This is an extension of @Ben.T's solution, just wrapping it in a function and passing it via the pipe method:

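# set the two column lists (grp1, grp2 below) first
def fil(grp, list1, list2):
    # group means, keeping only the columns named in list1
    # (numeric_only=True avoids a TypeError on string columns in newer pandas)
    A = grp.mean(numeric_only=True).filter(list1)
    # first value per group, keeping only the columns named in list2
    B = grp.first().filter(list2)
    C = A.join(B)
    return C

grp1 = ['col6','col3']
grp2 = ['col4','col2']
m = df.groupby(['col1','col5']).pipe(fil,grp1,grp2)
m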
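The two lists above still have to be set by hand. If the split can be decided by dtype, one possible way to derive them is sketched below. The rule itself (mean for numeric columns, first for everything else) is an assumption and not something the question states; note that it would average col4, which the question wanted as 'first', so the lists may still need manual adjustment.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed dtypes).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 6, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "g", "g", "g", "g"],
    "col6": [5, 7, 9, 4, 8, 2],
})

keys = ["col1", "col5"]
value_cols = [c for c in df.columns if c not in keys]

# Assumed rule: numeric columns get 'mean', all remaining columns get 'first'.
numeric_cols = df[value_cols].select_dtypes(include="number").columns.tolist()
other_cols = [c for c in value_cols if c not in numeric_cols]

agg_map = {**dict.fromkeys(numeric_cols, "mean"),
           **dict.fromkeys(other_cols, "first")}

out = df.groupby(keys).agg(agg_map).reset_index()
print(out)
```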

3 answers

Answer 1: 4, accepted, 2020-05-08 03:17:18
Answer 2: 0, 2020-05-08 03:18:20
Answer 3: 0, 2020-05-08 04:35:18
