
Pandas Groupby mean and first of multiple columns

My Pandas df looks like the following. I want to apply groupby and then calculate the mean of some columns and the first value of others.

index col1   col2   col3   col4   col5  col6
  0     a      c      1      2      f    5
  1     a      c      1      2      f    7
  2     a      d      1      2      g    9
  3     b      d      6      2      g    4
  4     b      e      1      2      g    8
  5     b      e      1      2      g    2

I tried something like this:

df.groupby(['col1','col5']).agg({['col6','col3']:'mean',['col4','col2']:'first'})

Expected output:

col1  col5   col6  col3  col4  col2
  a     f     6     1     2     c
  a     g     9     1     2     d
  b     g     4.67  2.67  2     d

But it seems a list is not an option here. My real dataset has around 100 columns of different types, so I can't pass them individually. Any thoughts on passing them as lists?

If you have one list of columns per aggregation, you can do:

l_mean = ['col6','col3']
l_first = ['col4','col2']
df.groupby(['col1','col5']).agg({**{col:'mean' for col in l_mean},
                                 **{col:'first' for col in l_first}})

The notation **{} unpacks a dictionary, so {**{}, **{}} creates one dictionary from two (it could be more than two), like a union of dictionaries. And {col:'mean' for col in l_mean} is a dictionary comprehension: it creates a dictionary with each col of the list as a key and 'mean' as the value.
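To make the unpacking step concrete, here is a minimal sketch that prints the merged dictionary before handing it to agg. The DataFrame is rebuilt from the sample table in the question, and the name agg_spec is only an illustrative choice, not something from the original answer.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question.
df = pd.DataFrame({
    "col1": list("aaabbb"),
    "col2": list("ccddee"),
    "col3": [1, 1, 1, 6, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": list("ffgggg"),
    "col6": [5, 7, 9, 4, 8, 2],
})

l_mean = ["col6", "col3"]
l_first = ["col4", "col2"]

# Each comprehension maps every column of one list to an aggregation name;
# the ** unpacking merges both dictionaries into a single spec.
agg_spec = {**{col: "mean" for col in l_mean},
            **{col: "first" for col in l_first}}
print(agg_spec)

# The merged dictionary is what agg receives.
result = df.groupby(["col1", "col5"]).agg(agg_spec)
print(result)
```

With many columns, only the two lists need to change; the dictionary is rebuilt from them each time.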

Or using concat:

gr = df.groupby(['col1','col5'])
pd.concat([gr[l_mean].mean(), 
           gr[l_first].first()], 
          axis=1)

Then call reset_index on the result to get the expected output.
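Put together, the concat approach followed by reset_index might look like the sketch below. The DataFrame is again rebuilt from the question's sample table, and the variable name out is just for illustration.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question.
df = pd.DataFrame({
    "col1": list("aaabbb"),
    "col2": list("ccddee"),
    "col3": [1, 1, 1, 6, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": list("ffgggg"),
    "col6": [5, 7, 9, 4, 8, 2],
})

l_mean = ["col6", "col3"]
l_first = ["col4", "col2"]

gr = df.groupby(["col1", "col5"])

# Aggregate each list of columns separately, glue the pieces side by side,
# then turn the group keys back into ordinary columns.
out = pd.concat([gr[l_mean].mean(),
                 gr[l_first].first()],
                axis=1).reset_index()
print(out)
```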

(
    df.groupby(['col1','col5'])
    .agg(col6=('col6', 'mean'),
        col3=('col3', 'mean'),
        col4=('col4', 'first'),
        col2=('col2', 'first'))
)
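Since the question is about avoiding one entry per column, the named-aggregation form above can also be generated from the two lists. This is a sketch under that assumption, not part of the original answer; it reuses the l_mean and l_first names from the earlier answer and the sample data from the question.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question.
df = pd.DataFrame({
    "col1": list("aaabbb"),
    "col2": list("ccddee"),
    "col3": [1, 1, 1, 6, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": list("ffgggg"),
    "col6": [5, 7, 9, 4, 8, 2],
})

l_mean = ["col6", "col3"]
l_first = ["col4", "col2"]

# Build output_name=(source_column, aggregation) pairs from the lists
# and unpack them as keyword arguments to agg.
named = df.groupby(["col1", "col5"]).agg(
    **{col: (col, "mean") for col in l_mean},
    **{col: (col, "first") for col in l_first},
)
print(named)
```

This only works when the column names are strings, because they are passed as keyword arguments.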

This is an extension of @Ben.T's solution, just wrapping it in a function and passing it via the pipe method:

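# set list1 and list2 to the columns for each aggregation
def fil(grp, list1, list2):
    # numeric_only=True skips non-numeric columns, which mean() rejects in pandas 2.x
    A = grp.mean(numeric_only=True).filter(list1)
    B = grp.first().filter(list2)
    C = A.join(B)
    return C

grp1 = ['col6','col3']
grp2 = ['col4','col2']
m = df.groupby(['col1','col5']).pipe(fil, grp1, grp2)
m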
