How to do group by and take count of unique and count of some value as aggregate on the same column in python pandas?

Question

My question is related to my previous question, but it is different, so I am asking a new one.

In the above question, see the answer by @jezrael.

import pandas as pd

df = pd.DataFrame({'col1':[1,1,1],
                   'col2':[4,4,6],
                   'col3':[7,7,9],
                   'col4':[3,3,5]})

print (df)
   col1  col2  col3  col4
0     1     4     7     3
1     1     4     7     3
2     1     6     9     5

df1 = df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})
df1['result_col'] = df1['col3'].div(df1['col4'])
print (df1)
           col4  col3  result_col
col1 col2                        
1    4        1     2         2.0
     6        1     1         1.0

Now I also want to count a specific value of col4. Say I want the count of col4 == 3 in the same query.

df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'}) ... + count(col4=='3')

How can I do this in the same query as above? I have tried the code below, but it does not give a solution.

df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique','col4':'x: lambda x[x == 7].count()'})

Answer 1

I think you need to aggregate with a list of functions in the dict for column col4.

If you need to count the values equal to 3, the simplest way is to sum the True values of x == 3:

# parentheses let the method chain continue on the next line
df1 = (df.groupby(['col1','col2'])
         .agg({'col3':'size','col4': ['nunique', lambda x: (x == 3).sum()]}))
df1 = df1.rename(columns={'<lambda>':'count_3'})
df1.columns = ['{}_{}'.format(x[0], x[1]) for x in df1.columns]
print (df1)
           col4_nunique  col4_count_3  col3_size
col1 col2                                       
1    4                1             2          2
     6                1             0          1
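As a possible alternative to renaming and flattening the columns afterwards, newer pandas versions (0.25 and later) accept named aggregation, where each output column is given as a keyword. The sketch below assumes such a version is installed; the output names col3_size, col4_nunique and col4_count_3 are chosen here only to mirror the output above.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

# Named aggregation: keyword = (source column, aggregation function).
# The keyword becomes the output column name, so no rename step is needed.
df1 = df.groupby(['col1', 'col2']).agg(
    col3_size=('col3', 'size'),
    col4_nunique=('col4', 'nunique'),
    # count rows where col4 == 3 by summing the boolean mask
    col4_count_3=('col4', lambda x: (x == 3).sum()),
)
print(df1)
```

The columns come out in the order the keywords are written, which avoids depending on how a particular pandas version orders the keys of a dict.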

Answer 2

Do some preprocessing by adding col4 == 3 as a column ahead of time, then use aggregate.

df.assign(result_col=df.col4.eq(3).astype(int)).groupby(
    ['col1', 'col2']
).agg(dict(col3='size', col4='nunique', result_col='sum'))

           col3  result_col  col4
col1 col2                        
1    4        2           2     1
     6        1           0     1

old answers

g = df.groupby(['col1', 'col2'])
g.agg({'col3':'size','col4': 'nunique'}).assign(
    result_col=g.col4.apply(lambda x: x.eq(3).sum()))

           col3  col4  result_col
col1 col2                        
1    4        2     1           2
     6        1     1           0

slightly rearranged

g = df.groupby(['col1', 'col2'])
final_df = g.agg({'col3':'size','col4': 'nunique'})
final_df.insert(1, 'result_col', g.col4.apply(lambda x: x.eq(3).sum()))
final_df

           col3  result_col  col4
col1 col2                        
1    4        2           2     1
     6        1           0     1
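The answers count col4 == 3 in three different ways: a lambda inside agg, a precomputed flag column that is summed, and apply on the grouped column. A quick sanity check, sketched here on the sample frame from the question (the variable names keys, a, b and c are only for illustration), suggests they agree:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})
keys = ['col1', 'col2']

# a: lambda passed to agg, summing the boolean mask per group
a = df.groupby(keys)['col4'].agg(lambda x: (x == 3).sum())

# b: flag column computed up front, then a plain sum per group
b = (df.assign(result_col=df['col4'].eq(3).astype(int))
       .groupby(keys)['result_col'].sum())

# c: apply on the grouped column
c = df.groupby(keys)['col4'].apply(lambda x: x.eq(3).sum())

print(a.tolist(), b.tolist(), c.tolist())
```

On larger data the precomputed flag column is often the cheaper option, since the per-group work is a built-in sum rather than a Python-level lambda call, though that is worth measuring on the actual data.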

2 solutions

Solution 1
2 2017-02-06 06:13:02

Solution 2
2 accepted 2017-02-06 06:22:13
