How to group by and take the count of unique values and the count of a specific value as aggregates on the same column in Python pandas?
My question is related to my previous question, but it is different, so I am asking a new one.
In that question, see the answer by @jezrael.
import pandas as pd

df = pd.DataFrame({'col1':[1,1,1],
                   'col2':[4,4,6],
                   'col3':[7,7,9],
                   'col4':[3,3,5]})
print (df)
   col1  col2  col3  col4
0     1     4     7     3
1     1     4     7     3
2     1     6     9     5
df1 = df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})
df1['result_col'] = df1['col3'].div(df1['col4'])
print (df1)
           col4  col3  result_col
col1 col2
1    4        1     2         2.0
     6        1     1         1.0
Now I also want to count a specific value of col4 in the same query. Say I want the count of rows where col4 == 3.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'}) ... + count(col4=='3')
How can I do this in the same query as above? I have tried the following, but it does not give me a solution.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique','col4':'x: lambda x[x == 7].count()'})
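Two things in the attempt above are likely to get in the way, and both can be seen in plain Python without pandas: the dict repeats the key 'col4', and the lambda is written inside quotes. The sketch below only illustrates those two points; the names spec, not_a_function and is_a_function are made up for the illustration.

```python
# A dict literal with a repeated key silently keeps only the last value,
# so the 'nunique' entry for 'col4' is lost before pandas ever sees it.
spec = {'col3': 'size', 'col4': 'nunique', 'col4': 'second'}
print(spec)

# A lambda wrapped in quotes is just a string, not a callable.
not_a_function = 'lambda x: (x == 3).sum()'
is_a_function = lambda x: (x == 3).sum()
print(callable(not_a_function), callable(is_a_function))
```

This is why several aggregations on one column have to be passed as a list under a single key, with the lambda written as real code.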
I think you need aggregate with a list of functions in the dict for column col4.
If you need to count the 3 values, the simplest way is to sum the True values in x == 3:
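# parentheses let the method chain continue on the next line
df1 = (df.groupby(['col1','col2'])
         .agg({'col3':'size','col4': ['nunique', lambda x: (x == 3).sum()]}))
# give the lambda column a readable name, then flatten the MultiIndex columns
df1 = df1.rename(columns={'<lambda>':'count_3'})
df1.columns = ['{}_{}'.format(x[0], x[1]) for x in df1.columns]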
print (df1)
           col4_nunique  col4_count_3  col3_size
col1 col2
1    4                1             2          2
     6                1             0          1
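As a possible alternative, newer pandas versions (0.25 and later) support named aggregation, which produces flat column names directly and avoids the rename and flatten steps. This is a minimal sketch under that version assumption, not part of the original answer; the output column names col3_size, col4_nunique and col4_count_3 are chosen here to match the names used above.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

# Named aggregation: each keyword is an output column name,
# each value is a (source column, aggregation) pair.
out = df.groupby(['col1', 'col2']).agg(
    col3_size=('col3', 'size'),
    col4_nunique=('col4', 'nunique'),
    # (x == 3) is a boolean Series; summing it counts the True values
    col4_count_3=('col4', lambda x: (x == 3).sum()),
)
print(out)
```

The column order in the result follows the keyword order, so it may differ from the printed output above.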
Do some preprocessing by including col4==3 as a column ahead of time. Then use aggregate.
df.assign(result_col=df.col4.eq(3).astype(int)).groupby(
['col1', 'col2']
).agg(dict(col3='size', col4='nunique', result_col='sum'))
           col3  result_col  col4
col1 col2
1    4        2           2     1
     6        1           0     1
Old answers
g = df.groupby(['col1', 'col2'])
g.agg({'col3':'size','col4': 'nunique'}).assign(
result_col=g.col4.apply(lambda x: x.eq(3).sum()))
           col3  col4  result_col
col1 col2
1    4        2     1           2
     6        1     1           0
Slightly rearranged
g = df.groupby(['col1', 'col2'])
final_df = g.agg({'col3':'size','col4': 'nunique'})
final_df.insert(1, 'result_col', g.col4.apply(lambda x: x.eq(3).sum()))
final_df
           col3  result_col  col4
col1 col2
1    4        2           2     1
     6        1           0     1