How to keep zero counts for pandas groupby count on a 2-column dataframe?
If the data frame has 3 columns, I found this StackOverflow answer that gives zero counts: Pandas groupby for zero values
But how do I do this for a data frame that has only two columns?
Question
NOTE: An answer using chained operations is preferred:
import numpy as np
import pandas as pd
df = pd.DataFrame({'date': pd.date_range('2018-01-01', periods=6),
                   'a': range(6),
                   })
df.iloc[2,0] = df.iloc[1,0]
print(df)
date a
0 2018-01-01 0
1 2018-01-02 1
2 2018-01-02 2
3 2018-01-04 3
4 2018-01-05 4
5 2018-01-06 5
To get the counts of a, I do this:
df1 = (df.query("a > 0")
       .groupby(['date'])[['a']]
       .count()
       .add_suffix('_count')
       .reset_index()
       )
print(df1)
date a_count
0 2018-01-02 2
1 2018-01-04 1
2 2018-01-05 1
3 2018-01-06 1
Required answer from the chained operation:
date a_count
0 2018-01-01 0 # also include this row
0 2018-01-02 2
1 2018-01-04 1
2 2018-01-05 1
3 2018-01-06 1
My attempt:
df1 = (df.query("a > 0")
       .groupby(['date'])[['a']]
       .count()
       .add_suffix('_count')
       .unstack(fill_value=0)
       .to_frame()
       .stack()
       .reset_index()
       )
print(df1)
level_0 date level_2 0
0 a_count 2018-01-02 0 2
1 a_count 2018-01-04 0 1
2 a_count 2018-01-05 0 1
3 a_count 2018-01-06 0 1
This does not work.
How can I fix this?
Related links:
Pandas groupby for zero values
Assign a column holding the thing you want to count before the groupby:
df.assign(to_sum = df.a.gt(0).astype(int)).groupby('date').to_sum.sum()
#date
#2018-01-01 0
#2018-01-02 2
#2018-01-04 1
#2018-01-05 1
#2018-01-06 1
#Name: to_sum, dtype: int32
You can tack on .rename('a_count').reset_index() to get your exact output.
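To see why flagging before the groupby keeps the zero row, here is a minimal sketch that rebuilds the question's data and compares the two approaches. The variable names `filtered` and `flagged` are only illustrative.

```python
import pandas as pd

# Same data as in the question: 2018-01-02 appears twice, 2018-01-03 never.
df = pd.DataFrame({'date': pd.date_range('2018-01-01', periods=6),
                   'a': range(6)})
df.iloc[2, 0] = df.iloc[1, 0]

# Filtering first removes the only row for 2018-01-01,
# so that group never reaches the groupby.
filtered = df.query('a > 0').groupby('date')['a'].count()

# Flagging first keeps every row; a group with no matches sums to 0.
flagged = (df.assign(to_sum=df['a'].gt(0).astype(int))
             .groupby('date')['to_sum']
             .sum())

print(len(filtered), len(flagged))
```

The filtered version has one group fewer because the row with a == 0 was dropped before grouping, which is also why no amount of unstack/stack afterwards can bring it back.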
Alternatively, if the use case is a bit more complicated and that isn't possible, you can always reindex + fillna after the groupby:
(df[df.a > 0].groupby('date').a.count()
.reindex(df.date.unique()).fillna(0).astype(int)
.rename('a_count').reset_index())
# date a_count
#0 2018-01-01 0
#1 2018-01-02 2
#2 2018-01-04 1
#3 2018-01-05 1
#4 2018-01-06 1
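A possible variation on the same idea, sketched here as an assumption rather than as part of the original answer: `reindex` accepts a `fill_value` argument, so the missing dates can be filled with 0 directly and the integer dtype is kept without the `fillna(0).astype(int)` step. The `rename_axis('date')` call is added only to make sure the index name survives the reindex.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2018-01-01', periods=6),
                   'a': range(6)})
df.iloc[2, 0] = df.iloc[1, 0]  # duplicate 2018-01-02, as in the question

out = (df[df['a'] > 0]
       .groupby('date')['a']
       .count()
       # Re-add every date seen in the original frame; absent ones get 0.
       .reindex(df['date'].unique(), fill_value=0)
       .rename_axis('date')
       .rename('a_count')
       .reset_index())

print(out)
```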
It can be as simple as this:
(df['a'].gt(0)).groupby(df['date']).sum().to_frame('count_a').reset_index()
date count_a
0 2018-01-01 0.0
1 2018-01-02 2.0
2 2018-01-04 1.0
3 2018-01-05 1.0
4 2018-01-06 1.0
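The counts above are printed as floats. If integer counts are wanted regardless of how a given pandas version types the sum of booleans, one option is an explicit cast. This is a small sketch of that adjustment, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2018-01-01', periods=6),
                   'a': range(6)})
df.iloc[2, 0] = df.iloc[1, 0]  # duplicate 2018-01-02, as in the question

count_a = (df['a'].gt(0)          # boolean flag per row
           .groupby(df['date'])   # group the flags by the date column
           .sum()                 # True counts as 1, False as 0
           .astype(int)           # force integer counts
           .to_frame('count_a')
           .reset_index())

print(count_a)
```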
Just reformatting @ALollz's answer as a more readable chain:
df1 = (df.assign(
           to_sum=lambda x: (x['a'] > 0).astype(int)
       )
       .groupby('date')['to_sum']
       .sum()
       .rename('a_count')
       .to_frame()
       .reset_index()
       )
print(df1)
date a_count
0 2018-01-01 0
1 2018-01-02 2
2 2018-01-04 1
3 2018-01-05 1
4 2018-01-06 1