Counting NaN values in pandas group by
I have a df like this:
Country product date_install date_purchase id
BR yearly 2020-11-01-01:11:36 2020-11-01-01:11:26 10660236
CA monthly 2020-11-01-01:11:49 2020-11-01-01:11:32 10649441
US yearly 2020-11-01-01:11:54 2020-11-01-01:11:33 10660272
IT monthly 2020-11-01-11:11:01 2020-11-01-01:11:34 10657634
AE monthly 2020-11-01-01:11:38 2020-11-01-01:11:39 10661442
US NaN 2021-01-12-03:01:31 NaN 12815946
CA NaN 2020-12-04-02:12:48 NaN 11647714
US NaN 2020-12-28-11:12:54 NaN 12323174
ID NaN 2021-02-02-01:02:58 NaN 13714980
US NaN 2020-11-15-10:11:05 NaN 11056138
I am trying to get this:
country product installs purchases
BR yearly 1 1
BR NaN 100 0 # people who installed but not purchased
CA monthly 1 1
US yearly 10 10
US monthly 15 15
US NaN 500 0 # people who installed but not purchased
Or even better would be:
country installs yearly monthly total
BR 1000 10 100 110
CA 2000 50 5 55
I tried:
df.groupby(['country','product']).count().sort_values('date_install',ascending=False)
But then all the values are the same, matching the number of purchases, which would mean that everyone who installs also purchases.
date_install date_purchase id
country product
US monthly 3373 3373 3373
AU monthly 1478 1478 1478
US yearly 954 954 954
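For reference, the equal counts come from two pandas behaviors: `groupby` drops rows whose group key (`product`) is NaN by default, and `count()` only counts non-NaN cells. A minimal sketch on hypothetical data (`dropna=False` requires pandas >= 1.1):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the data: one purchase, two installs without one.
df = pd.DataFrame({
    "Country": ["US", "US", "US"],
    "product": ["yearly", np.nan, np.nan],
    "date_install": ["2020-11-01", "2020-12-28", "2020-11-15"],
    "date_purchase": ["2020-11-01", np.nan, np.nan],
})

# Default: the NaN-product rows vanish because NaN group keys are dropped,
# so only purchased groups remain and every column shows equal counts.
print(df.groupby(["Country", "product"]).count())

# dropna=False keeps the NaN group; count() still skips NaN cells,
# so date_purchase is 0 there while date_install is 2.
print(df.groupby(["Country", "product"], dropna=False).count())
```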
And if I use:
df = df.replace(np.nan, 'empty', regex=True)
df.groupby(['country','product']).count().sort_values('date_install',ascending=False)
I get:
date_install date_purchase id
country product
US empty 480153 480153 480153
AU empty 334236 334236 334236
BR empty 144920 144920 144920
How can I achieve this result?
Indeed, if you follow @Paul Brennan's advice, the solution becomes quite easy. As an example, consider the following data:
Country product date_install date_purchase id
0 BR yearly 2020-01-01-01:00:00 2020-01-01-01:00:00 10660236
3 BR monthly 2020-01-01-04:00:00 2020-01-01-04:00:00 10660239
6 BR NaN 2020-01-01-07:00:00 NaN 10660242
9 BR NaN 2020-01-01-10:00:00 NaN 10660245
1 CA yearly 2020-01-01-02:00:00 2020-01-01-02:00:00 10660237
4 CA yearly 2020-01-01-05:00:00 2020-01-01-05:00:00 10660240
7 CA NaN 2020-01-01-08:00:00 NaN 10660243
10 CA yearly 2020-01-01-11:00:00 2020-01-01-11:00:00 10660246
2 US monthly 2020-01-01-03:00:00 2020-01-01-03:00:00 10660238
5 US NaN 2020-01-01-06:00:00 NaN 10660241
8 US monthly 2020-01-01-09:00:00 2020-01-01-09:00:00 10660244
11 US monthly 2020-01-01-12:00:00 2020-01-01-12:00:00 10660247
Let's say the 'not purchased' version is a demo or something like that:
df['product'] = df['product'].fillna('demo')
You can do the following:
ans = (df.groupby([df['Country'], df['product']])
         .apply(len)                     # size of each (Country, product) group
         .unstack()                      # pivot the product values into columns
         .fillna(0)                      # absent combinations become 0
         .astype(int)
         .rename_axis(columns='', index='')
         .assign(installed=lambda x: x[['demo', 'monthly', 'yearly']].sum(axis=1),
                 purchased=lambda x: x[['monthly', 'yearly']].sum(axis=1))
      )
And the resulting dataframe is as follows:
demo monthly yearly installed purchased
BR 2 1 1 4 2
CA 1 0 3 4 3
US 1 3 0 4 3
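An equivalent, arguably more direct route to the same table is `pd.crosstab`, which counts every Country x product combination in one call. A sketch on hypothetical data shaped like the example above (already after the `fillna('demo')` step):

```python
import pandas as pd

# Hypothetical data with the same per-country counts as the example.
df = pd.DataFrame({
    "Country": ["BR"] * 4 + ["CA"] * 4 + ["US"] * 4,
    "product": ["yearly", "monthly", "demo", "demo",
                "yearly", "yearly", "demo", "yearly",
                "monthly", "demo", "monthly", "monthly"],
})

# crosstab fills absent combinations with 0 (e.g. CA has no 'monthly' rows),
# so no unstack/fillna/astype chain is needed.
ans = pd.crosstab(df["Country"], df["product"])
ans = ans.assign(installed=ans.sum(axis=1),
                 purchased=ans[["monthly", "yearly"]].sum(axis=1))
print(ans)
```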
Answering your comment's question: you can't keep the dates for each user, since groupby summarizes all the information and loses those individual details.
What you can do is assign the desired columns from the resulting dataframe back to the first one (producing some duplicated values), like:
df = df.assign(purchased=df['Country'].map(ans['purchased']),
installed=df['Country'].map(ans['installed']))
This will make your first dataframe look like:
Country product date_install date_purchase id purchased installed
0 BR yearly 2020-01-01-01:00:00 2020-01-01-01:00:00 10660236 2 4
1 CA yearly 2020-01-01-02:00:00 2020-01-01-02:00:00 10660237 3 4
2 US monthly 2020-01-01-03:00:00 2020-01-01-03:00:00 10660238 3 4
...
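If the goal is only to attach per-country totals to every row, `groupby(...).transform` is an alternative to the `map` step: it returns one value per original row, so the install and purchase dates survive in place. A sketch on hypothetical data (with 'demo' marking non-purchases, as above):

```python
import pandas as pd

# Hypothetical rows after the fillna('demo') step.
df = pd.DataFrame({
    "Country": ["BR", "BR", "CA", "US", "US"],
    "product": ["yearly", "demo", "yearly", "monthly", "demo"],
    "date_install": ["2020-01-01", "2020-01-02", "2020-01-03",
                     "2020-01-04", "2020-01-05"],
})

# transform keeps the original index, so each row receives its group's
# totals while all other columns (including the dates) stay untouched.
df["installed"] = df.groupby("Country")["product"].transform("size")
df["purchased"] = (df["product"] != "demo").groupby(df["Country"]).transform("sum")
print(df)
```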
If this is not what you wanted, please let us know and we'll try to figure it out.