Pandas Groupby count on multiple columns for specific string values only

I have a DataFrame like this:

import pandas as pd

# Note: the action columns hold the strings 'TRUE'/'FALSE', not booleans
dummy = pd.DataFrame([
    ('01/09/2020', 'TRUE', 'FALSE'),
    ('01/09/2020', 'TRUE', 'TRUE'),
    ('02/09/2020', 'FALSE', 'TRUE'),
    ('02/09/2020', 'TRUE', 'FALSE'),
    ('03/09/2020', 'FALSE', 'FALSE'),
    ('03/09/2020', 'TRUE', 'TRUE'),
    ('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])

Now I want to count the 'TRUE' values per day, with one count for each action column.

I tried groupby with sum, count, etc., but nothing works for me because I have to aggregate multiple columns. I don't want to split the table into multiple DataFrames, resolve each one individually, and merge them back into one. Can someone please suggest a smart way to do it?

The TRUE and FALSE values in your dummy df are strings. You can convert them to int and sum:

dummy.replace({'TRUE':1,'FALSE':0}).groupby('date',as_index = False).sum()

    date        Action1 Action2
0   01/09/2020  2       1
1   02/09/2020  1       1
2   03/09/2020  2       1
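
For anyone who wants to run this end to end, here is a minimal sketch of the same "convert to int, then sum" idea. It assumes the sample frame from the question and uses `Series.map` with an explicit dictionary instead of `replace`; the names `mapping`, `converted` and `result` are only illustrative.

```python
import pandas as pd

dummy = pd.DataFrame([
    ('01/09/2020', 'TRUE', 'FALSE'),
    ('01/09/2020', 'TRUE', 'TRUE'),
    ('02/09/2020', 'FALSE', 'TRUE'),
    ('02/09/2020', 'TRUE', 'FALSE'),
    ('03/09/2020', 'FALSE', 'FALSE'),
    ('03/09/2020', 'TRUE', 'TRUE'),
    ('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])

# Illustrative mapping from the string flags to integers
mapping = {'TRUE': 1, 'FALSE': 0}

# Work on a copy so the original string columns stay untouched
converted = dummy.copy()
for col in ['Action1', 'Action2']:
    converted[col] = converted[col].map(mapping)

# Summing the 0/1 columns per date counts the 'TRUE' values
result = converted.groupby('date', as_index=False).sum()
print(result)
```

Mapping column by column makes the resulting dtype explicit, so the sum does not depend on how `replace` handles object columns in a given pandas version.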

You can also try:

# sum(level=...) was removed in pandas 2.0; group on the index level instead
dummy.set_index(['date']).eq('TRUE').groupby(level='date').sum()

Output:

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1

Anyone seeing this answer should look at the answers by @QuangHoang or @Vaishali.
They are much better answers. I can't control what the OP chooses, but you should go upvote those answers.

Inspired by @QuangHoang:

dummy.iloc[:, 1:].eq('TRUE').groupby(dummy.date).sum()

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1

OLD ANSWER

Fix your DataFrame so that it holds actual True / False values:

from ast import literal_eval

dummy = dummy.assign(**dummy[['Action1', 'Action2']].applymap(str.title).applymap(literal_eval))

Then use groupby:

dummy.groupby('date').sum()

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1
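
If the columns might contain anything other than 'TRUE' and 'FALSE', a stricter conversion may be safer than `literal_eval`. The sketch below is one possible approach, not part of the answer above: `to_bool` is a hypothetical helper that maps the two expected strings to booleans and raises on anything else.

```python
import pandas as pd

def to_bool(series):
    """Hypothetical helper: convert 'TRUE'/'FALSE' strings to booleans."""
    mapping = {'TRUE': True, 'FALSE': False}
    parsed = series.str.upper().map(mapping)
    # Values missing from the mapping come back as NaN, so fail loudly
    if parsed.isna().any():
        bad = series[parsed.isna()].unique().tolist()
        raise ValueError(f"unexpected values in {series.name}: {bad}")
    return parsed.astype(bool)

dummy = pd.DataFrame([
    ('01/09/2020', 'TRUE', 'FALSE'),
    ('01/09/2020', 'TRUE', 'TRUE'),
    ('02/09/2020', 'FALSE', 'TRUE'),
    ('02/09/2020', 'TRUE', 'FALSE'),
    ('03/09/2020', 'FALSE', 'FALSE'),
    ('03/09/2020', 'TRUE', 'TRUE'),
    ('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])

fixed = dummy.assign(Action1=to_bool(dummy['Action1']),
                     Action2=to_bool(dummy['Action2']))

# Summing boolean columns counts the True values per date
counts = fixed.groupby('date').sum()
print(counts)
```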

In [7]: dummy
Out[7]:
         date Action1 Action2
0  01/09/2020    TRUE   FALSE
1  01/09/2020    TRUE    TRUE
2  02/09/2020   FALSE    TRUE
3  02/09/2020    TRUE   FALSE
4  03/09/2020   FALSE   FALSE
5  03/09/2020    TRUE    TRUE
6  03/09/2020    TRUE   FALSE

In [9]: dummy.groupby(['date'], as_index=False).agg(lambda x: x.eq('TRUE').sum())
Out[9]:
         date  Action1  Action2
0  01/09/2020        2        1
1  02/09/2020        1        1
2  03/09/2020        2        1

You can also use a pivot table:

dummy.pivot_table(index='date', values=['Action1', 'Action2'], 
                  aggfunc=lambda x: (x=='TRUE').sum()).reset_index()

Output:

          date  Action1 Action2
0   01/09/2020        2       1
1   02/09/2020        1       1
2   03/09/2020        2       1

Along similar lines, using .resample:

...
dummy['date'] = pd.to_datetime(dummy['date'], dayfirst=True)
dummy[['Action1', 'Action2']] = dummy[['Action1', 'Action2']].replace({'TRUE':True, 'FALSE': False})

# set date to index
dummy.set_index('date', inplace=True)

dummy.resample('1D').sum()

See the resample documentation.
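
One difference from the groupby answers is worth knowing: `resample('1D')` builds a regular daily index, so calendar days with no rows still appear in the result with a sum of 0, while `groupby` only returns dates that occur in the data. A small sketch with made-up data (the frame `df` below is hypothetical and skips 02/09/2020 on purpose):

```python
import pandas as pd

# Hypothetical data with a gap: nothing recorded on 02/09/2020
df = pd.DataFrame({
    'date': ['01/09/2020', '01/09/2020', '03/09/2020'],
    'Action1': [True, False, True],
})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# resample fills in the missing day; its sum is 0
daily = df.set_index('date').resample('1D').sum()

# groupby only keeps the dates that are present
grouped = df.groupby('date').sum()

print(daily)
print(grouped)
```

Pick `resample` if you want one row per calendar day, and `groupby` if you only want the days that have data.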
