Groupby and aggregate operation on Pandas Dataframe

I have a Pandas dataframe:

     Date     Type     Section      Status   
--------------------------------------------
0     1-Apr    Type1       A         Present
1     1-Apr    Type2       A         Absent
2     1-Apr    Type2       A         Present
3     1-Apr    Type1       B         Absent
4     2-Apr    Type1       A         Present
5     2-Apr    Type2       C         Present
6     2-Apr    Type2       C         Present    

I'd like to group the DF into a slightly different format:

     Date     Type     A_Pre  A_Abs   B_Pre   B_Abs    C_Pre   C_Abs   
------------------------------------------------------------------------------
0     1-Apr    Type1       1    0       0       1        0        0 
1              Type2       1    1       0       0        0        0
2     2-Apr    Type1       1    0       0       0        0        0         
3              Type2       0    0       0       0        1        1         

I want to get an aggregated report from the original table, where the entries are grouped by Date and Type and then split into one count column per Section and Status combination. I have no idea how to approach this after 2 days of trying.

Any help would be greatly appreciated.

First I would create the columns you wish to aggregate, populated with zeros and ones, and then use groupby and take a simple sum of the values.

I didn't get to try this out, but I think the following should work:

import pandas as pd

# DF is the dataframe from the question.
Present = ['A_Pre', 'B_Pre', 'C_Pre']
Absent = ['A_Abs', 'B_Abs', 'C_Abs']

# Build one 0/1 indicator column per section/status pair.
# string[0] is the section letter, e.g. 'A' for 'A_Pre'.
for string in Present:
    DF[string] = ((DF['Status'] == 'Present') & (DF['Section'] == string[0])).astype(int)
for string in Absent:
    DF[string] = ((DF['Status'] == 'Absent') & (DF['Section'] == string[0])).astype(int)

# Sum only the indicator columns; note the column is named 'Type', not 'type'.
DF.groupby(['Date', 'Type'])[Present + Absent].sum()
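
For comparison, here is a minimal sketch of the same report built with pd.crosstab instead of hand-made indicator columns. This is a different technique from the loop above, not a restatement of it. The dataframe is typed in from the question's table, and the names report and wanted are only illustrative.

```python
import pandas as pd

# Data copied from the table in the question.
DF = pd.DataFrame({
    'Date':    ['1-Apr', '1-Apr', '1-Apr', '1-Apr', '2-Apr', '2-Apr', '2-Apr'],
    'Type':    ['Type1', 'Type2', 'Type2', 'Type1', 'Type1', 'Type2', 'Type2'],
    'Section': ['A', 'A', 'A', 'B', 'A', 'C', 'C'],
    'Status':  ['Present', 'Absent', 'Present', 'Absent',
                'Present', 'Present', 'Present'],
})

# Count rows for every (Date, Type) row group and (Section, Status) column group.
report = pd.crosstab(index=[DF['Date'], DF['Type']],
                     columns=[DF['Section'], DF['Status']])

# Flatten the two-level column index into names like 'A_Pre' and 'A_Abs'.
report.columns = [f'{sect}_{stat[:3]}' for sect, stat in report.columns]

# Combinations that never occur in the data (e.g. 'B_Pre') are missing from
# the crosstab, so add them as zero columns and fix the column order.
wanted = [f'{s}_{t}' for s in ['A', 'B', 'C'] for t in ['Pre', 'Abs']]
report = report.reindex(columns=wanted, fill_value=0)

print(report)
```

With the sample data, the 2-Apr / Type2 row comes out as C_Pre = 2 and C_Abs = 0, because both of those rows are Present. The desired table in the question shows 1 and 1 there, so either the sample data or the desired output may contain a typo.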
