
Extracting new columns with counts out of pandas data frame groupby

I am dealing with a pandas dataframe like this one:

     Day  Hour         Prio  Value
0      1     6     Critical      1
1      1    16     Critical      1
2      1    17      Content      1
3      1    17          Low      1
6      1    19     Critical      1
7      1    20         High      1
8      2    10         High      1
9      2    10          Low      2

I now want to group by Day and Hour and generate one new column for each distinct value of the Prio column, holding the count that is currently stored in the Value column. This is the structure I want to achieve:

     Day  Hour  Critical  Content  Low  High
0      1     6         1        0    0     0
1      1    16         1        0    0     0
2      1    17         0        1    1     0
6      1    19         1        0    0     0
7      1    20         0        0    0     1
8      2    10         0        0    2     1

I have tried several approaches without success. My goal is to merge this data frame, by Day and Hour, with another one containing other columns, in order to aggregate them further. In particular, I need the percentage share of each priority per day/hour (at least one non-zero value is always present).

In a past solution I iterated over each row to extract the single values, but this was rather slow. I want to keep it as efficient as possible, since the data should update live within a bokeh server app. Is there a solution that avoids itertuples or something similar? Thank you!

df.groupby(['Day','Hour','Prio']).sum().unstack().fillna(0).astype(int)
#           Value                  
#Prio     Content Critical High Low
#Day Hour                          
#1   6          0        1    0   0
#    16         0        1    0   0
#    17         1        0    0   1
#    19         0        1    0   0
#    20         0        0    1   0
#2   10         0        0    1   2

You can then reset the index if you want Day and Hour back as regular columns.
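As a minimal sketch of that last step, the snippet below rebuilds the sample frame from the question, selects the Value column before summing (so the result has no extra column level), and flattens the result into the layout the question asks for. The variable name `counts` is only illustrative, and the columns come out in alphabetical order rather than the order shown in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Day":   [1, 1, 1, 1, 1, 1, 2, 2],
    "Hour":  [6, 16, 17, 17, 19, 20, 10, 10],
    "Prio":  ["Critical", "Critical", "Content", "Low",
              "Critical", "High", "High", "Low"],
    "Value": [1, 1, 1, 1, 1, 1, 1, 2],
})

counts = (
    df.groupby(["Day", "Hour", "Prio"])["Value"]
      .sum()                      # one total per (Day, Hour, Prio)
      .unstack(fill_value=0)      # Prio values become columns, gaps become 0
      .reset_index()              # Day and Hour back as regular columns
      .rename_axis(columns=None)  # drop the leftover "Prio" columns label
)
print(counts)
```

Selecting `["Value"]` before `.sum()` keeps the column index flat, which makes a later merge on Day and Hour simpler.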

Alternatively, you can use pivot_table:

pd.pivot_table(df,values='Value',index=['Day','Hour'],columns=['Prio'],aggfunc='sum')\
     .fillna(0).astype(int)


Out[22]: 
Prio      Content  Critical  High  Low
Day Hour                              
1   6           0         1     0    0
    16          0         1     0    0
    17          1         0     0    1
    19          0         1     0    0
    20          0         0     1    0
2   10          0         0     1    2
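A small variation, sketched here on the sample data from the question: pivot_table accepts a fill_value argument, so the missing combinations can be filled while pivoting instead of with a separate fillna call. The name `wide` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Day":   [1, 1, 1, 1, 1, 1, 2, 2],
    "Hour":  [6, 16, 17, 17, 19, 20, 10, 10],
    "Prio":  ["Critical", "Critical", "Content", "Low",
              "Critical", "High", "High", "Low"],
    "Value": [1, 1, 1, 1, 1, 1, 1, 2],
})

wide = pd.pivot_table(
    df,
    values="Value",
    index=["Day", "Hour"],
    columns="Prio",
    aggfunc="sum",
    fill_value=0,   # fill missing (Day, Hour, Prio) combinations with 0
)
print(wide)
```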

Let's use set_index, unstack, reset_index, and rename_axis:

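# Assumes each (Day, Hour, Prio) combination occurs only once;
# unstack raises a ValueError on duplicate index entries.
df.set_index(['Day','Hour','Prio'])['Value']\
  .unstack().fillna(0)\
  .astype(int).reset_index()\
  .rename_axis(columns=None)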

Output:

   Day  Hour  Content  Critical  High  Low
0    1     6        0         1     0    0
1    1    16        0         1     0    0
2    1    17        1         0     0    1
3    1    19        0         1     0    0
4    1    20        0         0     1    0
5    2    10        0         0     1    2
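The question also asks for the percentage share of each priority per day/hour and for a merge with a second frame. A rough sketch of both steps follows, assuming the pivoted counts from above. The frame `other` and its `Load` column are hypothetical stand-ins for the asker's second data frame, which is not shown in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Day":   [1, 1, 1, 1, 1, 1, 2, 2],
    "Hour":  [6, 16, 17, 17, 19, 20, 10, 10],
    "Prio":  ["Critical", "Critical", "Content", "Low",
              "Critical", "High", "High", "Low"],
    "Value": [1, 1, 1, 1, 1, 1, 1, 2],
})

wide = df.pivot_table(index=["Day", "Hour"], columns="Prio",
                      values="Value", aggfunc="sum", fill_value=0)

# Divide every row by its row total; safe here because the question
# states that each day/hour has at least one non-zero value.
shares = wide.div(wide.sum(axis=1), axis=0).mul(100)

# Hypothetical second frame keyed by Day and Hour (not from the question).
other = pd.DataFrame({"Day": [1, 2], "Hour": [17, 10], "Load": [0.4, 0.9]})

merged = (
    shares.reset_index()
          .rename_axis(columns=None)
          .merge(other, on=["Day", "Hour"], how="left")
)
print(merged)
```

Everything here is vectorized, so no row iteration (itertuples or similar) is involved. The left merge keeps every day/hour from the counts and leaves NaN where the second frame has no matching row.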
