简体   繁体   中英

Group By - Total Hours and Hours by Category in Python / Pandas

I need to calculate Total Hours and Hours by Status per Week using Python / Pandas GROUP BY .

Id             Week          Status Hours

1   01/10/2022 - 01/16/2022    On     5
2   01/10/2022 - 01/16/2022    Off    2
3   01/17/2022 - 01/23/2022    Off    6
4   01/17/2022 - 01/23/2022    On     1
5   01/17/2022 - 01/23/2022    On     5
6   01/03/2022 - 01/09/2022    On     10
7   01/10/2022 - 01/16/2022    Off    9
8   01/03/2022 - 01/09/2022    On     3
9   01/24/2022 - 01/30/2022    Off    4
10  01/24/2022 - 01/30/2022    On     7
test_data = {'Id': [1,2,3,4,5,6,7,8,9,10], 
             'Week': ['01/10/2022 - 01/16/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022', '01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/03/2022 - 01/09/2022', '01/24/2022 - 01/30/2022', '01/24/2022 - 01/30/2022'], 
             'Status': ['On', 'Off', 'Off', 'On', 'On', 'On', 'Off', 'On', 'Off', 'On'], 
             'Hours': [5,2,6,1,5,10,9,3,4,7]}

test_df = pd.DataFrame(data=test_data)

I can get Total Hours by each Week:

test_df.groupby(by=['Week'], as_index=False).agg({"Hours": "sum"})

But I don't know how to also group by Status, so it will be 2 additional columns ( On Status Hours and Off Status Hours )

If I add Status column just to the groupby part, it creates extra rows (I understand why)

test_df.groupby(by=['Week', 'Status'], as_index=False).agg({"Hours": "sum"})

Output I want:

Week Total Hours On Status Hours Off Status Hours
01/03/2022 - 01/09/2022 13 13 0
01/10/2022 - 01/16/2022 16 5 11
01/17/2022 - 01/23/2022 12 6 6
01/24/2022 - 01/30/2022 11 7 4

You can use pd.pivot_table to get your result:

x = pd.pivot_table(
    test_df,
    index="Week",
    columns="Status",
    values="Hours",
    aggfunc="sum",
    fill_value=0,
).add_suffix(" Status Hours")
x["Total Hours"] = x.sum(axis=1)
print(x)

Prints:

Status                   Off Status Hours  On Status Hours  Total Hours
Week                                                                   
01/03/2022 - 01/09/2022                 0               13           13
01/10/2022 - 01/16/2022                11                5           16
01/17/2022 - 01/23/2022                 6                6           12
01/24/2022 - 01/30/2022                 4                7           11

You can use:

(test_df
 .groupby(['Week', 'Status'])['Hours']
 .sum()
 .unstack(1, fill_value=0)
 .add_suffix(' Status Hours')
 .assign(**{'Total Hours': lambda d: d.sum(1)})
 )

Output:

Status                   Off Status Hours  On Status Hours  Total Hours
Week                                                                   
01/03/2022 - 01/09/2022                 0               13           13
01/10/2022 - 01/16/2022                11                5           16
01/17/2022 - 01/23/2022                 6                6           12
01/24/2022 - 01/30/2022                 4                7           11

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM