简体   繁体   中英

Efficient Subtotaling of columns in a pandas dataframe

I've been toying around with an idea for a program at work that automates our end of the month reports. Currently, it creates all the reports for us in Excel format and then we manually use Excel's subtotal feature to subtotal its columns and format the data into a table.

My idea is to subtotal each of the columns by customer, like so:

Patient Date Rx# Description Qty Price
EXAMPLE, JOHN 2/1/2021 357649 Aspirin 30 6.99
EXAMPLE, JOHN 2/1/2021 357650 Drug 30 13.99
EXAMPLE, JOHN 2/1/2021 357651 Tylenol 30 7.99
EXAMPLE, JOHN Subtotal 28.97
EXAMPLE, SUSAN 2/12/2021 357652 Expensive Drug 30 51.99
EXAMPLE, SUSAN 2/12/2021 357653 Drug 30 13.99
EXAMPLE, SUSAN 2/12/2021 357654 Tylenol 30 7.99
EXAMPLE, SUSAN Subtotal 73.97

With the exisiting dataframe looking like:

Patient Date Rx# Description Qty Price
EXAMPLE, JOHN 2/1/2021 357649 Aspirin 30 6.99
EXAMPLE, JOHN 2/1/2021 357650 Drug 30 13.99
EXAMPLE, JOHN 2/1/2021 357651 Tylenol 30 7.99
EXAMPLE, SUSAN 2/12/2021 357652 Expensive Drug 30 51.99
EXAMPLE, SUSAN 2/12/2021 357653 Drug 30 13.99
EXAMPLE, SUSAN 2/12/2021 357654 Tylenol 30 7.99

Is this possible with groupby()? It seems to have an option to group by rows rather than columns. The bigger issue I see is inserting into the existing dataframe, as it seems that pandas is more designed for manipulating/performing operations on large datasets rather than inserting/adding information.

# Calculate sums
df_subtotal = df.groupby('Patient', as_index=False)[['Price']].agg('sum')
# Manipulate string Patient
df_subtotal['Patient'] = df_subtotal['Patient'] + ' subtotal'
# Join dataframes
df_new = pd.concat([df, df_subtotal], axis=0, ignore_index=True)
# Sort
df_new = df_new.sort_values(['Patient', 'Date'])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM