How to group based on three columns and sum in Python

Question

I want to find which letter accounts for 50% or more of the total price on a given day. For example, in the dataset below, A occurs most frequently on 06/21, but it does not account for 50% or more of that day's total Price. Summed per letter on 06/21, A = 56 (25% of the total), B = 120 (54%), and C = 48 (21%). So, for each date, if a letter has 50% or more of the total price, the output should show that letter together with the date. If no letter reaches 50% on a date, there should be no output for that date. The same applies to 06/22: B occurs most frequently, but frequency is not what I am interested in. B accounts for about 60% of that day's total price, while A is about 5% and C is about 36%. So the output would be:

B 06/21 0.54 and B 06/22 0.60

import pandas as pd

# Initialise the data as a dict of lists.
data = {
    'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Date': ['06/21', '06/21', '06/21', '06/21', '06/21', '06/21', '06/21',
             '06/22', '06/22', '06/22', '06/22', '06/22', '06/22'],
    'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25],
}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)
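
The per-letter totals and percentages quoted in the question can be checked with a short sketch like the one below. It rebuilds the same data so that it runs on its own; the variable names `totals` and `shares` are illustrative and not part of the original post.

```python
import pandas as pd

# Same data as in the question (7 rows for 06/21, 6 rows for 06/22)
df = pd.DataFrame({
    "Name": ["A", "B", "A", "C", "C", "A", "B", "A", "B", "B", "B", "C", "C"],
    "Date": ["06/21"] * 7 + ["06/22"] * 6,
    "Price": [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25],
})

# Total price per (Date, Name), laid out as one row per date
# and one column per letter
totals = df.groupby(["Date", "Name"])["Price"].sum().unstack("Name")

# Divide each row by its daily total to get each letter's share of that day
shares = totals.div(totals.sum(axis=1), axis=0)

print(totals)
print(shares.round(2))
```

Laying the shares out as a date-by-letter table makes it easy to see every letter's share at a glance, not only the ones that reach the 50% mark.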

Answer 1

This might work:

# aggregate multiple entries on a given date
agg_by_date_name = (df
    .groupby(["Date", "Name"])
    .agg({"Price": "sum"})
)

# calculate share
date_sums = agg_by_date_name.groupby(["Date"])["Price"].transform("sum")
agg_by_date_name["share"] = agg_by_date_name["Price"] / date_sums

# select rows where the share is at least 50% ("50% or more" in the question)
keep_high_share = agg_by_date_name["share"] >= 0.5

# store the result
result = agg_by_date_name.loc[keep_high_share, ["share"]]

print(result)
#             share
# Date  Name
# 06/21 B       0.535714
# 06/22 B       0.595918
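
If a flat table with the letter, the date, and the share is preferred (closer to the output format sketched in the question), the same idea can be wrapped in a small helper. This is only a sketch: the function name `dominant_letters` and its `threshold` parameter are made up for illustration, and it assumes "50% or more" means a share greater than or equal to 0.5.

```python
import pandas as pd


def dominant_letters(df, threshold=0.5):
    """Return the (Name, Date) pairs whose share of the day's total Price
    is at least `threshold`. Hypothetical helper, not from the original answer."""
    # Sum the price per (Date, Name), keeping both as ordinary columns
    totals = df.groupby(["Date", "Name"], as_index=False)["Price"].sum()
    # Share of each letter within its own date
    totals["share"] = totals["Price"] / totals.groupby("Date")["Price"].transform("sum")
    # Keep only the letters that reach the threshold on their date
    keep = totals["share"] >= threshold
    return totals.loc[keep, ["Name", "Date", "share"]].reset_index(drop=True)


# Same data as in the question
df = pd.DataFrame({
    "Name": ["A", "B", "A", "C", "C", "A", "B", "A", "B", "B", "B", "C", "C"],
    "Date": ["06/21"] * 7 + ["06/22"] * 6,
    "Price": [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25],
})

result = dominant_letters(df)
print(result.round(2))
```

Dates on which no letter reaches the threshold simply produce no row, which matches the "no output" requirement in the question.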

solution1
1 ACCEPTED 2022-01-03 16:46:46

How to group based on three columns and sum in Python

Question

1 answers

solution1 1 ACCPTED 2022-01-03 16:46:46

solution1
1 ACCPTED 2022-01-03 16:46:46