I want to find which letter accounts for 50% or more of the total price on a given day. For example, in the dataset below, A occurs most frequently on 06/21, but it does not account for 50% or more of that day's total Price. Summed by letter for 06/21, A = 56 (25% of the total), B = 120 (54%), and C = 48 (21%). So, for each date, if a letter has 50% or more of the total price, I need the output to show that letter along with the date. If no letter reaches 50% for a date, there should be no output for that date. The same applies to 06/22. Even though B occurs most frequently, that's not what I am interested in. B accounts for 60% of the total price for that day, while A is 5% and C is 36%. So the output would be:
B 06/21 0.54 and B 06/22 0.60
import pandas as pd
# initialise data of lists.
data = {
    'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Date': ['06/21', '06/21', '06/21', '06/21', '06/21', '06/21', '06/21',
             '06/22', '06/22', '06/22', '06/22', '06/22', '06/22'],
    'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25],
}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
This might work:
# aggregate multiple entries on a given date
agg_by_date_name = (
    df
    .groupby(["Date", "Name"])
    .agg({"Price": "sum"})
)
# calculate share
date_sums = agg_by_date_name.groupby(["Date"])["Price"].transform("sum")
agg_by_date_name["share"] = agg_by_date_name["Price"] / date_sums
# select rows where the share is at least 50% (the question asks for "50% or more")
keep_high_share = agg_by_date_name["share"] >= 0.5
# store the result
result = agg_by_date_name.loc[keep_high_share, ["share"]]
print(result)
# share
# Date Name
# 06/21 B 0.535714
# 06/22 B 0.595918
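If you want the result as flat rows of letter, date and share, as in the question, the same steps can be wrapped in a small function. This is only a sketch: the function name `dominant_names` and its `threshold` parameter are made up for illustration, and it uses `>=` so that a share of exactly 50% is kept.

```python
import pandas as pd


def dominant_names(df, threshold=0.5):
    """Return the (Name, Date) pairs whose share of that date's total Price
    is at least `threshold`. Hypothetical helper, not part of pandas."""
    # Sum Price per (Date, Name), keeping Date and Name as ordinary columns.
    totals = df.groupby(["Date", "Name"], as_index=False)["Price"].sum()
    # Divide each per-name sum by the total for its date.
    totals["share"] = totals["Price"] / totals.groupby("Date")["Price"].transform("sum")
    # Keep only the rows that reach the threshold.
    keep = totals["share"] >= threshold
    return totals.loc[keep, ["Name", "Date", "share"]].reset_index(drop=True)


data = {
    "Name": ["A", "B", "A", "C", "C", "A", "B", "A", "B", "B", "B", "C", "C"],
    "Date": ["06/21"] * 7 + ["06/22"] * 6,
    "Price": [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25],
}
df = pd.DataFrame(data)

result = dominant_names(df)
print(result.round(2))
```

For the sample data this returns B for both dates, with shares of roughly 0.54 and 0.60. Note that with a threshold of exactly 0.5, two letters that each hold half of a day's total would both be returned.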