I need to achieve several things:
- Group the rows by Country and Products.
- Aggregate to get:
  - the percentage of each product within its country;
  - the sum of Volume and the sum of Profit + UnrealizedProfit (two columns: the first is Volume, the second is Profit + UnrealizedProfit).
- Display the other columns as well.
My DataFrame:
Country Sector Products Volume Profit UnrealizedProfit
0 Country_1 Sector1 Product_1 50 5 4
1 Country_1 Sector2 Product_2 100 6 3
2 Country_2 Sector1 Product_1 150 3 -1
3 Country_2 Sector2 Product_2 200 -1 5
4 Country_1 Sector1 Product_2 100 7 10
5 Country_2 Sector2 Product_2 200 -3 -1
6 Country_2 Sector1 Product_1 150 2 -1
7 Country_1 Sector2 Product_1 50 5 -3
Note: my actual DataFrame has a couple of thousand rows.
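To make the examples reproducible, the sample table above can be rebuilt as a pandas DataFrame. This is a minimal sketch; the variable name `df` is an assumption (the question's own code uses `data1`, the answer uses `df`).

```python
import pandas as pd

# Sample data copied row by row from the table above (assumed name: df)
df = pd.DataFrame({
    "Country": ["Country_1", "Country_1", "Country_2", "Country_2",
                "Country_1", "Country_2", "Country_2", "Country_1"],
    "Sector": ["Sector1", "Sector2", "Sector1", "Sector2",
               "Sector1", "Sector2", "Sector1", "Sector2"],
    "Products": ["Product_1", "Product_2", "Product_1", "Product_2",
                 "Product_2", "Product_2", "Product_1", "Product_1"],
    "Volume": [50, 100, 150, 200, 100, 200, 150, 50],
    "Profit": [5, 6, 3, -1, 7, -3, 2, 5],
    "UnrealizedProfit": [4, 3, -1, 5, 10, -1, -1, -3],
})
print(df)
```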
Desired output would look like this:
Country Sector Products Product% Volume ExpectedProfit
0 Country_1 Sector1 Product_1 0.138 100 11
1 Country_1 Sector2 Product_2 0.861 200 26
2 Country_2 Sector1 Product_1 0.667 300 3
3 Country_2 Sector2 Product_2 0.333 400 0
So far I have only managed to do one aggregation at a time, not both together:
df = (data1.groupby('Country')['Products']
      .value_counts(normalize=True, sort=False)
      .reset_index(name='Product%'))
print(df)
This code gives me:
Country Products Product%
0 Country 1 Product 1 0.138
1 Country 1 Product 2 0.861
2 Country 2 Product 1 0.667
3 Country 2 Product 2 0.333
Each product's frequency is relative to its own country, so the shares add up to 100% within Country_1, 100% within Country_2, and so on.
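A quick way to check that per-country normalization, sketched on the sample data with only the two relevant columns (`df` and `pct` are illustrative names): compute the shares, then sum them back per country.

```python
import pandas as pd

# Only the columns needed for the share calculation, copied from the sample table
df = pd.DataFrame({
    "Country": ["Country_1", "Country_1", "Country_2", "Country_2",
                "Country_1", "Country_2", "Country_2", "Country_1"],
    "Products": ["Product_1", "Product_2", "Product_1", "Product_2",
                 "Product_2", "Product_2", "Product_1", "Product_1"],
})

# Share of each product among the rows of its own country
pct = (df.groupby("Country")["Products"]
         .value_counts(normalize=True, sort=False)
         .reset_index(name="Product%"))

# Summing the shares per country should give 1.0 (i.e. 100%) for every country
totals = pct.groupby("Country")["Product%"].sum()
print(pct)
print(totals)
```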
For the Volume I managed to do the same:
df = (data1.groupby(['Country', 'Products'])['Volume']
      .sum()
      .reset_index(name='Volume'))
I added Products to the groupby() because I want to see the volume for each product and country.
The goal is to combine Product% and Volume and to add ExpectedProfit. I don't know how to combine these results, or how to do the profit aggregation described above (Profit + UnrealizedProfit). I would also like to display the Sector; I guess Sector could be included in the groupby(), since each sector has several products.
Thank you for your help!
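One way to combine the two partial results from the question is to keep each as a flat DataFrame and merge them on the shared key columns. A minimal sketch on the sample data, assuming the column is spelled `Products` as in the table; `keys`, `pct`, `vol`, `profit` and `combined` are hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Country_1", "Country_1", "Country_2", "Country_2",
                "Country_1", "Country_2", "Country_2", "Country_1"],
    "Sector": ["Sector1", "Sector2", "Sector1", "Sector2",
               "Sector1", "Sector2", "Sector1", "Sector2"],
    "Products": ["Product_1", "Product_2", "Product_1", "Product_2",
                 "Product_2", "Product_2", "Product_1", "Product_1"],
    "Volume": [50, 100, 150, 200, 100, 200, 150, 50],
    "Profit": [5, 6, 3, -1, 7, -3, 2, 5],
    "UnrealizedProfit": [4, 3, -1, 5, 10, -1, -1, -3],
})
keys = ["Country", "Products"]

# Product share within each country (the first snippet from the question)
pct = (df.groupby("Country")["Products"]
         .value_counts(normalize=True, sort=False)
         .reset_index(name="Product%"))
# Volume per country and product (the second snippet)
vol = df.groupby(keys)["Volume"].sum().reset_index(name="Volume")
# Sum both profit columns per group, then add them together row-wise
profit = (df.groupby(keys)[["Profit", "UnrealizedProfit"]].sum()
            .sum(axis=1)
            .reset_index(name="ExpectedProfit"))

# Join the three flat results on the (Country, Products) key columns
combined = pct.merge(vol, on=keys).merge(profit, on=keys)
print(combined)
```

All three pieces are grouped on the same keys, so the default inner merge keeps every (Country, Products) pair here.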
Putting it all together:
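import pandas as pd

# df is the DataFrame from the question
new_df = pd.DataFrame()
grouper = df.groupby(["Country", "Products"])
# ExpectedProfit = sum of Profit + sum of UnrealizedProfit per (Country, Products)
new_df["ExpectedProfit"] = grouper.Profit.sum() + grouper.UnrealizedProfit.sum()
new_df["Volume"] = grouper.Volume.sum()
# Share of each product within its country, aligned on the (Country, Products) index
new_df["%"] = df.groupby(["Country"]).Products.value_counts(normalize=True, sort=False)
print(new_df)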
Output:
ExpectedProfit Volume %
Country Products
Country_1 Product_1 11 100 0.5
Product_2 26 200 0.5
Country_2 Product_1 3 300 0.5
Product_2 0 400 0.5
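The same numbers can also be produced in a single pass as a flat table shaped like the desired output. This sketch swaps in a different technique than the answer above: it adds a row-level ExpectedProfit column first, uses named aggregation, and derives Product% from group sizes. `Count` and `out` are hypothetical names, and the sample data from the question is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Country_1", "Country_1", "Country_2", "Country_2",
                "Country_1", "Country_2", "Country_2", "Country_1"],
    "Sector": ["Sector1", "Sector2", "Sector1", "Sector2",
               "Sector1", "Sector2", "Sector1", "Sector2"],
    "Products": ["Product_1", "Product_2", "Product_1", "Product_2",
                 "Product_2", "Product_2", "Product_1", "Product_1"],
    "Volume": [50, 100, 150, 200, 100, 200, 150, 50],
    "Profit": [5, 6, 3, -1, 7, -3, 2, 5],
    "UnrealizedProfit": [4, 3, -1, 5, 10, -1, -1, -3],
})

out = (df.assign(ExpectedProfit=df["Profit"] + df["UnrealizedProfit"])  # row-level profit
         .groupby(["Country", "Products"], as_index=False)
         .agg(Count=("Volume", "size"),          # number of rows in each group
              Volume=("Volume", "sum"),
              ExpectedProfit=("ExpectedProfit", "sum")))

# Rows of each product divided by the rows of its country (same as value_counts(normalize=True))
out["Product%"] = out["Count"] / out.groupby("Country")["Count"].transform("sum")
out = out[["Country", "Products", "Product%", "Volume", "ExpectedProfit"]]
print(out)
```

Summing the row-level Profit + UnrealizedProfit per group gives the same totals as adding the two group sums; on the sample data every Product% is 0.5.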
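To include the sectors, add Sector to the grouping keys:
grouper = df.groupby(["Country", "Products", "Sector"])
Without sectors, group by Country and Products only:
grouper = df.groupby(["Country", "Products"])
In both cases, compute ExpectedProfit and turn the group keys back into columns:
result = grouper.Profit.sum() + grouper.UnrealizedProfit.sum()
result = result.reset_index(name="ExpectedProfit")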
Result with sectors:
Country Products Sector ExpectedProfit
0 Country_1 Product_1 Sector1 9
1 Country_1 Product_1 Sector2 2
2 Country_1 Product_2 Sector1 17
3 Country_1 Product_2 Sector2 9
4 Country_2 Product_1 Sector1 3
5 Country_2 Product_2 Sector2 0
Result without sectors:
Country Products ExpectedProfit
0 Country_1 Product_1 11
1 Country_1 Product_2 26
2 Country_2 Product_1 3
3 Country_2 Product_2 0
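If Sector should appear in the output while Product% stays a share within the country, one option is to compute Product% without Sector and merge it onto the sector-level totals. A sketch on the sample data, with hypothetical names `pct`, `by_sector` and `result`:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Country_1", "Country_1", "Country_2", "Country_2",
                "Country_1", "Country_2", "Country_2", "Country_1"],
    "Sector": ["Sector1", "Sector2", "Sector1", "Sector2",
               "Sector1", "Sector2", "Sector1", "Sector2"],
    "Products": ["Product_1", "Product_2", "Product_1", "Product_2",
                 "Product_2", "Product_2", "Product_1", "Product_1"],
    "Volume": [50, 100, 150, 200, 100, 200, 150, 50],
    "Profit": [5, 6, 3, -1, 7, -3, 2, 5],
    "UnrealizedProfit": [4, 3, -1, 5, 10, -1, -1, -3],
})

# Product share per country, computed without Sector so it still sums to 1 per country
pct = (df.groupby("Country")["Products"]
         .value_counts(normalize=True, sort=False)
         .reset_index(name="Product%"))

# Volume and ExpectedProfit broken down further by Sector
by_sector = (df.assign(ExpectedProfit=df["Profit"] + df["UnrealizedProfit"])
               .groupby(["Country", "Products", "Sector"], as_index=False)
               .agg(Volume=("Volume", "sum"),
                    ExpectedProfit=("ExpectedProfit", "sum")))

# Attach the country-level share to every sector row of that product
result = by_sector.merge(pct, on=["Country", "Products"], how="left")
print(result)
```

The country-level Product% is repeated on each sector row of a product, so it should not be summed across sectors.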