Can someone help me identify the problem? I have written the code below:
import numpy as np
import pandas as pd
retail = pd.read_csv('online_retail2.csv')
retail.groupby(['Country','Description'])['Quantity','Price'].agg([np.mean,max])
retail.loc[('Australia','DOLLY GIRL BEAKER'),('Quantity','mean')]
The groupby function has output:
Out[36]:
                                                Quantity       Price
                                                    mean  max   mean   max
Country      Description
Australia    DOLLY GIRL BEAKER                     200.0  200   1.08  1.08
             I LOVE LONDON MINI BACKPACK             4.0    4   4.15  4.15
             10 COLOUR SPACEBOY PEN                 48.0   48   0.85  0.85
             12 PENCIL SMALL TUBE WOODLAND         384.0  384   0.55  0.55
             12 PENCILS SMALL TUBE RED SPOTTY       24.0   24   0.65  0.65
                                                     ...  ...    ...   ...
West Indies  VINTAGE BEAD PINK SCARF                 3.0    3   7.95  7.95
             WHITE AND BLUE CERAMIC OIL BURNER       6.0    6   1.25  1.25
             WOODLAND PARTY BAG + STICKER SET        1.0    1   1.65  1.65
             WOVEN BERRIES CUSHION COVER             2.0    2   4.95  4.95
             WOVEN FROST CUSHION COVER               2.0    2   4.95  4.95
[30696 rows x 4 columns]
while the .loc call raised the error below:
KeyError: "None of [Index(['Australia', 'DOLLY GIRL BEAKER'], dtype='object')] are in the [index]"
I think it's because you are not saving the result of the groupby + aggregation to a new variable. Groupby + aggregation is not an in-place operation: it creates a new DataFrame, and if you don't assign it, the result is only computed and printed. So with your current code you are trying to index your initial DataFrame retail, which still has its default integer index. .loc then looks for rows labelled 'Australia' and 'DOLLY GIRL BEAKER', finds none, and raises the KeyError.
You can modify your code as follows:
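import pandas as pd
retail = pd.read_csv('online_retail2.csv')
# Save the aggregated result in a new variable.
# Note the double brackets: selecting several columns after groupby needs a list.
# The strings 'mean' and 'max' give the same column labels as np.mean and max.
retail_aggregated = retail.groupby(['Country','Description'])[['Quantity','Price']].agg(['mean','max'])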
Then you can index your aggregated dataframe as you want:
retail_aggregated.loc[('Australia','DOLLY GIRL BEAKER'),('Quantity','mean')]
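To see the difference between the two DataFrames, here is a minimal sketch on a tiny made-up frame (the four rows are invented for illustration; only the column names come from the question). It shows that retail keeps its integer index after the groupby, so the tuple lookup fails there, while the saved result has the (Country, Description) MultiIndex the lookup needs:

```python
import pandas as pd

# Tiny made-up frame standing in for online_retail2.csv (values are invented).
retail = pd.DataFrame({
    "Country": ["Australia", "Australia", "West Indies", "West Indies"],
    "Description": ["DOLLY GIRL BEAKER", "DOLLY GIRL BEAKER",
                    "VINTAGE BEAD PINK SCARF", "VINTAGE BEAD PINK SCARF"],
    "Quantity": [100, 300, 2, 4],
    "Price": [1.08, 1.08, 7.95, 7.95],
})

# groupby/agg returns a new DataFrame; `retail` itself is left untouched.
retail_aggregated = (
    retail.groupby(["Country", "Description"])[["Quantity", "Price"]]
    .agg(["mean", "max"])
)

print(type(retail.index).__name__)             # RangeIndex
print(type(retail_aggregated.index).__name__)  # MultiIndex

# Indexing the original frame fails: its row labels are 0..3, not country names.
try:
    retail.loc[("Australia", "DOLLY GIRL BEAKER"), ("Quantity", "mean")]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Indexing the aggregated frame works.
mean_quantity = retail_aggregated.loc[
    ("Australia", "DOLLY GIRL BEAKER"), ("Quantity", "mean")
]
print(lookup_failed, mean_quantity)  # True 200.0
```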
Edit: adding a full working example
import numpy as np
import pandas as pd
import random
random.seed(123)
np.random.seed(123)
# Here I generate a random dataframe
retail = pd.DataFrame({
"Country": [random.choice(["Australia", "West Indies"]) for _ in range(100)],
"Description": [random.choice([
"DOLLY GIRL BEAKER", "DOLLY GIRL BEAKER", "COLOUR SPACEBOY PEN", "VINTAGE BEAD PINK SCARF", "WOODLAND PARTY BAG + STICKER SET"
]) for _ in range(100)],
"Quantity": np.random.randint(1, 10, 100),
"Price": np.random.randint(1, 100, 100),
})
# Then I groupby and compute aggregate
retail_gp = retail.groupby(['Country','Description'])[['Quantity','Price']].agg(['mean','max'])
retail_gp.loc[('Australia','DOLLY GIRL BEAKER'),('Quantity','mean')]
Output:
4.894736842105263
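Once the aggregated frame is saved, the same MultiIndex supports a few other lookups besides the single-cell one. The sketch below is only an illustration on a small invented frame (the rows and the variable names cell, australia, beaker and max_price are my own, not from the question):

```python
import pandas as pd

# Small invented frame; only the column names come from the question.
retail = pd.DataFrame({
    "Country": ["Australia", "Australia", "Australia", "West Indies"],
    "Description": ["DOLLY GIRL BEAKER", "DOLLY GIRL BEAKER",
                    "COLOUR SPACEBOY PEN", "VINTAGE BEAD PINK SCARF"],
    "Quantity": [2, 4, 5, 3],
    "Price": [10, 20, 7, 8],
})
retail_gp = (
    retail.groupby(["Country", "Description"])[["Quantity", "Price"]]
    .agg(["mean", "max"])
)

# One cell: (row key tuple, column key tuple).
cell = retail_gp.loc[("Australia", "DOLLY GIRL BEAKER"), ("Quantity", "mean")]

# All products for one country: a partial key on the first index level.
australia = retail_gp.loc["Australia"]

# One product across every country: select on the second index level.
beaker = retail_gp.xs("DOLLY GIRL BEAKER", level="Description")

# One statistic for all rows: select a single (column, statistic) pair.
max_price = retail_gp[("Price", "max")]

print(cell)  # 3.0
```

The partial-key form drops the Country level from the result, so australia is indexed by Description only.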