KeyError: "None of [Index(['...', '...'], dtype='object')] are in the [index]"

Question

Can someone help me identify the problem? I wrote the code below:

import numpy as np
import pandas as pd
retail = pd.read_csv('online_retail2.csv')

retail.groupby(['Country','Description'])['Quantity','Price'].agg([np.mean,max])
retail.loc[('Australia','DOLLY GIRL BEAKER'),('Quantity','mean')]

The groupby call produces this output:

Out[36]: 
                                              Quantity      Price      
                                                  mean  max  mean   max
Country     Description                                                
Australia    DOLLY GIRL BEAKER                   200.0  200  1.08  1.08
             I LOVE LONDON MINI BACKPACK           4.0    4  4.15  4.15
            10 COLOUR SPACEBOY PEN                48.0   48  0.85  0.85
            12 PENCIL SMALL TUBE WOODLAND        384.0  384  0.55  0.55
            12 PENCILS SMALL TUBE RED SPOTTY      24.0   24  0.65  0.65
                                               ...  ...   ...   ...
West Indies VINTAGE BEAD PINK SCARF                3.0    3  7.95  7.95
            WHITE AND BLUE CERAMIC OIL BURNER      6.0    6  1.25  1.25
            WOODLAND PARTY BAG + STICKER SET       1.0    1  1.65  1.65
            WOVEN BERRIES CUSHION COVER            2.0    2  4.95  4.95
            WOVEN FROST CUSHION COVER              2.0    2  4.95  4.95

[30696 rows x 4 columns]

while the .loc call resulted in the error below:

KeyError: "None of [Index(['Australia', 'DOLLY GIRL BEAKER'], dtype='object')] are in the [index]"

Answer 1

I think it's because you are not saving the result of the groupby + aggregation to a new variable. Groupby + aggregation is not an in-place operation: it creates a new DataFrame, and unless you assign that DataFrame to a variable, the result is only computed and displayed. With your current code you are indexing your original DataFrame retail, whose index does not contain those labels, and that causes the error.
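To see this concretely, here is a minimal sketch on made-up data. The `toy` frame and its values are assumptions standing in for the CSV, which I don't have. It shows that the original frame keeps its default index after the groupby, so the lookup only works on the returned object:

```python
import pandas as pd

# Hypothetical toy data standing in for online_retail2.csv
toy = pd.DataFrame({
    "Country": ["Australia", "Australia", "France"],
    "Description": ["DOLLY GIRL BEAKER", "DOLLY GIRL BEAKER", "PEN"],
    "Quantity": [100, 300, 5],
    "Price": [1.08, 1.08, 0.85],
})

# groupby + agg returns a new DataFrame; `toy` itself is left unchanged
agg = toy.groupby(["Country", "Description"])[["Quantity", "Price"]].agg(["mean", "max"])

original_index_type = type(toy.index).__name__    # default index from construction
aggregated_index_type = type(agg.index).__name__  # index built from the group keys

# Indexing the original frame with the group keys fails, as in the question
try:
    toy.loc[("Australia", "DOLLY GIRL BEAKER"), ("Quantity", "mean")]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# The same lookup on the aggregated frame succeeds
mean_qty = agg.loc[("Australia", "DOLLY GIRL BEAKER"), ("Quantity", "mean")]
```

Comparing `toy.index` with `agg.index` is a quick way to check which object you are actually indexing.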

You can modify your code as follows:

import pandas as pd

retail = pd.read_csv('online_retail2.csv')

# Keep the aggregated result; the string names 'mean' and 'max' give the same column labels
retail_aggregated = retail.groupby(['Country','Description'])[['Quantity','Price']].agg(['mean','max'])

Then you can index your aggregated dataframe as you want:

retail_aggregated.loc[('Australia','DOLLY GIRL BEAKER'),('Quantity','mean')]

Edit: added a full working example

import numpy as np
import pandas as pd
import random
random.seed(123)
np.random.seed(123)


# Here I generate a random dataframe
retail = pd.DataFrame({
    "Country": [random.choice(["Australia", "West Indies"]) for _ in range(100)],
    "Description": [random.choice([
        "DOLLY GIRL BEAKER", "DOLLY GIRL BEAKER", "COLOUR SPACEBOY PEN", "VINTAGE BEAD PINK SCARF", "WOODLAND PARTY BAG + STICKER SET"
    ]) for _ in range(100)],
    "Quantity": np.random.randint(1, 10, 100),
    "Price": np.random.randint(1, 100, 100),
})

# Then I group by and compute the aggregates, keeping the result in a new variable

retail_gp = retail.groupby(['Country','Description'])[['Quantity','Price']].agg(['mean','max'])
retail_gp.loc[('Australia','DOLLY GIRL BEAKER'),('Quantity','mean')]

Output:

4.894736842105263
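If the lookup still raises a KeyError after saving the result, it may help to check whether the key is really in the aggregated index before calling .loc. A small sketch, again on hypothetical toy data (the `toy` frame and the "NO SUCH ITEM" label are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data, not the asker's CSV
toy = pd.DataFrame({
    "Country": ["Australia", "Australia", "France"],
    "Description": ["DOLLY GIRL BEAKER", "DOLLY GIRL BEAKER", "PEN"],
    "Quantity": [100, 300, 5],
    "Price": [1.08, 1.08, 0.85],
})
agg = toy.groupby(["Country", "Description"])[["Quantity", "Price"]].agg(["mean", "max"])

# Membership test on the row MultiIndex: the tuple must match both levels exactly
key_present = ("Australia", "DOLLY GIRL BEAKER") in agg.index
key_missing = ("Australia", "NO SUCH ITEM") in agg.index

# Take all rows for one country; the result is indexed by Description only
australia = agg.xs("Australia", level="Country")
max_qty = australia.loc["DOLLY GIRL BEAKER", ("Quantity", "max")]
```

The membership test compares labels exactly, so stray whitespace or a different case in a label is enough to make it return False.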

solution1
0 2022-11-17 09:57:52
