简体   繁体   中英

Pandas: transforming dataframe to nested dictionary

I have this dataframe:

Month_Year  City_Name   Chain_Name Product_Name Product_Price
11-2021     London      Aldi       Pasta        2.33
11-2021     Bristol     Spar       Bananas      1.45
10-2021     London      Tesco      Olives       4.12
10-2021     Cardiff     Spar       Pasta        2.25

This dataframe will be displayed in nested collapsible which will expand in the following order:

Date_month:
    City_Name:
         Shop_name:
             Product_name : Price

Theefore, I want to transform the dataframe to the following structure:

{10-2021:
    {London:
        {Aldi:
            {Pasta:2.33}}},
    {Bristol:{
        {Spar:
            {Bananas:1.45}}}}

Essentially I want to groupby every element recursively, starting by purchase month. The closest thing I've found is the solution in this answer , but as dictionary doesn't take duplicate values as I might have duplicate values in every column. Other than that, I am not sure if using a dictionary is the optimal answer for this problem.

My best guess is I have to use other DS type such as some type of tree, but couldn't find any way to do that.

Any advises?

You can group your dataframe by all columns except price, then create your dictionaries in a loop:

# if more than one price for one product in a chain, then calculate mean:
grouped_df = df.groupby(['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']).agg('mean')

result = dict()
nested_dict = dict()

for index, value in grouped_df.itertuples():
    for i, key in enumerate(index):
        if i == 0:
            if not key in result:
                result[key] = {}
            nested_dict = result[key]
        elif i == len(index) - 1:
            nested_dict[key] = value
        else:
            if not key in nested_dict:
                nested_dict[key] = {}
            nested_dict = nested_dict[key]

print(json.dumps(result, indent=4))

Changing your df to show nested dict and mean calculation to:

  Month_Year City_Name Chain_Name Product_Name  Product_Price
0    11-2021    London       Aldi        Pasta           2.33
1    11-2021    London       Aldi        Pasta           2.35
2    11-2021    London       Aldi       Olives           3.99
3    11-2021   Bristol       Spar      Bananas           1.45
4    10-2021    London      Tesco       Olives           4.12
5    10-2021   Cardiff       Spar        Pasta           2.25

You get the output:

{
    "10-2021": {
        "Cardiff": {
            "Spar": {
                "Pasta": 2.25
            }
        },
        "London": {
            "Tesco": {
                "Olives": 4.12
            }
        }
    },
    "11-2021": {
        "Bristol": {
            "Spar": {
                "Bananas": 1.45
            }
        },
        "London": {
            "Aldi": {
                "Olives": 3.99,
                "Pasta": 2.34
            }
        }
    }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM