简体   繁体   中英

Pandas multi-index/header Dataframe to nested dictionary

I would like to understand if I can convert multi-index and multi-header DataFrame to a nested dict.

I have the following DataFrame:

                                                             revenue                           taxes
                                                                 sum       max      min         mean       sum
company              type
-                    -                                        15.00     15.00    15.00         0.00      0.00
Facebook             -                                      1259.79    218.50  -120.00         5.17    321.00
                     City TOT                                  0.00      0.00     0.00         4.00      4.00
                     Country TOT                               0.00      0.00     0.00         4.00      4.00
                     Sales                                     0.00      0.00     0.00         9.25     18.50
Google               %tax_1                                    0.00      0.00     0.00         0.89      3.58
                     -                                      3738.36   2643.08  -100.00        96.23  26369.57
                     City TOT                                  0.00      0.00     0.00         3.55     95.99
                     Country TOT                               0.00      0.00     0.00        23.25    628.00
...

df = pd.DataFrame.from_dict({('-', '-'): {('revenue', 'sum'): 15.0,
  ('revenue', 'max'): 15.0,
  ('revenue', 'min'): 15.0,
  ('taxes', 'mean'): 0.00,
  ('taxes', 'sum'): 0.0},
 ('Facebook', '-'): {('revenue', 'sum'): 1259.79,
  ('revenue', 'max'): 218.5,
  ('revenue', 'min'): -120.0,
  ('taxes', 'mean'): 5.17,
  ('taxes', 'sum'): 321.0},
 ('Facebook', 'City TOT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 4.00,
  ('taxes', 'sum'): 4.0},
 ('Facebook', 'Country TOT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 4.00,
  ('taxes', 'sum'): 4.0},
 ('Facebook', 'Sales'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 9.25,
  ('taxes', 'sum'): 18.5},
 ('Google', '%tax_1'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 0.89,
  ('taxes', 'sum'): 3.58},
 ('Google', '-'): {('revenue', 'sum'): 3738.36,
  ('revenue', 'max'): 2643.08,
  ('revenue', 'min'): -100.0,
  ('taxes', 'mean'): 96.23,
  ('taxes', 'sum'): 26369.57},
 ('Google', 'City TOT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 3.55,
  ('taxes', 'sum'): 95.99},
 ('Google', 'Country TOT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 23.25,
  ('taxes', 'sum'): 628.0},
 ('Google', 'PER GETS'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 0.88,
  ('taxes', 'sum'): 2.64},
 ('Google', 'Sales'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 12.61,
  ('taxes', 'sum'): 138.75},
 ('Google', 'VAT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 11.70,
  ('taxes', 'sum'): 1065.51},
 ('Google', 'per room_1'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 5.00,
  ('taxes', 'sum'): 20.0},
 ('Google', 'tax on top'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 8.68,
  ('taxes', 'sum'): 78.2},
 ('Google', 'tax per reserv'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 20.00,
  ('taxes', 'sum'): 40.0}}, orient='index')
df.index.names = ['company', 'type']

The columns and index are MultiIndex:

MultiIndex([('revenue',  'sum'),
            ('revenue',  'max'),
            ('revenue',  'min'),
            ( 'taxes', 'mean'),
            ( 'taxes',  'sum')],)
MultiIndex([(                  '-',                                '-'),
            (           'Facebook',                                '-'),
            (           'Facebook',                         'City TOT'),
            (           'Facebook',                      'Country TOT'),
            (           'Facebook',                            'Sales'),
            (           'Google',                           '%tax_1'),
            ...

I have tried to use this method (based on Pandas multi index dataframe to nested dictionary ):

{level: df.xs(level).to_dict('index') for level in df.index.levels[0]}

However, this is what I get:

{'-': {('revenue', 'sum'): {'-': 15.0},
  ('revenue', 'max'): {'-': 15.0},
  ('revenue', 'min'): {'-': 15.0},
  ('taxes', 'mean'): {'-': 0.0)},
  ('taxes', 'sum'): {'-': 0.0}},
 'Facebook': {('revenue', 'sum'): {'-': 1259.79,
   'City TOT': 0.0,
   'Country TOT': 0.0,
   'Sales': 0.0},
  ('revenue', 'max'): {'-': 218.5,
   'City TOT': 0.0,
   'Country TOT': 0.0,
   'Sales': 0.0},
  ('revenue', 'min'): {'-': -120.0,
   'City TOT': 0.0,
   'Country TOT': 0.0,
   'Sales': 0.0},
  ...

What I expect to achieve instead is a nested dict on the columns (exactly as it is on the index), where the data gets nested per the indexes like this:

{
    '-': {
        '-':  {
            'revenue': {
                'sum': 15.0,
                'max': 15.0,
                'min': 15.0,
            },
            'taxes': {
                'mean': 0.00,
                'sum': 0.00,
            }
        },
    },
    'Facebook': {
        '-': {
            'revenue': {
                'sum': 1259.79,
                'max': 218.5,
                'min': -120.0
            }
            'taxes': ...
        },
        'City TOT': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            }
            'taxes': ...
        },
        'Country TOT': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            }
            'taxes': ...
        },
        'Sales': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            }
            'taxes': ...
        },
    } 
}

Any way I can get around with this? Thanks!

In 2 steps:

  1. Pivot your dataframe and convert it to a flat dict.
  2. Convert tuple keys to a nested dict

Step 1

data = df.stack(level=[0, 1]).to_dict()
print(data)

# Output
{('-', '-', 'revenue', 'max'): 15.0,
 ('-', '-', 'revenue', 'min'): 15.0,
 ('-', '-', 'revenue', 'sum'): 15.0,
 ...
 ('Google', 'Country TOT', 'revenue', 'sum'): 0.0,
 ('Google', 'Country TOT', 'taxes', 'mean'): 23.25,
 ('Google', 'Country TOT', 'taxes', 'sum'): 628.0}

Step 2

d = {}
for t, v in data.items():
    e = d.setdefault(t[0], {})
    for k in t[1:-1]:
        e = e.setdefault(k, {})
    e[t[-1]] = v
print(d)

# Output
{'-': {'-': {'revenue': {'max': 15.0, 'min': 15.0, 'sum': 15.0},
   'taxes': {'mean': 0.0, 'sum': 0.0}}},
 'Facebook': {'-': {'revenue': {'max': 218.5, 'min': -120.0, 'sum': 1259.79},
   'taxes': {'mean': 5.17, 'sum': 321.0}},
  'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 4.0, 'sum': 4.0}},
  'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 4.0, 'sum': 4.0}},
  'Sales': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 9.25, 'sum': 18.5}}},
 'Google': {'%tax_1': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 0.89, 'sum': 3.58}},
  '-': {'revenue': {'max': 2643.08, 'min': -100.0, 'sum': 3738.36},
   'taxes': {'mean': 96.23, 'sum': 26369.57}},
  'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 3.55, 'sum': 95.99}},
  'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 23.25, 'sum': 628.0}}}}

Pretty printing:

import json
print(json.dumps(d, indent=4))

# Output
{
    "-": {
        "-": {
            "revenue": {
                "max": 15.0,
                "min": 15.0,
                "sum": 15.0
            },
            "taxes": {
                "mean": 0.0,
                "sum": 0.0
            }
        }
    },
    "Facebook": {
        "-": {
            "revenue": {
                "max": 218.5,
                "min": -120.0,
                "sum": 1259.79
            },
            "taxes": {
                "mean": 5.17,
                "sum": 321.0
            }
        },
        "City TOT": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 4.0,
                "sum": 4.0
            }
        },
        "Country TOT": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 4.0,
                "sum": 4.0
            }
        },
        "Sales": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 9.25,
                "sum": 18.5
            }
        }
    },
    "Google": {
        "%tax_1": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 0.89,
                "sum": 3.58
            }
        },
        "-": {
            "revenue": {
                "max": 2643.08,
                "min": -100.0,
                "sum": 3738.36
            },
            "taxes": {
                "mean": 96.23,
                "sum": 26369.57
            }
        },
        "City TOT": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 3.55,
                "sum": 95.99
            }
        },
        "Country TOT": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 23.25,
                "sum": 628.0
            }
        }
    }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM