简体   繁体   中英

Convert a three-level hierarchical data into a specific json format

I have a three-level hierarchical dataset, like below:

pd.DataFrame({'level1': ['A', 'A', 'A', 'A', 'B'],
               'level2': ['A1', 'A1', 'A2', 'A2', 'B1' ],
                'level3': ['a', 'b', 'c', 'd', 'e'],
                'value': [44, 125, 787, 99, 111],
                'pctChg': [0.3, -0.9, -10.0, 12, -0.2]})


  level1    level2  level3  value   pctChg
0   A       A1      a       44       0.3
1   A       A1      b       125     -0.9
2   A       A2      c       787     -10.0
3   A       A2      d       99      12.0
4   B       B1      e       111     -0.2

For a specific level1 category like A , there are level2 category like A1, A2 . under levels two, there are many level3 categories. For example, 'a' and 'b' are under 'A1', 'c' and 'd' are under A2. This data is just an example. For each combination, there are value and percent info (percentage change from last month).

I need to tranform this data into a nested json data. It needs to be the format like below:

{
    name: “root”,
    value: 1166,
    pctChg: xx%,
    children: [
    {
        name: 'A',
        value: 956,
        pctChg: 'xx%'',
        children: [{
            name: 'A1',
            value: 169,
            pctChg: 'xx%'',
            children: [{name: 'a', value: 44, pctChg: '30%'},
                       {name:'b', value:125, pctChg: '-90%'},
                       {name:'c', value:787, pctChg: '-10%'}           
            ]
        },  .....]
    },
.....…
    ]
    }

We also need to aggregate the value for a level from all childrens at one level down. Value can be aggragated obviously. One tricky part is percent. We might not want to simply aggregate the percentage.

This looks like a pretty difficult task. Not like some simple nested json data. I'm not sure how I can approach that. Appreciate if anyone can help. Thanks a lot in advance.

The first step is to reformat pctChg column to percentage string:

df.pctChg = (df.pctChg * 100).astype(int).astype(str) + '%'

(I assumed multiply by 100 formula).

Then define 2 functions computing children of the second and first level:

def chld2(grp):
    return grp.rename(columns={'level3': 'name'}).groupby('level2')\
        .apply(lambda grp: pd.Series({'name': grp.iloc[0,1], 'value': grp.value.sum(),
        'pctChg': 'xx%', 'children': grp[['name', 'value', 'pctChg']].to_dict('r') }))\
        .to_dict('r')

def chld1(df):
    return df.groupby('level1').apply(lambda grp: pd.Series({
        'name': grp.iloc[0,0], 'value': grp.value.sum(), 'pctChg': 'xx%',
        'children': chld2(grp)})).to_dict('r')

And to generate the result, run:

pd.Series({'name': 'root', 'value': df.value.sum(), 'pctChg': 'xx%',
    'children': chld1(df)}).to_json()

The result (with manually added indentation for readability) is:

{ "name":"root", "value":1166, "pctChg":"xx%",
  "children":[
    { "name":"A", "value":1055, "pctChg":"xx%",
      "children":[
        { "name":"A1", "value":169, "pctChg":"xx%",
          "children":[
            {"name":"a", "value":44, "pctChg":"30%"},
            {"name":"b", "value":125, "pctChg":"-90%"}
          ]
        },
        { "name":"A2", "value":886, "pctChg":"xx%",
          "children":[
            {"name":"c", "value":787, "pctChg":"-1000%"},
            {"name":"d", "value":99, "pctChg":"1200%"}
          ]
        }
      ]
    },
    { "name":"B", "value":111, "pctChg":"xx%",
      "children":[
        { "name":"B1", "value":111, "pctChg":"xx%",
          "children":[
            {"name":"e", "value":111, "pctChg":"-20%"}
          ]
        }
      ]
    }
  ]
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM