
Convert three-level hierarchical data into a specific JSON format

I have a three-level hierarchical dataset, like the one below:

import pandas as pd

df = pd.DataFrame({'level1': ['A', 'A', 'A', 'A', 'B'],
                   'level2': ['A1', 'A1', 'A2', 'A2', 'B1'],
                   'level3': ['a', 'b', 'c', 'd', 'e'],
                   'value': [44, 125, 787, 99, 111],
                   'pctChg': [0.3, -0.9, -10.0, 12, -0.2]})


  level1    level2  level3  value   pctChg
0   A       A1      a       44       0.3
1   A       A1      b       125     -0.9
2   A       A2      c       787     -10.0
3   A       A2      d       99      12.0
4   B       B1      e       111     -0.2

For a given level1 category such as A, there are level2 categories such as A1 and A2. Under each level2 category there are several level3 categories: for example, 'a' and 'b' are under 'A1', and 'c' and 'd' are under 'A2'. This data is just an example. For each combination there is a value and a percentage (the percentage change from last month).

I need to transform this data into nested JSON. It needs to be in the format below:

{
    name: 'root',
    value: 1166,
    pctChg: 'xx%',
    children: [
    {
        name: 'A',
        value: 956,
        pctChg: 'xx%',
        children: [{
            name: 'A1',
            value: 169,
            pctChg: 'xx%',
            children: [{name: 'a', value: 44, pctChg: '30%'},
                       {name: 'b', value: 125, pctChg: '-90%'},
                       {name: 'c', value: 787, pctChg: '-10%'}
            ]
        },  .....]
    },
.....
    ]
}

We also need to aggregate the value at each level from all of its children one level down. The value can obviously be summed. The tricky part is the percentage: we probably do not want to simply add the percentages together.

This looks like a pretty difficult task, unlike simple nested JSON data. I'm not sure how to approach it. I would appreciate any help. Thanks a lot in advance.

The first step is to reformat the pctChg column as a percentage string:

df.pctChg = (df.pctChg * 100).astype(int).astype(str) + '%'

(I assumed the values should be multiplied by 100.)
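One caveat with the conversion above, shown as a small sketch: astype(int) truncates toward zero, so a product that lands just below a whole number loses one percentage point. The value 0.29 below is a made-up input, not part of the question's data; rounding before the cast is one possible way to avoid the issue.

```python
import pandas as pd

# 0.3 comes from the question's data; 0.29 is a hypothetical value
# chosen because 0.29 * 100 evaluates to 28.999999999999996 in floating point.
pct = pd.Series([0.3, 0.29])

# Direct cast: truncates toward zero.
truncated = (pct * 100).astype(int).astype(str) + '%'

# Round first, then cast.
rounded = (pct * 100).round().astype(int).astype(str) + '%'

print(truncated.tolist())  # ['30%', '28%']
print(rounded.tolist())    # ['30%', '29%']
```

For the five values in the question both variants give the same strings, so this only matters for other inputs.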

Then define two functions that compute the children at the second and first levels:

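# Children of one level1 group: one dict per level2 category.
# Note: 'records' is spelled out because newer pandas versions reject the 'r' abbreviation.
def chld2(grp):
    return grp.rename(columns={'level3': 'name'}).groupby('level2')\
        .apply(lambda grp: pd.Series({'name': grp.iloc[0,1], 'value': grp.value.sum(),
        'pctChg': 'xx%', 'children': grp[['name', 'value', 'pctChg']].to_dict('records') }))\
        .to_dict('records')

# Children of the root: one dict per level1 category.
def chld1(df):
    return df.groupby('level1').apply(lambda grp: pd.Series({
        'name': grp.iloc[0,0], 'value': grp.value.sum(), 'pctChg': 'xx%',
        'children': chld2(grp)})).to_dict('records')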

To generate the result, run:

pd.Series({'name': 'root', 'value': df.value.sum(), 'pctChg': 'xx%',
    'children': chld1(df)}).to_json()
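As an alternative, here is a minimal sketch that builds the same nested structure with explicit loops over the groups instead of groupby(...).apply, so it does not depend on column positions such as iloc[0,1]. The helper name build_tree and its parameters are my own and not part of any library. The block recreates df with the numeric pctChg column so that it runs on its own, and it keeps the 'xx%' placeholder for the aggregated levels.

```python
import json
import pandas as pd

df = pd.DataFrame({
    'level1': ['A', 'A', 'A', 'A', 'B'],
    'level2': ['A1', 'A1', 'A2', 'A2', 'B1'],
    'level3': ['a', 'b', 'c', 'd', 'e'],
    'value': [44, 125, 787, 99, 111],
    'pctChg': [0.3, -0.9, -10.0, 12, -0.2],
})

def build_tree(frame, levels, root_name='root'):
    """Hypothetical helper: nest `frame` by the columns listed in `levels`."""
    def node(name, sub, remaining):
        # Every non-leaf node carries the summed value of its rows.
        out = {'name': name, 'value': int(sub['value'].sum()), 'pctChg': 'xx%'}
        first, rest = remaining[0], remaining[1:]
        if rest:
            # More levels below: recurse into each group of the current level.
            out['children'] = [node(key, grp, rest)
                               for key, grp in sub.groupby(first, sort=False)]
        else:
            # Last level: one leaf per row, pctChg formatted as a percent string.
            out['children'] = [
                {'name': row[first],
                 'value': int(row['value']),
                 'pctChg': f"{int(round(float(row['pctChg']) * 100))}%"}
                for _, row in sub.iterrows()
            ]
        return out
    return node(root_name, frame, list(levels))

tree = build_tree(df, ['level1', 'level2', 'level3'])
print(json.dumps(tree, indent=2))
```

The int(...) casts convert pandas/NumPy scalars into plain Python numbers, which json.dumps needs in order to serialize them.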

The result (with indentation added manually for readability) is:

{ "name":"root", "value":1166, "pctChg":"xx%",
  "children":[
    { "name":"A", "value":1055, "pctChg":"xx%",
      "children":[
        { "name":"A1", "value":169, "pctChg":"xx%",
          "children":[
            {"name":"a", "value":44, "pctChg":"30%"},
            {"name":"b", "value":125, "pctChg":"-90%"}
          ]
        },
        { "name":"A2", "value":886, "pctChg":"xx%",
          "children":[
            {"name":"c", "value":787, "pctChg":"-1000%"},
            {"name":"d", "value":99, "pctChg":"1200%"}
          ]
        }
      ]
    },
    { "name":"B", "value":111, "pctChg":"xx%",
      "children":[
        { "name":"B1", "value":111, "pctChg":"xx%",
          "children":[
            {"name":"e", "value":111, "pctChg":"-20%"}
          ]
        }
      ]
    }
  ]
}
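The 'xx%' placeholders are left unfilled because the question does not say how percentages should be combined. One possible approach, sketched under the assumption that pctChg is a fractional change from the previous period (0.3 meaning +30%): recover each item's previous value as value / (1 + pctChg), sum the previous values, and compare that sum with the current total. The function name and the numbers are made up for illustration; the sample data is not used because a change of -10.0 would imply a negative previous value under this reading.

```python
def aggregate_pct_change(values, pct_changes):
    """Hypothetical helper: combined fractional change of several items.

    values      -- current values of the children
    pct_changes -- fractional change of each child vs. the previous period
                   (assumed convention: 0.1 means +10%)
    """
    # Recover each child's previous value from its current value and change.
    # A change of exactly -1 (previous value wiped out) would divide by zero.
    previous = [v / (1 + p) for v, p in zip(values, pct_changes)]
    return sum(values) / sum(previous) - 1

# Made-up numbers: 100 -> 110 (+10%) and 500 -> 450 (-10%).
change = aggregate_pct_change([110, 450], [0.1, -0.1])
print(f"{change:.1%}")  # -6.7%
```

A plain average of the two changes would give 0%, while the combined total actually fell from 600 to 560, which is why the child percentages cannot simply be summed or averaged.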
