
Extract nested json/list in Python

I have the following JSON-like list structure in Python:

[
    {
        u'week': 45,
        u'value': {
            u'team': u'accounts', 
            u'KPI': 4, 
            u'Mgr': 1, 
            u'change': 0, 
            u'risk': 1000, 
            u'subGroups': [
                {
                    u'team': u'HR', 
                    u'KPI': 4, 
                    u'Mgr': 1, 
                    u'change': 0, 
                    u'risk': 2000, 
                    u'subGroups': [
                        {
                            u'team': u'Marketing', 
                            u'KPI': 4, 
                            u'Mgr': 1, 
                            u'change': 0, 
                            u'risk': 3000, 
                            u'subGroups': []
                        }
                    ]
                }
            ]
        }
    },
    {
        u'week': 44, 
        u'value': {
            u'team': u'accounts', 
            u'KPI': 4, 
            u'Mgr': 1, 
            u'change': 0, 
            u'risk': 4000, 
            u'subGroups': [
                {
                    u'team': u'HR', 
                    u'KPI': 4, 
                    u'Mgr': 1, 
                    u'change': 0, 
                    u'risk': 5000, 
                    u'subGroups': [
                        {
                            u'team': u'Marketing', 
                            u'KPI': 4, 
                            u'Mgr': 1, 
                            u'change': 0, 
                            u'risk': 6000, 
                            u'subGroups': []
                        }
                    ]
                }
            ]
        }
    },
    {
        u'week': 34, 
        u'value': {
            u'team': u'accounts', 
            u'KPI': 29, 
            u'Mgr': 1, 
            u'change': 0, 
            u'risk': 20000, 
            u'subGroups': [
                {
                    u'team': u'HR', 
                    u'KPI': 29, 
                    u'Mgr': 1, 
                    u'change': 0, 
                    u'risk': 20000, 
                    u'subGroups': [
                        {
                            u'team': u'Marketing', 
                            u'KPI': 29, 
                            u'Mgr': 1, 
                            u'change': 0, 
                            u'risk': 20000, 
                            u'subGroups': []
                        }
                    ]
                }
            ]
        }
    }
]

And I need to extract some values to create the following:

[
    {
        'team': 'accounts',
        'risk': [
            1000,
            4000,
            20000
        ]
    },
    {
        'team': 'HR',
        'risk': [
            2000,
            5000,
            20000
        ]
    },
    {
        'team': 'Marketing',
        'risk': [
            3000,
            6000,
            20000
        ]
    }
]

In practice there could be any number of weeks and any number of levels of subgroups. Also, because of Docker container restrictions, I can only use the Python 2 standard library.

I've been tying myself in knots trying to get this working, so any help would be appreciated. Thanks.

You could use a function that flattens the nested JSON and then reconstruct the structure you need from the flat result. Here I loaded it into a table (a pandas DataFrame), which you can then slice and dice any way you want:

import pandas as pd
import re


data = [{u'week': 45, u'value': {u'team': u'accounts', u'KPI': 4, u'Mgr': 1, u'change': 0, u'risk': 1000, u'subGroups': [{u'team': u'HR', u'KPI': 4, u'Mgr': 1, u'change': 0, u'risk': 2000, u'subGroups': [{u'team': u'Marketing', u'KPI': 4, u'Mgr': 1, u'change': 0, u'risk': 3000, u'subGroups': []}]}]}},
{u'week': 44, u'value': {u'team': u'accounts', u'KPI': 4, u'Mgr': 1, u'change': 0, u'risk': 4000, u'subGroups': [{u'team': u'HR', u'KPI': 4, u'Mgr': 1, u'change': 0, u'risk': 5000, u'subGroups': [{u'team': u'Marketing', u'KPI': 4, u'Mgr': 1, u'change': 0, u'risk': 6000, u'subGroups': []}]}]}},
{u'week': 34, u'value': {u'team': u'accounts', u'KPI': 29, u'Mgr': 1, u'change': 0, u'risk': 20000, u'subGroups': [{u'team': u'HR', u'KPI': 29, u'Mgr': 1, u'change': 0, u'risk': 20000, u'subGroups': [{u'team': u'Marketing', u'KPI': 29, u'Mgr': 1, u'change': 0, u'risk': 20000, u'subGroups': []}]}]}}]


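def flatten_json(y):
    """Flatten nested dicts/lists into one dict keyed by '_'-joined paths."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # List items are keyed by their position in the list.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated path.
            out[name[:-1]] = x

    flatten(y)
    return out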


flat = flatten_json(data)

# Keys look like '0_value_risk': the leading number is the position of the
# week entry in `data`, and the remainder becomes the column name.
rows = {}
for item, value in flat.items():
    row_idx, column = re.match(r'(\d+)_(.*)', item).groups()
    rows.setdefault(int(row_idx), {})[column] = value

# One row per week entry. DataFrame.append was removed in pandas 2.0,
# so build the frame in one step and sort the columns by name.
results = pd.DataFrame.from_dict(rows, orient='index').sort_index().sort_index(axis=1)

Output:

print(results.to_string())
   value_KPI  value_Mgr  value_change  value_risk  value_subGroups_0_KPI  value_subGroups_0_Mgr  value_subGroups_0_change  value_subGroups_0_risk  value_subGroups_0_subGroups_0_KPI  value_subGroups_0_subGroups_0_Mgr  value_subGroups_0_subGroups_0_change  value_subGroups_0_subGroups_0_risk value_subGroups_0_subGroups_0_team value_subGroups_0_team value_team  week
0          4          1             0        1000                      4                      1                         0                    2000                                  4                                  1                                     0                                3000                          Marketing                     HR   accounts    45
1          4          1             0        4000                      4                      1                         0                    5000                                  4                                  1                                     0                                6000                          Marketing                     HR   accounts    44
2         29          1             0       20000                     29                      1                         0                   20000                                 29                                  1                                     0                               20000                          Marketing                     HR   accounts    34
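
The flattened keys can also be regrouped into the per-team risk lists from the question without pandas. The following is only a sketch: `risks_by_team` is a hypothetical helper name, the `flat` literal is a hand-written excerpt of what `flatten_json(data)` would return for the first two weeks, and it assumes every `..._team` key has a sibling `..._risk` key.

```python
from collections import OrderedDict

# Hand-written excerpt of the flattened dictionary (first two weeks only).
flat = {
    '0_value_team': 'accounts', '0_value_risk': 1000,
    '0_value_subGroups_0_team': 'HR', '0_value_subGroups_0_risk': 2000,
    '1_value_team': 'accounts', '1_value_risk': 4000,
    '1_value_subGroups_0_team': 'HR', '1_value_subGroups_0_risk': 5000,
}


def risks_by_team(flat):
    """Pair each '<path>_team' key with its '<path>_risk' sibling (hypothetical helper)."""
    team_keys = [k for k in flat if k.endswith('_team')]
    # Order by week position (the numeric prefix), then shorter paths first
    # so a parent group is seen before its subgroups.
    team_keys.sort(key=lambda k: (int(k.split('_', 1)[0]), len(k), k))

    grouped = OrderedDict()  # team name -> risks in week order
    for key in team_keys:
        prefix = key[:-len('team')]  # e.g. '0_value_subGroups_0_'
        grouped.setdefault(flat[key], []).append(flat[prefix + 'risk'])
    return [{'team': team, 'risk': risks} for team, risks in grouped.items()]


print(risks_by_team(flat))
```

For this excerpt, `accounts` collects the risks 1000 and 4000, and `HR` collects 2000 and 5000.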

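Since the question restricts the solution to the standard library, a plain recursive walk may be simpler than flattening first. This is a minimal sketch rather than a definitive implementation: `collect_risks` and `walk` are hypothetical names, `sample` is a trimmed two-week version of the question's data, and it assumes a team name identifies the same team in every week. It uses only `collections.OrderedDict`, which is available in Python 2.7 and Python 3.

```python
from collections import OrderedDict


def collect_risks(weeks):
    """Return [{'team': ..., 'risk': [...]}, ...] in first-seen team order."""
    risks = OrderedDict()  # team name -> risks in the order the weeks are listed

    def walk(group):
        # Record this group's risk, then descend into any nested subGroups.
        risks.setdefault(group['team'], []).append(group['risk'])
        for sub in group.get('subGroups', []):
            walk(sub)

    for week in weeks:
        walk(week['value'])
    return [{'team': team, 'risk': values} for team, values in risks.items()]


# Trimmed sample in the same shape as the question's data (two weeks only).
sample = [
    {'week': 45, 'value': {'team': 'accounts', 'risk': 1000, 'subGroups': [
        {'team': 'HR', 'risk': 2000, 'subGroups': [
            {'team': 'Marketing', 'risk': 3000, 'subGroups': []}]}]}},
    {'week': 44, 'value': {'team': 'accounts', 'risk': 4000, 'subGroups': [
        {'team': 'HR', 'risk': 5000, 'subGroups': [
            {'team': 'Marketing', 'risk': 6000, 'subGroups': []}]}]}},
]

print(collect_risks(sample))
```

The recursion handles any number of weeks and nested levels, up to Python's recursion limit. Risks appear in the order the weeks are listed in the input, so sort the list by `week` first if a different order is needed.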

 