
json_normalize - Nested dictionary to pandas DataFrame

I need help with the nested dictionary below; I want to convert it to a pandas DataFrame.

My JSON contains per-instance CPU data like the following, with a varying number of datapoints per instance:

Instance1 [{'datapoints': [{'statistic': 'Minimum', 'timestamp': '2021-08-31 06:50:00.000000', 'value': 59.03}, {'statistic': 'Minimum', 'timestamp': '2021-08-18 02:50:00.000000', 'value': 59.37}, {'statistic': 'Minimum', 'timestamp': '2021-08-24 16:50:00.000000', 'value': 58.84}], 'metric': 'VolumeIdleTime', 'unit': 'Seconds'}]

Instance2 [{'datapoints': [{'statistic': 'Minimum', 'timestamp': '2021-08-31 06:50:00.000000', 'value': 60}, {'statistic': 'Minimum', 'timestamp': '2021-08-18 02:50:00.000000', 'value': 55.45}, {'statistic': 'Minimum', 'timestamp': '2021-08-24 16:50:00.000000', 'value': 54.16}, {'statistic': 'Minimum', 'timestamp': '2021-08-06 07:50:00.000000', 'value': 50.03}, {'statistic': 'Minimum', 'timestamp': '2021-08-04 22:50:00.000000', 'value': 60}, {'statistic': 'Minimum', 'timestamp': '2021-08-26 01:50:00.000000', 'value': 60.34}], 'metric': 'VolumeIdleTime', 'unit': 'Seconds'}]

Instance3 [{'datapoints': [{'statistic': 'Minimum', 'timestamp': '2021-08-31 06:50:00.000000', 'value': 60}, {'statistic': 'Minimum', 'timestamp': '2021-08-18 02:50:00.000000', 'value': 38.12}, {'statistic': 'Minimum', 'timestamp': '2021-08-24 16:50:00.000000', 'value': 42.31}, {'statistic': 'Minimum', 'timestamp': '2021-08-06 07:50:00.000000', 'value': 45.22}, {'statistic': 'Minimum', 'timestamp': '2021-08-04 22:50:00.000000', 'value': 40.51}, {'statistic': 'Minimum', 'timestamp': '2021-08-26 01:50:00.000000', 'value': 34.35}, {'statistic': 'Minimum', 'timestamp': '2021-08-11 12:50:00.000000', 'value': 46.33}], 'metric': 'VolumeIdleTime', 'unit': 'Seconds'}]

Many more instance details follow (close to 8K instances in total).

Question: how can I extend the code below so it fetches all rows from every instance?

import pandas as pd

data = [{'datapoints': [{'statistic': 'Minimum', 'timestamp': '2021-08-31 06:50:00.000000', 'value': 59.03}, {'statistic': 'Minimum', 'timestamp': '2021-08-18 02:50:00.000000', 'value': 59.37}, {'statistic': 'Minimum', 'timestamp': '2021-08-24 16:50:00.000000', 'value': 58.84}], 'metric': 'VolumeIdleTime', 'unit': 'Seconds'}]
df = pd.json_normalize(data, record_path="datapoints", meta=["metric", "unit"])
print(df)

The code below will loop through all your instances and concatenate them into one big DataFrame:

import pandas as pd

# Start from the first instance, then append the remaining ones one by one.
df = pd.json_normalize(instance1, record_path="datapoints", meta=["metric", "unit"])
for instance in your_remaining_instances:
    df = pd.concat([df, pd.json_normalize(instance, record_path="datapoints", meta=["metric", "unit"])])
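Calling `pd.concat` inside the loop re-copies the accumulated frame on every iteration, which gets slow with ~8K instances. A common alternative is to normalize each instance once and concatenate in a single call; the sketch below assumes the instances are held in a dict keyed by a hypothetical instance id (`instances` and its keys are illustrative names, not from the original post):

```python
import pandas as pd

# Hypothetical container: instance id -> raw record list, as in the question.
instances = {
    "Instance1": [{"datapoints": [
        {"statistic": "Minimum", "timestamp": "2021-08-31 06:50:00.000000", "value": 59.03}],
        "metric": "VolumeIdleTime", "unit": "Seconds"}],
    "Instance2": [{"datapoints": [
        {"statistic": "Minimum", "timestamp": "2021-08-18 02:50:00.000000", "value": 55.45}],
        "metric": "VolumeIdleTime", "unit": "Seconds"}],
}

# Normalize each instance once, then concatenate in a single call;
# passing a dict makes its keys an outer index level named "instance".
frames = {
    name: pd.json_normalize(data, record_path="datapoints", meta=["metric", "unit"])
    for name, data in instances.items()
}
df = pd.concat(frames, names=["instance", None]).reset_index(level=0)
print(df)
```

The `reset_index(level=0)` call turns the instance id into an ordinary column, so every row records which instance it came from.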

I don't really understand what the problem is, but if your method doesn't work for all rows, do this:

If your data looks like:

data = [{'datapoints': [{'statistic': 'Minimum', 'timestamp': '2021-08-31 06:50:00.000000', 'value': 59.03},{'statistic': 'Minimum', 'timestamp': '2021-08-18 02:50:00.000000', 'value': 59.37}, {'statistic': 'Minimum', 'timestamp': '2021-08-24 16:50:00.000000', 'value': 58.84}],'metric': 'VolumeIdleTime', 'unit': 'Seconds'}]
df = pd.DataFrame(data)

that is:

                                   datapoints          metric     unit
0  [{'statistic': 'Minimum', 'timestamp': '2021-0...  VolumeIdleTime  Seconds

You can define the following function:

def flatten_nested_json_df(df):
    df = df.reset_index()

    # Columns whose every cell is a list / dict.
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()

    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        # Expand dict columns horizontally into prefixed columns.
        for col in dict_columns:
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)

        # Expand list columns vertically, one row per element.
        for col in list_columns:
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        # Repeat until no freshly created column holds lists or dicts.
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()

    return df

and apply it:

flatten_nested_json_df(df)

which returns:

index          metric     unit datapoints.statistic  \
0      0  VolumeIdleTime  Seconds              Minimum   
0      0  VolumeIdleTime  Seconds              Minimum   
0      0  VolumeIdleTime  Seconds              Minimum   

         datapoints.timestamp  datapoints.value  
0  2021-08-31 06:50:00.000000             59.03  
0  2021-08-18 02:50:00.000000             59.37  
0  2021-08-24 16:50:00.000000             58.84  
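For this particular shape (one list column of flat dicts), the generic helper isn't strictly needed; the same result can be sketched with `DataFrame.explode` plus `json_normalize` on the exploded column. This is an aside, not part of the original answer:

```python
import pandas as pd

data = [{"datapoints": [
            {"statistic": "Minimum", "timestamp": "2021-08-31 06:50:00.000000", "value": 59.03},
            {"statistic": "Minimum", "timestamp": "2021-08-18 02:50:00.000000", "value": 59.37}],
         "metric": "VolumeIdleTime", "unit": "Seconds"}]

df = pd.DataFrame(data)

# One row per list element, then expand each dict into prefixed columns.
exploded = df.explode("datapoints").reset_index(drop=True)
flat = pd.concat(
    [exploded.drop(columns="datapoints"),
     pd.json_normalize(exploded["datapoints"].tolist()).add_prefix("datapoints.")],
    axis=1,
)
print(flat)
```

This yields the same `datapoints.statistic` / `datapoints.timestamp` / `datapoints.value` columns as the function above, with one row per datapoint.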
