
Flatten super nested JSON to Pandas Dataframe

I have this super nested JSON file which needs to be flattened. Previously I had a similar problem with XML, which I solved with the simple code below.

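import pandas_read_xml as pdx
from pandas_read_xml import flatten  # assumed: flatten comes from pandas_read_xml
# `file` is the XML file name, defined elsewhere in the original script
df = pdx.read_xml('C:\\python_script\\temp1\\' + file, encoding='utf-8')
df = pdx.fully_flatten(df)
df = df.pipe(flatten)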

I'm looking for similarly simple code to do the same with JSON.

Here is the data: https://www.donneesquebec.ca/recherche/dataset/d23b2e02-085d-43e5-9e6e-e1d558ebfdd5/resource/eb4d7620-6aa3-4850-aab6-a0fbe82f2dc1/download/hebdo_20211227_20220102.json

Any help will be appreciated. :)

You can use pd.json_normalize(), which is pretty effective. This article explains it very well for different scenarios: https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd

Here are some examples:

import pandas as pd

some_dict = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 123,
}
df = pd.json_normalize(some_dict)  # one row, one column per key

With multiple levels, if you don't want to flatten all of them, you can limit the depth with max_level: pd.json_normalize(data, max_level=1)

With nested lists, use record_path to select the list of records and meta to specify which other fields to include on each row:
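To illustrate the difference, here is a minimal sketch on a small, made-up nested dict (the field names are hypothetical): by default every nested dict is expanded into dot-separated columns, while max_level=1 stops after the first level and keeps deeper dicts as values in a single column.

```python
import pandas as pd

# Hypothetical nested record, used only for illustration
data = {
    'id': 1,
    'buyer': {
        'name': 'Ville de Quebec',
        'address': {'city': 'Quebec', 'postal': 'G1A'},
    },
}

# Default: every nested dict is flattened, keys joined with '.'
full = pd.json_normalize(data)
print(list(full.columns))
# ['id', 'buyer.name', 'buyer.address.city', 'buyer.address.postal']

# max_level=1: only the first level of nesting is expanded;
# the deeper 'address' dict stays as a dict value in one column
shallow = pd.json_normalize(data, max_level=1)
print(list(shallow.columns))
# ['id', 'buyer.name', 'buyer.address']
```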

json_object = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': {
        'key3_1': 'value3',
        'key3_2': {
            'key3_2_1': {
                'admission': 'value4',
                'general': 'value5',
            },
            'key3_3': 'value6',
        },
    },
    'key4': [
        {'key4_1': 'value7'},
        {'key4_2': 'value8'},
        {'key4_3': 'value9'},
    ],
}

# you can do:
pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
)
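For reference, this is roughly what that call returns. The sketch below repeats json_object in a compact form so it runs on its own; assuming a reasonably recent pandas, each element of key4 becomes a row, the meta fields are repeated on every row, and the nested meta path is joined into the column name key3.key3_2.key3_3.

```python
import pandas as pd

# Same structure as json_object above, written compactly
json_object = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': {
        'key3_1': 'value3',
        'key3_2': {
            'key3_2_1': {'admission': 'value4', 'general': 'value5'},
            'key3_3': 'value6',
        },
    },
    'key4': [{'key4_1': 'value7'}, {'key4_2': 'value8'}, {'key4_3': 'value9'}],
}

df = pd.json_normalize(
    json_object,
    record_path=['key4'],                         # one row per element of key4
    meta=['key1', ['key3', 'key3_2', 'key3_3']],  # repeated on every row
)
print(df)
# columns: key4_1, key4_2, key4_3, key1, key3.key3_2.key3_3
```

Because the three key4 entries have different keys, each row has NaN in the key4_* columns it does not contain.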

If some of the keys listed in meta are not always present, you can pass errors='ignore'; the missing values then become NaN instead of raising a KeyError.

pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
    errors='ignore',
)
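A small sketch of what errors='ignore' changes, using hypothetical data where one record lacks a meta field (region): with the default errors='raise', pandas raises a KeyError; with errors='ignore', the missing value is filled with NaN.

```python
import pandas as pd

# Hypothetical input: the second entry has no 'region' field
data = [
    {'id': 'a', 'region': 'Montreal', 'items': [{'qty': 1}, {'qty': 2}]},
    {'id': 'b', 'items': [{'qty': 3}]},
]

# Default errors='raise': a missing meta key raises KeyError
try:
    pd.json_normalize(data, record_path='items', meta=['id', 'region'])
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore': the missing meta value becomes NaN instead
df = pd.json_normalize(
    data, record_path='items', meta=['id', 'region'], errors='ignore'
)
print(df)
```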

By default, nested key names are joined with '.'; you can change the separator with sep, e.g. sep='-':

pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
    errors='ignore',
    sep='-',
)
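The separator is used both for nested keys inside each record and for the names of nested meta fields, as in this sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical input with a nested meta field and nested record fields
data = {
    'buyer': {'name': 'Ville'},
    'items': [{'unit': {'value': 10}}, {'unit': {'value': 20}}],
}

df = pd.json_normalize(
    data,
    record_path=['items'],
    meta=[['buyer', 'name']],
    sep='-',  # joins 'unit' + 'value' and 'buyer' + 'name'
)
print(list(df.columns))  # ['unit-value', 'buyer-name']
```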

How you load the data depends on where your JSON comes from: a local file or a URL. For a local file:

import json
import pandas as pd

# load data using Python JSON module
with open('data/simple.json', 'r') as f:
    data = json.load(f)

# Flattening JSON data
pd.json_normalize(data)

For URLs:

import requests

URL = 'http://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/027-pandas-convert-json/data/simple.json'
data = requests.get(URL).json()  # parse the JSON response body
# Flattening JSON data
pd.json_normalize(data)

After spending almost 2 days, here is the simplest solution I could come up with.

import json
import pandas as pd
from flatten_json import flatten  # pip install flatten_json

with open('D:\\Json Data.json') as json_data:
    data = json.load(json_data)
dic_flattened = [flatten(d) for d in data['releases']]
df = pd.DataFrame(dic_flattened)

It goes down to the lowest level and creates a separate column for each value. This package helped me: https://pypi.org/project/flatten-json/
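To show what that approach does, here is a dependency-free sketch of the same idea. The helper flatten_record and the sample record are hypothetical (not part of flatten_json); it assumes flatten_json's default style of joining dict keys and list indices with '_'. The contrast with pd.json_normalize is the likely reason this worked for data['releases']: json_normalize expands nested dicts but leaves lists inside a record as a single cell, while this kind of flattening also expands list elements into indexed columns.

```python
import pandas as pd


def flatten_record(obj, parent_key='', sep='_'):
    """Recursively flatten nested dicts and lists into a single-level dict.

    Hypothetical helper mimicking the idea of flatten_json.flatten:
    dict keys and list indices are joined with `sep`.
    """
    items = {}
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        return {parent_key: obj}
    for key, value in children:
        new_key = f'{parent_key}{sep}{key}' if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            # non-empty container: keep descending
            items.update(flatten_record(value, new_key, sep))
        else:
            # scalar (or empty container): store it under the joined key
            items[new_key] = value
    return items


# Hypothetical record, loosely shaped like one entry of data['releases']
release = {
    'ocid': 'ocds-1',
    'tender': {'id': 't1', 'value': {'amount': 100}},
    'awards': [{'id': 'a1'}, {'id': 'a2'}],
}

# json_normalize flattens nested dicts but keeps the 'awards' list in one cell
print(pd.json_normalize([release]).columns.tolist())

# The recursive helper also expands the list: awards_0_id, awards_1_id
df = pd.DataFrame([flatten_record(r) for r in [release]])
print(df.columns.tolist())
```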

