Flatten super nested JSON to Pandas DataFrame
I have this super nested JSON file which needs to be in a flat form. Previously I had a similar problem with XML, which I solved with the simple code below.
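import pandas_read_xml as pdx  # assumption: pdx refers to the pandas_read_xml package
from pandas_read_xml import flatten
df = pdx.read_xml('C:\\python_script\\temp1\\'+file,encoding='utf-8')
df = pdx.fully_flatten(df)
df = df.pipe(flatten)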
I am looking for similarly simple code to do the same for JSON.
Here is the data: https://www.donneesquebec.ca/recherche/dataset/d23b2e02-085d-43e5-9e6e-e1d558ebfdd5/resource/eb4d7620-6aa3-4850-aab6-a0fbe82f2dc1/download/hebdo_20211227_20220102.json
Any help will be appreciated. :)
You can use json_normalize(), which is pretty effective. This article explains it very well for different scenarios: https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd
Here are some examples.
import pandas as pd

some_dict = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 123,
}
df = pd.json_normalize(some_dict)
With multiple levels, if you don't want all of the levels flattened, you can use:
pd.json_normalize(data, max_level=1)
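To make the effect of max_level concrete, here is a minimal sketch. The record and its field names are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical nested record, used only for illustration.
data = {
    "id": 1,
    "info": {
        "name": "Alice",
        "address": {"city": "Montreal", "zip": "H2X"},
    },
}

# Default: every level of nested dicts becomes its own column.
full = pd.json_normalize(data)

# max_level=1: only the first level of nesting is expanded;
# anything deeper stays in its column as a plain dict.
partial = pd.json_normalize(data, max_level=1)

print(list(full.columns))     # id, info.name, info.address.city, info.address.zip
print(list(partial.columns))  # id, info.name, info.address
```

The info.address column in the second frame still holds the original dict, so it can be expanded later if needed.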
With nested lists, you can use record_path to select the list to expand into rows and meta to specify a list of fields to include with each row:
json_object = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': {
        'key3_1': 'value3',
        'key3_2': {
            'key3_2_1': {
                'admission': 'value4',
                'general': 'value5'
            },
            'key3_3': 'value6',
        }
    },
    'key4': [
        { 'key4_1': 'value7' },
        { 'key4_2': 'value8' },
        { 'key4_3': 'value9' }
    ],
}
# you can do:
pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
)
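As a smaller illustration of how record_path and meta interact, here is a sketch with a made-up order record (all names in it are hypothetical). Each element of the list named by record_path becomes one row, and the meta fields from the parent are repeated on every row.

```python
import pandas as pd

# Hypothetical record: one parent object holding a list of child records.
order = {
    "order_id": 17,
    "customer": {"name": "Alice", "city": "Montreal"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

df = pd.json_normalize(
    order,
    record_path=["items"],                      # one output row per element of order["items"]
    meta=["order_id", ["customer", "name"]],    # parent fields copied onto each row
)
print(df)
```

A nested meta entry is written as a list describing the path to the value, and its column name is that path joined with the separator (here customer.name).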
If some of the keys listed in meta are not present in every record, you can use errors='ignore'; the missing values are then filled with NaN instead of raising a KeyError.
pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
    errors='ignore'
)
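The difference is easiest to see with a list of records where one record lacks a meta key. This is a sketch with invented data; the team, lead and members names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical records: the second one has no "lead" key.
records = [
    {"team": "red", "lead": "Ann", "members": [{"name": "Bo"}, {"name": "Cy"}]},
    {"team": "blue", "members": [{"name": "Di"}]},
]

# With the default errors='raise', the missing meta key is an error.
try:
    pd.json_normalize(records, record_path=["members"], meta=["team", "lead"])
    raised = False
except KeyError:
    raised = True

# With errors='ignore', the missing meta value becomes NaN.
df = pd.json_normalize(
    records,
    record_path=["members"],
    meta=["team", "lead"],
    errors="ignore",
)
print(raised)
print(df)
```

Note that errors='ignore' only covers missing meta keys; the list named in record_path still has to exist in every record.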
By default, nested keys are joined with . in the column names; you can change this with the sep argument:
pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
    errors='ignore',
    sep='-'
)
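A quick way to check what sep does to the column names, again on a made-up record (the field names are hypothetical):

```python
import pandas as pd

# Hypothetical nested record, used only for illustration.
data = {"id": 1, "user": {"name": "Alice", "address": {"city": "Montreal"}}}

dotted = pd.json_normalize(data)                # default separator "."
underscored = pd.json_normalize(data, sep="_")  # custom separator "_"

print(list(dotted.columns))       # id, user.name, user.address.city
print(list(underscored.columns))  # id, user_name, user_address_city
```

An underscore is often a convenient choice because the resulting names are valid Python identifiers, so they work with attribute access (underscored.user_name) and in DataFrame.query without backticks.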
It depends on where you get your JSON data from: a local file or a URL. For a local file:
import json
import pandas as pd

# load data using Python JSON module
with open('data/simple.json', 'r') as f:
    data = json.load(f)
# Flattening JSON data
pd.json_normalize(data)
For URLs:
import requests

URL = 'http://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/027-pandas-convert-json/data/simple.json'
data = json.loads(requests.get(URL).text)
# Flattening JSON data
pd.json_normalize(data)
After spending almost 2 days on this, here is the simplest solution I could find.
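import json
import pandas as pd
from flatten_json import flatten  # from the flatten-json package linked below

with open('D:\\Json Data.json') as json_data:
    data = json.load(json_data)
# Flatten each release into a single-level dict, then build the DataFrame
dic_flattened = [flatten(d) for d in data['releases']]
df = pd.DataFrame(dic_flattened)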
It goes down to the lowest level and makes separate columns. The page below helped me: https://pypi.org/project/flatten-json/
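For readers who want to see what "going down to the lowest level" means, here is a rough pure-Python sketch of the idea. It is a simplified stand-in written for this thread, not the flatten-json implementation; flatten_record is a hypothetical helper, and apart from the releases key the sample fields are invented.

```python
import pandas as pd

def flatten_record(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts and lists into a single-level dict.

    Dict keys and list positions are joined with `sep` to form column names.
    Empty dicts and lists produce no columns in this simplified version.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Leaf value: store it under the key path built so far.
        return {parent_key: obj}

    flat = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten_record(value, new_key, sep))
    return flat

# Hypothetical data shaped like the answer above: a top-level "releases" list.
data = {
    "releases": [
        {"id": "r1", "buyer": {"name": "City A"}, "tags": ["award", "contract"]},
        {"id": "r2", "buyer": {"name": "City B"}, "tags": ["tender"]},
    ]
}

df = pd.DataFrame([flatten_record(r) for r in data["releases"]])
print(df)
```

Unlike json_normalize with record_path, this keeps one row per release and spreads list elements across numbered columns (tags_0, tags_1), so records with shorter lists get NaN in the extra columns.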