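Convert nested JSON to CSV or table
I know this question has been asked many times, but none of the answers meets my requirement. I want to dynamically convert any nested JSON to a CSV file or a DataFrame. Some sample inputs are: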
input : {"menu": {
"header": "SVG Viewer",
"items": [
{"id": "Open"},
{"id": "OpenNew", "label": "Open New"},
null,
{"id": "ZoomIn", "label": "Zoom In"},
{"id": "ZoomOut", "label": "Zoom Out"},
{"id": "OriginalView", "label": "Original View"},
null,
{"id": "Quality"},
{"id": "Pause"},
{"id": "Mute"},
null,
{"id": "Find", "label": "Find..."},
{"id": "FindAgain", "label": "Find Again"},
{"id": "Copy"},
{"id": "CopyAgain", "label": "Copy Again"},
{"id": "CopySVG", "label": "Copy SVG"},
{"id": "ViewSVG", "label": "View SVG"},
{"id": "ViewSource", "label": "View Source"},
{"id": "SaveAs", "label": "Save As"},
null,
{"id": "Help"},
{"id": "About", "label": "About Adobe CVG Viewer..."}
]
}}
input 2 : {"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}
So far I have tried the code below. It works, but it explodes list-type data into columns, whereas I want lists to be exploded into rows.
import pandas as pd

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        elif type(x) is list:
            # each list element gets its position in the key, so it becomes its own column
            i = 0
            for a in x:
                flatten(a, name + str(i) + '.')
                i += 1
        else:
            out[str(name[:-1])] = str(x)

    flatten(y)
    return out

def start_explode(data):
    if type(data) is dict:
        df = pd.DataFrame([flatten_json(data)])
    else:
        df = pd.DataFrame([flatten_json(x) for x in data])
    df = df.astype(str)
    return df

complex_json = {"menu": {
    "id": "file",
    "value": "File",
    "popup": {
        "menuitem": [
            {"value": "New", "onclick": "CreateNewDoc()"},
            {"value": "Open", "onclick": "OpenDoc()"},
            {"value": "Close", "onclick": "CloseDoc()"}
        ]
    }
}}
df = start_explode(complex_json['menu'])
display(df)  # display() is available in Jupyter/IPython
For one of the inputs above, the output is a single wide row with a separate group of columns for each list element.
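To make that behaviour concrete, here is a minimal sketch that reruns the same flattening logic on input 2 and lists the resulting column names. The function is a lightly tidied copy of the one above (`isinstance` and `enumerate` are stylistic substitutions), and the variable names `menu` and `flat` are chosen only for illustration.

```python
def flatten_json(y):
    """Flatten nested dicts/lists into one dict keyed by dotted paths."""
    out = {}

    def flatten(x, name=""):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + ".")
        elif isinstance(x, list):
            # The list position becomes part of the key, hence one column per element.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + ".")
        else:
            out[name[:-1]] = str(x)

    flatten(y)
    return out


menu = {
    "id": "file",
    "value": "File",
    "popup": {
        "menuitem": [
            {"value": "New", "onclick": "CreateNewDoc()"},
            {"value": "Open", "onclick": "OpenDoc()"},
            {"value": "Close", "onclick": "CloseDoc()"},
        ]
    },
}

flat = flatten_json(menu)
for key in flat:
    print(key)
```

The keys include popup.menuitem.0.value, popup.menuitem.1.value and popup.menuitem.2.value, which is why each menu item ends up in its own columns rather than its own row.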
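Standard techniques for dealing with nested JSON:
- json_normalize()
- explode()
- apply(pd.Series)
Finally, some cleanup: drop the rows that came from null items and replace nan with an empty string.
import json
import pandas as pd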
js = """{"menu": {
"header": "SVG Viewer",
"items": [
{"id": "Open"},
{"id": "OpenNew", "label": "Open New"},
null,
{"id": "ZoomIn", "label": "Zoom In"},
{"id": "ZoomOut", "label": "Zoom Out"},
{"id": "OriginalView", "label": "Original View"},
null,
{"id": "Quality"},
{"id": "Pause"},
{"id": "Mute"},
null,
{"id": "Find", "label": "Find..."},
{"id": "FindAgain", "label": "Find Again"},
{"id": "Copy"},
{"id": "CopyAgain", "label": "Copy Again"},
{"id": "CopySVG", "label": "Copy SVG"},
{"id": "ViewSVG", "label": "View SVG"},
{"id": "ViewSource", "label": "View Source"},
{"id": "SaveAs", "label": "Save As"},
null,
{"id": "Help"},
{"id": "About", "label": "About Adobe CVG Viewer..."}
]
}}"""
df = pd.json_normalize(json.loads(js)).explode("menu.items").reset_index(drop=True)
df.drop(columns=["menu.items"]).join(df["menu.items"].apply(pd.Series)).dropna(subset=["id"]).fillna("")
 | menu.header | id | label |
---|---|---|---|
0 | SVG Viewer | Open | |
1 | SVG Viewer | OpenNew | Open New |
3 | SVG Viewer | ZoomIn | Zoom In |
4 | SVG Viewer | ZoomOut | Zoom Out |
5 | SVG Viewer | OriginalView | Original View |
7 | SVG Viewer | Quality | |
8 | SVG Viewer | Pause | |
9 | SVG Viewer | Mute | |
11 | SVG Viewer | Find | Find... |
12 | SVG Viewer | FindAgain | Find Again |
13 | SVG Viewer | Copy | |
14 | SVG Viewer | CopyAgain | Copy Again |
15 | SVG Viewer | CopySVG | Copy SVG |
16 | SVG Viewer | ViewSVG | View SVG |
17 | SVG Viewer | ViewSource | View Source |
18 | SVG Viewer | SaveAs | Save As |
20 | SVG Viewer | Help | |
21 | SVG Viewer | About | About Adobe CVG Viewer... |
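The one-liner above chains several steps, so it may help to see them separately. The following is a sketch on a trimmed-down sample (four items instead of the full menu, an assumption made for brevity); the intermediate names `flat`, `exploded`, `expanded` and `result` are illustrative only.

```python
import pandas as pd

# Trimmed-down sample with the same shape as the input above.
data = {"menu": {"header": "SVG Viewer",
                 "items": [{"id": "Open"},
                           {"id": "OpenNew", "label": "Open New"},
                           None,
                           {"id": "Help"}]}}

# Step 1: flatten nested dicts; the list stays in a single cell.
flat = pd.json_normalize(data)

# Step 2: one row per list element (null elements get a row too).
exploded = flat.explode("menu.items").reset_index(drop=True)

# Step 3: turn each dict into columns; the null row becomes all-NaN.
expanded = exploded["menu.items"].apply(pd.Series)

# Cleanup: drop rows that came from null items, blank out missing labels.
result = (
    exploded.drop(columns=["menu.items"])
    .join(expanded)
    .dropna(subset=["id"])
    .fillna("")
)
print(flat.shape, exploded.shape, result.shape)
```

Because the index is reset before the null rows are dropped, the surviving rows keep their original list positions, which explains the gaps in the index of the table above.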
As a reusable function: explode() the first column that contains lists and apply apply(pd.Series) to that column.
def normalize(js, expand_all=False):
    df = pd.json_normalize(json.loads(js) if isinstance(js, str) else js)
    # get first column that contains lists
    col = df.applymap(type).astype(str).eq("<class 'list'>").all().idxmax()
    # explode list and expand embedded dictionaries;
    # overlapping column names get the exploded column's name as a suffix
    df = df.explode(col).reset_index(drop=True)
    df = df.drop(columns=[col]).join(df[col].apply(pd.Series), rsuffix=f".{col}")
    # any lists left?
    # note: the recursive call leaves expand_all at its default,
    # so it expands at most one more list column
    if expand_all and df.applymap(type).astype(str).eq("<class 'list'>").any(axis=1).all():
        df = normalize(df.to_dict("records"))
    return df
js = """{ "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" }, { "id": "1003", "type": "Blueberry" }, { "id": "1004", "type": "Devil's Food" } ] }, "topping": [ { "id": "5001", "type": "None" }, { "id": "5002", "type": "Glazed" }, { "id": "5005", "type": "Sugar" } ] }"""
normalize(js, expand_all=True)
 | id | type | name | ppu | id.topping | type.topping | id.batters.batter | type.batters.batter |
---|---|---|---|---|---|---|---|---|
0 | 0001 | donut | Cake | 0.55 | 5001 | None | 1001 | Regular |
1 | 0001 | donut | Cake | 0.55 | 5001 | None | 1002 | Chocolate |
2 | 0001 | donut | Cake | 0.55 | 5001 | None | 1003 | Blueberry |
3 | 0001 | donut | Cake | 0.55 | 5001 | None | 1004 | Devil's Food |
4 | 0001 | donut | Cake | 0.55 | 5002 | Glazed | 1001 | Regular |
5 | 0001 | donut | Cake | 0.55 | 5002 | Glazed | 1002 | Chocolate |
6 | 0001 | donut | Cake | 0.55 | 5002 | Glazed | 1003 | Blueberry |
7 | 0001 | donut | Cake | 0.55 | 5002 | Glazed | 1004 | Devil's Food |
8 | 0001 | donut | Cake | 0.55 | 5005 | Sugar | 1001 | Regular |
9 | 0001 | donut | Cake | 0.55 | 5005 | Sugar | 1002 | Chocolate |
10 | 0001 | donut | Cake | 0.55 | 5005 | Sugar | 1003 | Blueberry |
11 | 0001 | donut | Cake | 0.55 | 5005 | Sugar | 1004 | Devil's Food |
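Exploding two independent list columns yields their Cartesian product: 3 toppings times 4 batters gives the 12 rows above. Since the question also asks for CSV, the sketch below shows the same row multiplication on a simplified record and writes the result with to_csv. It is not the function above: it uses pd.json_normalize with add_prefix instead of apply(pd.Series), the column names `topping` and `batter` are simplified, and an in-memory io.StringIO buffer stands in for a real file path.

```python
import io

import pandas as pd

# Simplified record (assumption): two sibling list columns at the top level.
record = {
    "id": "0001",
    "topping": [{"id": "5001", "type": "None"},
                {"id": "5002", "type": "Glazed"},
                {"id": "5005", "type": "Sugar"}],
    "batter": [{"id": "1001", "type": "Regular"},
               {"id": "1002", "type": "Chocolate"},
               {"id": "1003", "type": "Blueberry"},
               {"id": "1004", "type": "Devil's Food"}],
}

df = pd.DataFrame([record])
for col in ["topping", "batter"]:
    # One row per list element, then spread each dict into prefixed columns.
    df = df.explode(col).reset_index(drop=True)
    expanded = pd.json_normalize(df[col].tolist()).add_prefix(f"{col}.")
    df = df.drop(columns=[col]).join(expanded)

# Write to an in-memory buffer; pass a file path instead to create a CSV file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(len(df))
```

With large or numerous lists this multiplication can produce a very large table, so it is worth checking whether the combined rows are actually meaningful for the data at hand.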
def n2(js):
    df = pd.json_normalize(json.loads(js))
    # columns that contain lists
    cols = [i for i, c in df.applymap(type).astype(str).eq("<class 'list'>").all().items() if c]
    # use list from first row; place each expanded list side by side
    return pd.concat(
        [df.drop(columns=cols)]
        + [pd.json_normalize(df.loc[0, c]).pipe(lambda d: d.rename(columns={c2: f"{c}.{c2}" for c2 in d.columns}))
           for c in cols],
        axis=1,
    ).fillna("")
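Unlike normalize, n2 does not multiply rows: it relies on pd.concat with axis=1 aligning frames by index and padding the shorter ones. A minimal sketch of just that mechanism, with hand-built frames whose names and values are illustrative assumptions:

```python
import pandas as pd

# Three frames of different lengths, standing in for the scalar columns
# and two expanded list columns.
base = pd.DataFrame({"name": ["Cake"]})
toppings = pd.DataFrame({"topping.id": ["5001", "5002", "5005"]})
batters = pd.DataFrame({"batter.id": ["1001", "1002", "1003", "1004"]})

# axis=1 aligns on the row index; missing cells become NaN, then "".
side_by_side = pd.concat([base, toppings, batters], axis=1).fillna("")
print(side_by_side)
```

The result has as many rows as the longest list (four here), and the scalar values appear only in the first row, so this layout suits display more than further analysis.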
You can try json_normalize:
import pandas as pd
import json
data = json.loads("""{"menu": {
"header": "SVG Viewer",
"items": [
{"id": "Open"},
{"id": "OpenNew", "label": "Open New"},
null,
{"id": "ZoomIn", "label": "Zoom In"},
{"id": "ZoomOut", "label": "Zoom Out"},
{"id": "OriginalView", "label": "Original View"},
null,
{"id": "Quality"},
{"id": "Pause"},
{"id": "Mute"},
null,
{"id": "Find", "label": "Find..."},
{"id": "FindAgain", "label": "Find Again"},
{"id": "Copy"},
{"id": "CopyAgain", "label": "Copy Again"},
{"id": "CopySVG", "label": "Copy SVG"},
{"id": "ViewSVG", "label": "View SVG"},
{"id": "ViewSource", "label": "View Source"},
{"id": "SaveAs", "label": "Save As"},
null,
{"id": "Help"},
{"id": "About", "label": "About Adobe CVG Viewer..."}
]
}}""")
# remove null
data['menu']['items'] = [i for i in data['menu']['items'] if i is not None]
pd.json_normalize(data['menu'], record_path=['items'], meta=['header'], record_prefix='items_')
#             items_id      items_label
# header
# SVG Viewer  Open          NaN
# SVG Viewer  OpenNew       Open New
# SVG Viewer  ZoomIn        Zoom In
# SVG Viewer  ZoomOut       Zoom Out
# SVG Viewer  OriginalView  Original View
# SVG Viewer  Quality       NaN
# SVG Viewer  Pause         NaN
# SVG Viewer  Mute          NaN
# SVG Viewer  Find          Find...
# SVG Viewer  FindAgain     Find Again
# SVG Viewer  Copy          NaN
# SVG Viewer  CopyAgain     Copy Again
# SVG Viewer  CopySVG       Copy SVG
# SVG Viewer  ViewSVG       View SVG
# SVG Viewer  ViewSource    View Source
# SVG Viewer  SaveAs        Save As
# SVG Viewer  Help          NaN
# SVG Viewer  About         About Adobe CVG Viewer...
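The same record_path/meta approach can be sketched for input 2 from the question, where the list sits one level deeper under popup.menuitem. The prefix `menuitem_` is an arbitrary choice for this example; some prefix is needed here because the records and the metadata both contain a field called value, and json_normalize raises an error on such a name clash.

```python
import pandas as pd

data = {"menu": {
    "id": "file",
    "value": "File",
    "popup": {
        "menuitem": [
            {"value": "New", "onclick": "CreateNewDoc()"},
            {"value": "Open", "onclick": "OpenDoc()"},
            {"value": "Close", "onclick": "CloseDoc()"},
        ]
    },
}}

df = pd.json_normalize(
    data["menu"],
    record_path=["popup", "menuitem"],  # path to the list to turn into rows
    meta=["id", "value"],               # parent fields repeated on every row
    record_prefix="menuitem_",          # avoids the clash on "value"
)
print(df)
```

This gives one row per menu item, with the parent id and value repeated on each row. Note that record_path has to be known in advance, so this variant is less dynamic than detecting list columns automatically.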