
Convert nested JSON to CSV or table

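I know this question has been asked many times, but none of the answers meet my requirements. I want to dynamically convert any nested JSON into a CSV file or a DataFrame. Some examples: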

input : {"menu": {
    "header": "SVG Viewer",
    "items": [
        {"id": "Open"},
        {"id": "OpenNew", "label": "Open New"},
        null,
        {"id": "ZoomIn", "label": "Zoom In"},
        {"id": "ZoomOut", "label": "Zoom Out"},
        {"id": "OriginalView", "label": "Original View"},
        null,
        {"id": "Quality"},
        {"id": "Pause"},
        {"id": "Mute"},
        null,
        {"id": "Find", "label": "Find..."},
        {"id": "FindAgain", "label": "Find Again"},
        {"id": "Copy"},
        {"id": "CopyAgain", "label": "Copy Again"},
        {"id": "CopySVG", "label": "Copy SVG"},
        {"id": "ViewSVG", "label": "View SVG"},
        {"id": "ViewSource", "label": "View Source"},
        {"id": "SaveAs", "label": "Save As"},
        null,
        {"id": "Help"},
        {"id": "About", "label": "About Adobe CVG Viewer..."}
    ]
}}

Output: [image: enter image description here]

input 2 : {"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

Output 2: [image: enter image description here]

So far I have tried the code below. It works, but it expands list-type data into columns, whereas I want lists expanded into rows.

import pandas as pd


def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            
            for a in x:
                
                flatten(x[a], name + a + '.')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '.')
                i += 1
        else:
            
            out[str(name[:-1])] = str(x)

    flatten(y)
    return out
  
def start_explode(data):
    
  if type(data) is dict: 
    df = pd.DataFrame([flatten_json(data)])
  else:
    df = pd.DataFrame([flatten_json(x) for x in data])
  
  df = df.astype(str)
  return df

complex_json = {"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
df = start_explode(complex_json['menu'])
display(df)

For one of the inputs above, it gives the output shown below:

[image: enter image description here]
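Since the output image is not reproduced here, the following is a minimal sketch of what the flattening approach above yields for the second input. It restates `flatten_json` in a compact recursive form (an illustrative rewrite, not the code exactly as posted) and shows that every list element ends up in its own columns, with the list position embedded in the column name.

```python
import pandas as pd


def flatten_json(obj, prefix=""):
    """Flatten nested dicts/lists into a single dict with dotted keys."""
    out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten_json(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        # The list position becomes part of the key, hence one column per element.
        for i, value in enumerate(obj):
            out.update(flatten_json(value, f"{prefix}{i}."))
    else:
        out[prefix[:-1]] = str(obj)
    return out


menu = {
    "id": "file",
    "value": "File",
    "popup": {
        "menuitem": [
            {"value": "New", "onclick": "CreateNewDoc()"},
            {"value": "Open", "onclick": "OpenDoc()"},
            {"value": "Close", "onclick": "CloseDoc()"},
        ]
    },
}

flat = flatten_json(menu)
df = pd.DataFrame([flat])
print(list(flat))
# Keys: id, value, popup.menuitem.0.value, popup.menuitem.0.onclick,
#       popup.menuitem.1.value, popup.menuitem.1.onclick, ...
print(df.shape)  # (1, 8): one wide row instead of three rows
```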

  • Standard techniques for handling nested JSON
    1. json_normalize()
    2. explode()
    3. apply(pd.Series)
  • Finally, do some cleanup: drop the unwanted rows and replace NaN with empty strings
import json
import pandas as pd
js = """{"menu": {
    "header": "SVG Viewer",
    "items": [
        {"id": "Open"},
        {"id": "OpenNew", "label": "Open New"},
        null,
        {"id": "ZoomIn", "label": "Zoom In"},
        {"id": "ZoomOut", "label": "Zoom Out"},
        {"id": "OriginalView", "label": "Original View"},
        null,
        {"id": "Quality"},
        {"id": "Pause"},
        {"id": "Mute"},
        null,
        {"id": "Find", "label": "Find..."},
        {"id": "FindAgain", "label": "Find Again"},
        {"id": "Copy"},
        {"id": "CopyAgain", "label": "Copy Again"},
        {"id": "CopySVG", "label": "Copy SVG"},
        {"id": "ViewSVG", "label": "View SVG"},
        {"id": "ViewSource", "label": "View Source"},
        {"id": "SaveAs", "label": "Save As"},
        null,
        {"id": "Help"},
        {"id": "About", "label": "About Adobe CVG Viewer..."}
    ]
}}"""

df = pd.json_normalize(json.loads(js)).explode("menu.items").reset_index(drop=True)
df.drop(columns=["menu.items"]).join(df["menu.items"].apply(pd.Series)).dropna(subset=["id"]).fillna("")

    menu.header  id            label
0   SVG Viewer   Open
1   SVG Viewer   OpenNew       Open New
3   SVG Viewer   ZoomIn        Zoom In
4   SVG Viewer   ZoomOut       Zoom Out
5   SVG Viewer   OriginalView  Original View
7   SVG Viewer   Quality
8   SVG Viewer   Pause
9   SVG Viewer   Mute
11  SVG Viewer   Find          Find...
12  SVG Viewer   FindAgain     Find Again
13  SVG Viewer   Copy
14  SVG Viewer   CopyAgain     Copy Again
15  SVG Viewer   CopySVG       Copy SVG
16  SVG Viewer   ViewSVG       View SVG
17  SVG Viewer   ViewSource    View Source
18  SVG Viewer   SaveAs        Save As
20  SVG Viewer   Help
21  SVG Viewer   About         About Adobe CVG Viewer...
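The question also asks for CSV output. As a rough sketch on a shortened version of the same input, the resulting DataFrame can be serialized with `DataFrame.to_csv`. One deliberate difference from the chain above: the `null` entries are dropped right after `explode()`, before `apply(pd.Series)`, which is an assumption made here to keep the example simple.

```python
import json
import pandas as pd

js = """{"menu": {
    "header": "SVG Viewer",
    "items": [
        {"id": "Open"},
        {"id": "OpenNew", "label": "Open New"},
        null
    ]
}}"""

# One row per list element; null entries become missing values and are dropped.
df = (
    pd.json_normalize(json.loads(js))
    .explode("menu.items")
    .dropna(subset=["menu.items"])
    .reset_index(drop=True)
)

# Expand each embedded dict into its own columns and blank out missing labels.
table = df.drop(columns=["menu.items"]).join(df["menu.items"].apply(pd.Series)).fillna("")

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = table.to_csv(index=False)
print(csv_text)
```

Passing a file name, for example `table.to_csv("menu.csv", index=False)`, writes the same content to disk; the file name is only a placeholder.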

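Utility function

  • Use this if you don't want to name the column, but want to use the first column that contains lists
  • Identify the first column that contains lists
  • explode() that column and apply(pd.Series) to it
  • An option is provided to expand all lists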
def normalize(js, expand_all=False):
    df = pd.json_normalize(json.loads(js) if isinstance(js, str) else js)
    # get the first column in which every value is a list
    # (DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
    col = df.applymap(type).astype(str).eq("<class 'list'>").all().idxmax()
    # explode the list into rows and expand the embedded dictionaries into columns
    df = df.explode(col).reset_index(drop=True)
    df = df.drop(columns=[col]).join(df[col].apply(pd.Series), rsuffix=f".{col}")
    # any lists left? recurse, keeping expand_all so deeper lists are expanded too
    if expand_all and df.applymap(type).astype(str).eq("<class 'list'>").any(axis=1).all():
        df = normalize(df.to_dict("records"), expand_all=True)
    return df

js = """{ "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" }, { "id": "1003", "type": "Blueberry" }, { "id": "1004", "type": "Devil's Food" } ] }, "topping": [ { "id": "5001", "type": "None" }, { "id": "5002", "type": "Glazed" }, { "id": "5005", "type": "Sugar" } ] }"""

normalize(js, expand_all=True)

    id    type   name  ppu   id.topping  type.topping  id.batters.batter  type.batters.batter
0   0001  donut  Cake  0.55  5001        None          1001               Regular
1   0001  donut  Cake  0.55  5001        None          1002               Chocolate
2   0001  donut  Cake  0.55  5001        None          1003               Blueberry
3   0001  donut  Cake  0.55  5001        None          1004               Devil's Food
4   0001  donut  Cake  0.55  5002        Glazed        1001               Regular
5   0001  donut  Cake  0.55  5002        Glazed        1002               Chocolate
6   0001  donut  Cake  0.55  5002        Glazed        1003               Blueberry
7   0001  donut  Cake  0.55  5002        Glazed        1004               Devil's Food
8   0001  donut  Cake  0.55  5005        Sugar         1001               Regular
9   0001  donut  Cake  0.55  5005        Sugar         1002               Chocolate
10  0001  donut  Cake  0.55  5005        Sugar         1003               Blueberry
11  0001  donut  Cake  0.55  5005        Sugar         1004               Devil's Food
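The list detection in `normalize` compares the string form of each cell's type, and `idxmax()` falls back to the first column when no column holds lists. A possible alternative is sketched below with a hypothetical helper, `list_columns` (not part of the answer), which uses `isinstance` and returns an empty list when there is nothing to expand.

```python
import pandas as pd


def list_columns(df):
    """Return the names of columns in which every value is a Python list."""
    return [c for c in df.columns if df[c].map(lambda v: isinstance(v, list)).all()]


record = {
    "id": "0001",
    "ppu": 0.55,
    "batters": {"batter": [{"id": "1001"}, {"id": "1002"}]},
    "topping": [{"id": "5001"}],
}

df = pd.json_normalize(record)
cols = list_columns(df)                              # columns that still hold lists
none = list_columns(pd.DataFrame({"a": [1, 2]}))     # no list columns at all
print(sorted(cols), none)
```

A caller could loop while `list_columns(df)` is non-empty and explode one column per pass, which avoids relying on the fallback behaviour of `idxmax()`.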

Treat each list independently

def n2(js):
    df = pd.json_normalize(json.loads(js))
    # columns that contain lists
    cols = [i for i, c in df.applymap(type).astype(str).eq("<class 'list'>").all().items() if c]
    # use list from first row
    return pd.concat(
        [df.drop(columns=cols)]
        + [pd.json_normalize(df.loc[0, c]).pipe(lambda d: d.rename(columns={c2: f"{c}.{c2}" for c2 in d.columns}))
            for c in cols],
        axis=1,
    ).fillna("")
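The function above is defined but not called in the answer. The sketch below is a compact, self-contained variant of the same idea applied to a shortened donut example; the name `expand_lists_side_by_side` and the use of `add_prefix` are choices made here for illustration, not the answer's code.

```python
import json
import pandas as pd


def expand_lists_side_by_side(js):
    """Expand each list column of the first record into its own block of columns."""
    df = pd.json_normalize(json.loads(js))
    # Columns whose first-row value is a list.
    cols = [c for c in df.columns if isinstance(df.loc[0, c], list)]
    parts = [df.drop(columns=cols)]
    for c in cols:
        # Each list becomes its own small table, prefixed with the source column name.
        parts.append(pd.json_normalize(df.loc[0, c]).add_prefix(f"{c}."))
    # Lists of different lengths are aligned on the row index; gaps become "".
    return pd.concat(parts, axis=1).fillna("")


js = """{"id": "0001",
         "batters": {"batter": [{"id": "1001"}, {"id": "1002"}, {"id": "1003"}]},
         "topping": [{"id": "5001"}, {"id": "5002"}]}"""

out = expand_lists_side_by_side(js)
print(out)
```

Unlike the exploding approach, the lists are placed next to each other rather than combined, so the row count equals the length of the longest list (3 here) and the scalar fields appear only on the first row.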

In Python you can use pandas to do this, but it will repeat the header value for every row, as shown below


Code

Output

You can try json_normalize

import pandas as pd
import json

data = json.loads("""{"menu": {
    "header": "SVG Viewer",
    "items": [
        {"id": "Open"},
        {"id": "OpenNew", "label": "Open New"},
        null,
        {"id": "ZoomIn", "label": "Zoom In"},
        {"id": "ZoomOut", "label": "Zoom Out"},
        {"id": "OriginalView", "label": "Original View"},
        null,
        {"id": "Quality"},
        {"id": "Pause"},
        {"id": "Mute"},
        null,
        {"id": "Find", "label": "Find..."},
        {"id": "FindAgain", "label": "Find Again"},
        {"id": "Copy"},
        {"id": "CopyAgain", "label": "Copy Again"},
        {"id": "CopySVG", "label": "Copy SVG"},
        {"id": "ViewSVG", "label": "View SVG"},
        {"id": "ViewSource", "label": "View Source"},
        {"id": "SaveAs", "label": "Save As"},
        null,
        {"id": "Help"},
        {"id": "About", "label": "About Adobe CVG Viewer..."}
    ]
}}""")

# remove null
data['menu']['items'] = [i for i in data['menu']['items'] if i is not None]

pd.json_normalize(data['menu'], record_path=['items'], meta=['header'], record_prefix='items_')

#   items_id    items_label
# header        
# SVG Viewer    Open    NaN
# SVG Viewer    OpenNew Open New
# SVG Viewer    ZoomIn  Zoom In
# SVG Viewer    ZoomOut Zoom Out
# SVG Viewer    OriginalView    Original View
# SVG Viewer    Quality NaN
# SVG Viewer    Pause   NaN
# SVG Viewer    Mute    NaN
# SVG Viewer    Find    Find...
# SVG Viewer    FindAgain   Find Again
# SVG Viewer    Copy    NaN
# SVG Viewer    CopyAgain   Copy Again
# SVG Viewer    CopySVG Copy SVG
# SVG Viewer    ViewSVG View SVG
# SVG Viewer    ViewSource  View Source
# SVG Viewer    SaveAs  Save As
# SVG Viewer    Help    NaN
# SVG Viewer    About   About Adobe CVG Viewer...
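The same call can be adapted to the second input from the question, where the list sits one level deeper. This is a sketch under the assumption that `id` and `value` are the fields to repeat on every row; `record_prefix` is needed here because the records also contain a `value` key, which would otherwise conflict with the `value` metadata column.

```python
import pandas as pd

data = {"menu": {
    "id": "file",
    "value": "File",
    "popup": {
        "menuitem": [
            {"value": "New", "onclick": "CreateNewDoc()"},
            {"value": "Open", "onclick": "OpenDoc()"},
            {"value": "Close", "onclick": "CloseDoc()"},
        ]
    },
}}

# record_path walks menu -> popup -> menuitem; meta fields are repeated per record.
df = pd.json_normalize(
    data["menu"],
    record_path=["popup", "menuitem"],
    meta=["id", "value"],
    record_prefix="menuitem_",
)
print(df)
```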


Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address.