
Convert Nested JSON to Pandas DataFrame (with JSON example)

I have a JSON blob that looks like this:

{'status': 'OK',
 'data-availability': 'available',
 'data': [{'page': 1, 'pages': 1, 'total': 7},
  [{'domain_id': '101',
    'domain_name': 'Province1',
    'domain_url': 'https://province1.com'},
   {'domain_id': '102',
    'domain_name': 'Province2',
    'domain_url': 'https://province2.com'},
   {'domain_id': '103',
    'domain_name': 'Province3',
    'domain_url': 'https://province3.com'},
   {'domain_id': '104',
    'domain_name': 'Province4',
    'domain_url': 'https://province4.com'},
   {'domain_id': '105',
    'domain_name': 'Province5',
    'domain_url': 'https://province5.com'},
   {'domain_id': '106',
    'domain_name': 'Province6',
    'domain_url': 'https://province6.com'},
   {'domain_id': '107',
    'domain_name': 'Province7',
    'domain_url': 'https://province7.com'}]]}

What I want is to normalize this into a Pandas DataFrame whose columns are domain_id, domain_name, and domain_url.

How can I do this?

Repeatedly appending to a DataFrame is slow. Instead, collect everything into a dictionary and then call .from_dict():

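from collections import defaultdict

import pandas as pd

# data is the parsed blob from the question; data['data'][1] is the list of records
result = defaultdict(list)
for entry in data['data'][1]:
    for key, value in entry.items():
        result[key].append(value)

print(pd.DataFrame.from_dict(result))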

This outputs:

  domain_id domain_name             domain_url
0       101   Province1  https://province1.com
1       102   Province2  https://province2.com
2       103   Province3  https://province3.com
3       104   Province4  https://province4.com
4       105   Province5  https://province5.com
5       106   Province6  https://province6.com
6       107   Province7  https://province7.com
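
Because the records in data['data'][1] are already a list of flat dicts, the dictionary-of-lists step can probably be skipped: the pd.DataFrame constructor accepts a list of dicts directly. A minimal sketch, assuming the parsed blob is held in a variable named data (shortened here to two hypothetical records for brevity):

```python
import pandas as pd

# Shortened stand-in for the blob in the question (two records instead of seven).
data = {
    'status': 'OK',
    'data-availability': 'available',
    'data': [
        {'page': 1, 'pages': 1, 'total': 2},
        [
            {'domain_id': '101', 'domain_name': 'Province1',
             'domain_url': 'https://province1.com'},
            {'domain_id': '102', 'domain_name': 'Province2',
             'domain_url': 'https://province2.com'},
        ],
    ],
}

# The second element of data['data'] is the list of records;
# each dict becomes one row and its keys become the column names.
df = pd.DataFrame(data['data'][1])
print(df)
```

This relies on every record having the same flat keys; records with nested values would still need flattening first.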

This does the job:

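import json

import pandas as pd

# test is the blob as a JSON string; the last element of "data" is the list of records
data = json.loads(test)["data"][-1]
df = pd.DataFrame()

for d in data:
    temp_df = pd.DataFrame([d])  # one-row frame for the current record
    df = pd.concat([df, temp_df], ignore_index=True)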
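
If one frame per record is really needed, a common alternative to growing df inside the loop is to collect the frames in a list and call pd.concat once at the end. The sketch below assumes test holds the blob as valid JSON text (double-quoted keys and strings; the single-quoted form shown in the question would not parse with json.loads). The string itself is a shortened, hypothetical stand-in with two records:

```python
import json

import pandas as pd

# Hypothetical JSON string standing in for `test`; shortened to two records.
test = '''
{"status": "OK",
 "data-availability": "available",
 "data": [{"page": 1, "pages": 1, "total": 2},
          [{"domain_id": "101", "domain_name": "Province1",
            "domain_url": "https://province1.com"},
           {"domain_id": "102", "domain_name": "Province2",
            "domain_url": "https://province2.com"}]]}
'''

# The last element of "data" is the list of records.
records = json.loads(test)["data"][-1]

# Build one single-row frame per record, then concatenate once.
frames = [pd.DataFrame([record]) for record in records]
df = pd.concat(frames, ignore_index=True)
print(df)
```

Passing ignore_index=True gives the result a fresh 0..n-1 index instead of repeating 0 for every row.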

You can use pd.json_normalize():

raw_data = [{'domain_id': '101',
    'domain_name': 'Province1',
    'domain_url': 'https://province1.com'},
   {'domain_id': '102',
    'domain_name': 'Province2',
    'domain_url': 'https://province2.com'},
   {'domain_id': '103',
    'domain_name': 'Province3',
    'domain_url': 'https://province3.com'},
   {'domain_id': '104',
    'domain_name': 'Province4',
    'domain_url': 'https://province4.com'},
   {'domain_id': '105',
    'domain_name': 'Province5',
    'domain_url': 'https://province5.com'},
   {'domain_id': '106',
    'domain_name': 'Province6',
    'domain_url': 'https://province6.com'},
   {'domain_id': '107',
    'domain_name': 'Province7',
    'domain_url': 'https://province7.com'}]

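import pandas as pd

# store the list of dicts in a single-column DataFrame
df = pd.DataFrame({'raw': raw_data})

# split each dict into columns, using the keys as column names
df_json = pd.json_normalize(df['raw'])

# concatenate side by side (the original 'raw' column is kept)
df = pd.concat([df, df_json], axis=1)

# display() is available in Jupyter/IPython; use print(df) in a plain script
display(df)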
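
pd.json_normalize also accepts a list of dicts directly, so the intermediate raw column may not be needed when only the three domain columns are wanted. A sketch under that assumption, with raw_data shortened to two records; the cast of domain_id to an integer is an optional extra that is not part of the answer above:

```python
import pandas as pd

# Shortened copy of raw_data (two records instead of seven).
raw_data = [
    {'domain_id': '101', 'domain_name': 'Province1',
     'domain_url': 'https://province1.com'},
    {'domain_id': '102', 'domain_name': 'Province2',
     'domain_url': 'https://province2.com'},
]

# json_normalize flattens each dict into one row; keys become column names.
df = pd.json_normalize(raw_data)

# Optional: the IDs arrive as strings, so cast them if numeric IDs are wanted.
df['domain_id'] = df['domain_id'].astype(int)
print(df)
```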

