Python Pandas: How to extract data from JSON, and turn it into data frame?

I have JSON data such as:

{'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}

I want to extract three columns (Author, Marketing, and Media) and turn them into a data frame like this:

Author  Marketing  Media
John    SD_SM_14   s_wse@2

Thanks in advance for any help!

You may need to flatten the nested JSON data you posted explicitly.

JSON structure:

{
  "Author": [
    {
      "name": "John",
      "Agency": {
        "Marketing": [
          {
            "name": "SD_SM_14"
          }
        ],
        "Media": [
          {
            "codeX": "s_wse@2"
          }
        ]
      }
    }
  ]
}

What you want:

Author  Marketing  Media
John    SD_SM_14   s_wse@2

Here is the code:

import pandas as pd
from typing import Dict

def flatten(data: Dict):
    for key, value in data.items():
        for res in value:
            # assume that there is only one key in `res`
            yield key, next(iter(res.values()))

def func(data: Dict):
    for author in data['Author']:
        name = author['name']
        agency = author['Agency']
        yield dict([('Author', name)] + list(flatten(agency)))

# `data` is the nested dictionary shown in the question
df = pd.DataFrame(func(data))
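For reference, here is a self-contained sketch of the same flattening idea with the sample data from the question filled in. The helper name `rows` is illustrative only. As in the code above, it assumes every inner dict holds exactly one key; if a list such as `Marketing` held several items, only the last one would survive in the row.

```python
import pandas as pd

# Sample input copied from the question.
data = {'Author': [{'name': 'John',
                    'Agency': {'Marketing': [{'name': 'SD_SM_14'}],
                               'Media': [{'codeX': 's_wse@2'}]}}]}

def flatten(agency):
    # Assumption: each inner dict has exactly one key, so we take its only value.
    for key, items in agency.items():
        for item in items:
            yield key, next(iter(item.values()))

def rows(data):
    # Build one flat record per author.
    for author in data['Author']:
        yield {'Author': author['name'], **dict(flatten(author['Agency']))}

df = pd.DataFrame(list(rows(data)))
print(df)
```

The resulting frame has the three columns requested in the question: Author, Marketing, and Media.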

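I couldn't find a better way, but one solution is to first normalize Author to get the name column, then explode the list columns so that each cell holds a single dict, and finally apply json_normalize again to extract the required columns: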

In [38]: dic = {'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}

In [39]: df = pd.DataFrame(dic)

In [40]: df
Out[40]:
                                              Author
0  {'name': 'John', 'Agency': {'Marketing': [{'na...

In [41]: df = pd.json_normalize(df.Author)

In [42]: df
Out[42]:
   name        Agency.Marketing            Agency.Media
0  John  [{'name': 'SD_SM_14'}]  [{'codeX': 's_wse@2'}]

In [43]: df1 = df.explode('Agency.Marketing')

In [44]: df1
Out[44]:
   name      Agency.Marketing            Agency.Media
0  John  {'name': 'SD_SM_14'}  [{'codeX': 's_wse@2'}]

In [45]: df1 = df1.explode('Agency.Media')

In [47]: df2 = pd.json_normalize(df1['Agency.Marketing'])

In [48]: df2
Out[48]:
       name
0  SD_SM_14

In [49]: df3 = pd.json_normalize(df1['Agency.Media'])

In [50]: df3
Out[50]:
     codeX
0  s_wse@2

In [51]: main_df = pd.concat([df1,df2,df3], axis=1)

In [52]: main_df
Out[52]:
   name      Agency.Marketing          Agency.Media      name    codeX
0  John  {'name': 'SD_SM_14'}  {'codeX': 's_wse@2'}  SD_SM_14  s_wse@2

In [53]: main_df.drop(['Agency.Marketing','Agency.Media'],inplace=True,axis=1)

In [54]: main_df
Out[54]:
   name      name    codeX
0  John  SD_SM_14  s_wse@2
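Note that the final frame above has two columns both called `name`, and none of the headers match the Author / Marketing / Media names asked for. A minimal sketch that repeats the same steps as a script and names the columns explicitly might look like this (`result` is an illustrative variable name; the sketch assumes the inner dicts use the keys `name` and `codeX` as in the question):

```python
import pandas as pd

dic = {'Author': [{'name': 'John',
                   'Agency': {'Marketing': [{'name': 'SD_SM_14'}],
                              'Media': [{'codeX': 's_wse@2'}]}}]}

# Flatten the top level: the nested dict becomes dotted columns, lists stay as lists.
df = pd.json_normalize(dic['Author'])

# Unpack each list column so every cell holds a single dict,
# then reset the index so the pieces below line up row by row.
df = (df.explode('Agency.Marketing')
        .explode('Agency.Media')
        .reset_index(drop=True))

# Normalize the dict cells and pick the one field wanted from each.
result = pd.DataFrame({
    'Author': df['name'],
    'Marketing': pd.json_normalize(df['Agency.Marketing'].tolist())['name'],
    'Media': pd.json_normalize(df['Agency.Media'].tolist())['codeX'],
})
print(result)
```

If both lists contained several items, the two `explode` calls would produce one row per Marketing/Media combination, which may or may not be what you want.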

Update

If you have imported the json_normalize function directly:

Just call json_normalize instead of pd.json_normalize.
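As a small illustration, assuming pandas 1.0 or later (where `json_normalize` is exported at the top level of the package), the direct import looks like this:

```python
from pandas import json_normalize

dic = {'Author': [{'name': 'John',
                   'Agency': {'Marketing': [{'name': 'SD_SM_14'}],
                              'Media': [{'codeX': 's_wse@2'}]}}]}

# Same call as pd.json_normalize(...), just without the `pd.` prefix.
df = json_normalize(dic['Author'])
print(df.columns.tolist())
```

The rest of the steps (`explode`, `concat`, `drop`) are unchanged.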

