
Python Pandas: How to extract data from JSON, and turn it into data frame?

I have JSON data like

{'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}

I would like to extract three columns (Author, Marketing, and Media) and turn them into a DataFrame like the one below:

Author  Marketing  Media
John    SD_SM_14   s_wse@2

Thank you for any help in advance!

One option is to flatten the nested JSON data explicitly.

JSON Structure:

{
  "Author": [
    {
      "name": "John",
      "Agency": {
        "Marketing": [
          {
            "name": "SD_SM_14"
          }
        ],
        "Media": [
          {
            "codeX": "s_wse@2"
          }
        ]
      }
    }
  ]
}

What you want:

Author  Marketing  Media
John    SD_SM_14   s_wse@2

Here is the code:

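import pandas as pd
from typing import Dict

# Sample data from the question
data = {'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}

def flatten(data: Dict):
    # Yield a (column name, value) pair for every dict inside every list
    for key, value in data.items():
        for res in value:
            # assume that there is only one key in `res`
            yield key, next(iter(res.values()))

def func(data: Dict):
    # Build one row (as a dict) per author
    for author in data['Author']:
        name = author['name']
        agency = author['Agency']
        yield dict([('Author', name)] + list(flatten(agency)))

df = pd.DataFrame(func(data))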
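
The code above assumes every list under Agency holds at least one dict with exactly one key. As a rough sketch of how the same idea might be made tolerant of empty lists, the variant below uses a helper named first_value (a made-up name for illustration) and adds a second author, Jane, who is not in the original data and exists only to show the empty-list case:

```python
import pandas as pd

# Sample from the question plus one hypothetical author with an empty Media list.
data = {
    'Author': [
        {'name': 'John',
         'Agency': {'Marketing': [{'name': 'SD_SM_14'}],
                    'Media': [{'codeX': 's_wse@2'}]}},
        {'name': 'Jane',  # hypothetical row, not part of the original question
         'Agency': {'Marketing': [{'name': 'SD_SM_15'}], 'Media': []}},
    ]
}

def first_value(items):
    # Return the first value found in the list of dicts, or None if there is none.
    for item in items:
        for value in item.values():
            return value
    return None

# One dict per author: the author's name plus one entry per Agency key.
rows = [
    {'Author': author['name'],
     **{key: first_value(items) for key, items in author.get('Agency', {}).items()}}
    for author in data['Author']
]

df = pd.DataFrame(rows)
print(df)
```

Whether a missing value should become None or something else depends on your data; this is only one possible choice.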

I could not find a better way, but one solution is to first extract the name column from Author with json_normalize, then explode the lists so that each cell holds a single dict, and then use json_normalize again to extract the required columns:

In [38]: dic = {'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}

In [39]: df = pd.DataFrame(dic)

In [40]: df
Out[40]:
                                              Author
0  {'name': 'John', 'Agency': {'Marketing': [{'na...

In [41]: df = pd.json_normalize(df.Author)

In [42]: df
Out[42]:
   name        Agency.Marketing            Agency.Media
0  John  [{'name': 'SD_SM_14'}]  [{'codeX': 's_wse@2'}]

In [43]: df1 = df.explode('Agency.Marketing')

In [44]: df1
Out[44]:
   name      Agency.Marketing            Agency.Media
0  John  {'name': 'SD_SM_14'}  [{'codeX': 's_wse@2'}]

In [45]: df1 = df1.explode('Agency.Media')

In [47]: df2 = pd.json_normalize(df1['Agency.Marketing'])

In [48]: df2
Out[48]:
       name
0  SD_SM_14

In [49]: df3 = pd.json_normalize(df1['Agency.Media'])

In [50]: df3
Out[50]:
     codeX
0  s_wse@2

In [51]: main_df = pd.concat([df1,df2,df3], axis=1)

In [52]: main_df
Out[52]:
   name      Agency.Marketing          Agency.Media      name    codeX
0  John  {'name': 'SD_SM_14'}  {'codeX': 's_wse@2'}  SD_SM_14  s_wse@2

In [53]: main_df.drop(['Agency.Marketing','Agency.Media'],inplace=True,axis=1)

In [54]: main_df
Out[54]:
   name      name    codeX
0  John  SD_SM_14  s_wse@2
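
Note that the final frame has two columns called name, which is not quite the Author / Marketing / Media layout asked for. A possible way to finish the job, sketched here as a self-contained script rather than an IPython session, is to rename each piece before concatenating. The variable names marketing, media, and result are my own, and the index reset is an assumption that keeps the pieces aligned by position:

```python
import pandas as pd

dic = {'Author': [{'name': 'John',
                   'Agency': {'Marketing': [{'name': 'SD_SM_14'}],
                              'Media': [{'codeX': 's_wse@2'}]}}]}

# Flatten the author level; the Agency.* columns still hold lists of dicts.
df = pd.json_normalize(dic['Author'])

# Explode both list columns so every cell holds a single dict,
# then reset the index so the pieces line up by position.
df = df.explode('Agency.Marketing').explode('Agency.Media').reset_index(drop=True)

# Normalize each dict column and give it the column name asked for in the question.
marketing = pd.json_normalize(df['Agency.Marketing'].tolist()).rename(columns={'name': 'Marketing'})
media = pd.json_normalize(df['Agency.Media'].tolist()).rename(columns={'codeX': 'Media'})
author = df[['name']].rename(columns={'name': 'Author'})

result = pd.concat([author, marketing, media], axis=1)
print(result)
```

This sketch relies on each dict having the keys name and codeX, as in the sample data; other keys would need their own rename entries.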

Update:

If you have imported json_normalize directly (rather than accessing it through pd), call json_normalize instead of pd.json_normalize.
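
For illustration, a direct import might look like the sketch below. It assumes pandas 1.0 or later, where json_normalize is available at the top level of the package; older code often imported it from pandas.io.json instead.

```python
from pandas import json_normalize

dic = {'Author': [{'name': 'John',
                   'Agency': {'Marketing': [{'name': 'SD_SM_14'}],
                              'Media': [{'codeX': 's_wse@2'}]}}]}

# Same call as pd.json_normalize(...), just without the pd. prefix.
df = json_normalize(dic['Author'])
print(df.columns.tolist())
```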

