
How to convert this JSON object into a pandas Dataframe?

I am trying to convert this JSON into a pandas DataFrame. So far I have been using a JSON converter to turn it into CSV. It appears that the column names are the keys before each colon. Should I just build a list of the column names? My end goal is to streamline the data import process. Below is a sample of the JSON data. I keep reading answers from people converting similar data. I don't need a solution for every column, just help getting started.

data = {
  "address": [
    {
      "state": "22"
    }
  ],
  "birthDate": "1952-11-17",
  "extension": [
    {
      "url": "https://bluebutton.cms.gov/resources/variables/race",
      "valueCoding": {
        "code": "1",
        "display": "White",
        "system": "https://bluebutton.cms.gov/resources/variables/race"
      }
    },
    {
      "url": "https://bluebutton.cms.gov/resources/variables/rfrnc_yr",
      "valueDate": "2021"
    }
  ],
  "gender": "male",
  "id": "-10000000000066",
  "identifier": [
    {
      "system": "https://bluebutton.cms.gov/resources/variables/bene_id",
      "value": "-10000000000066"
    },
    {
      "system": "https://bluebutton.cms.gov/resources/identifier/mbi-hash",
      "value": "0e239e4895a76a2aff678507b1626a7cd08d23db07280e7efa228c8b0c156d23"
    },
    {
      "extension": [
        {
          "url": "https://bluebutton.cms.gov/resources/codesystem/identifier-currency",
          "valueCoding": {
            "code": "current",
            "display": "Current",
            "system": "https://bluebutton.cms.gov/resources/codesystem/identifier-currency"
          }
        }
      ],
      "system": "http://hl7.org/fhir/sid/us-mbi",
      "value": "1S00E00AA66"
    }
  ],
  "meta": {
    "lastUpdated": "2021-08-17T13:43:00.037-04:00"
  },
  "name": [
    {
      "family": "Schneider199",
      "given": [
        "Werner409"
      ],
      "use": "usual"
    }
  ],
  "resourceType": "Patient"
}

To get started, you can convert your JSON into a format like this:

data = {
    'columnname1': [entry1, entry2, entry3, ...],
    'columnname2': [entry1, entry2, entry3, ...],
    ...
}

or, if you want multiple index levels,

data = {
    ('level1', 'level2', 'columnname1'): [entry1, entry2, entry3, ...],
    ('level1', 'level2', 'columnname2'): [entry1, entry2, entry3, ...],
    ...
}

Also, make sure all lists contain the same number of entries.

In either format, you can convert it into a DataFrame with

pd.DataFrame(data)

so that your keys become the column names, and each key's list of entries becomes the values of the corresponding column.
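As a minimal sketch of both layouts (the column names and the second row of values are made up for illustration, not taken from the question):

```python
import pandas as pd

# Flat layout: one key per column, every list the same length.
flat = {
    "gender": ["male", "female"],
    "birthDate": ["1952-11-17", "1960-01-01"],
}
df_flat = pd.DataFrame(flat)

# Tuple keys become the levels of a column MultiIndex.
nested = {
    ("patient", "gender"): ["male", "female"],
    ("patient", "birthDate"): ["1952-11-17", "1960-01-01"],
    ("address", "state"): ["22", "06"],
}
df_multi = pd.DataFrame(nested)

print(df_flat.columns.tolist())
print(df_multi.columns.nlevels)
print(df_multi["patient"])  # selects every column under the 'patient' level
```

Selecting the top level, as in df_multi["patient"], returns the sub-frame of columns grouped under it.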

To convert your sample dict to a pd.DataFrame, you have to ensure that all arrays have the same length.

Your sample data mixes data types of different lengths, e.g. a list of length 3 alongside plain strings. It cannot be converted to a DataFrame as it stands because (1) the strings and other scalar values are not arrays, only the lists are; and (2) the lists are not all the same size.
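A small sketch of how the DataFrame constructor reacts to these cases. The helper try_build and the toy values are hypothetical and only serve the illustration:

```python
import pandas as pd

def try_build(data):
    """Return the DataFrame, or the error text if construction fails."""
    try:
        return pd.DataFrame(data)
    except ValueError as exc:
        return f"ValueError: {exc}"

# Lists of different lengths are rejected.
unequal = try_build({"identifier": ["a", "b", "c"], "extension": ["x", "y"]})

# A dict made only of scalars is rejected too (no length can be inferred).
scalars = try_build({"gender": "male", "birthDate": "1952-11-17"})

# A scalar next to a list is accepted: the scalar is repeated on every row.
mixed = try_build({"gender": "male", "extension": ["x", "y"]})

print(unequal)
print(scalars)
print(mixed)
```

So scalars alone are not the problem as long as at least one list fixes the number of rows; the lists of unequal length are.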

Here is a possible solution:

import pandas as pd

# Wrap every value in a one-element list so that all columns have length 1
for key, value in data.items():
    data[key] = [value]

# Convert the dict to a one-row dataframe
df = pd.DataFrame.from_dict(data)

However, I suspect this is not the final output you want. If you could provide a sample of the desired output (an Excel table, a screenshot, etc.), we could work out a better solution.
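To show what the one-row result looks like, and one possible direction if a table per nested list is what you are after, here is a sketch on a trimmed copy of the sample. Which meta fields to carry over (id, birthDate, gender) is my assumption, not something the question specifies:

```python
import pandas as pd

# Trimmed copy of the sample record from the question.
data = {
    "birthDate": "1952-11-17",
    "gender": "male",
    "id": "-10000000000066",
    "identifier": [
        {"system": "https://bluebutton.cms.gov/resources/variables/bene_id",
         "value": "-10000000000066"},
        {"system": "http://hl7.org/fhir/sid/us-mbi", "value": "1S00E00AA66"},
    ],
    "meta": {"lastUpdated": "2021-08-17T13:43:00.037-04:00"},
}

# One row per patient: wrapping the record in a list gives a one-row frame
# whose nested cells are still Python lists and dicts.
one_row = pd.DataFrame([data])

# One row per identifier, with selected patient-level fields repeated.
identifiers = pd.json_normalize(
    data,
    record_path="identifier",
    meta=["id", "birthDate", "gender"],
)

print(one_row.shape)
print(type(one_row.loc[0, "identifier"]))
print(identifiers)
```

The same call with record_path="extension" or record_path="name" would give one table per nested list, which can then be joined on id if needed.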

Another way to do this is the following. I define a function that allows me to flatten any JSON file:

import json
import pandas as pd


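def flatten_nested_json_df(df):
    """Repeatedly expand dict columns sideways and list columns downwards."""
    df = df.reset_index()

    # Columns in which every cell is a list, and columns in which every cell is a dict.
    # (DataFrame.applymap is deprecated in pandas 2.1+ in favour of DataFrame.map.)
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()

    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            # Expand each dict into one column per key, prefixed with the parent name.
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)

        for col in list_columns:
            # Expand each list into one row per element, then join it back on the index.
            # Index labels repeat after an explode, so this join can multiply rows.
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        # Only the newly created columns can still hold nested lists or dicts.
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df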


Then simply do the following:

# json_normalize already returns a DataFrame, so no extra pd.DataFrame(...) call is needed
df = pd.json_normalize(data)

outdf = flatten_nested_json_df(df)

which returns:

 index   birthDate gender               id resourceType  \
0       0  1952-11-17   male  -10000000000066      Patient   
0       0  1952-11-17   male  -10000000000066      Patient   
0       0  1952-11-17   male  -10000000000066      Patient   
0       0  1952-11-17   male  -10000000000066      Patient   
0       0  1952-11-17   male  -10000000000066      Patient   
..    ...         ...    ...              ...          ...   
0       0  1952-11-17   male  -10000000000066      Patient   
0       0  1952-11-17   male  -10000000000066      Patient   
0       0  1952-11-17   male  -10000000000066      Patient   
0       0  1952-11-17   male  -10000000000066      Patient   
0       0  1952-11-17   male  -10000000000066      Patient   

                 meta.lastUpdated address.state  \
0   2021-08-17T13:43:00.037-04:00            22   
0   2021-08-17T13:43:00.037-04:00            22   
0   2021-08-17T13:43:00.037-04:00            22   
0   2021-08-17T13:43:00.037-04:00            22   
0   2021-08-17T13:43:00.037-04:00            22   
..                            ...           ...   
0   2021-08-17T13:43:00.037-04:00            22   
0   2021-08-17T13:43:00.037-04:00            22   
0   2021-08-17T13:43:00.037-04:00            22   
0   2021-08-17T13:43:00.037-04:00            22   
0   2021-08-17T13:43:00.037-04:00            22   

                                        extension.url  \
0   https://bluebutton.cms.gov/resources/variables...   
0   https://bluebutton.cms.gov/resources/variables...   
0   https://bluebutton.cms.gov/resources/variables...   
0   https://bluebutton.cms.gov/resources/variables...   
0   https://bluebutton.cms.gov/resources/variables...   
..                                                ...   
0   https://bluebutton.cms.gov/resources/variables...   
0   https://bluebutton.cms.gov/resources/variables...   
0   https://bluebutton.cms.gov/resources/variables...   
0   https://bluebutton.cms.gov/resources/variables...   
0   https://bluebutton.cms.gov/resources/variables...   

   extension.valueCoding.code extension.valueCoding.display  \
0                           1                         White   
0                           1                         White   
0                           1                         White   
0                           1                         White   
0                           1                         White   
..                        ...                           ...   
0                         NaN                           NaN   
0                         NaN                           NaN   
0                         NaN                           NaN   
0                         NaN                           NaN   
0                         NaN                           NaN   

                         extension.valueCoding.system extension.valueDate  \
0   https://bluebutton.cms.gov/resources/variables...                 NaN   
0   https://bluebutton.cms.gov/resources/variables...                 NaN   
0   https://bluebutton.cms.gov/resources/variables...                 NaN   
0   https://bluebutton.cms.gov/resources/variables...                 NaN   
0   https://bluebutton.cms.gov/resources/variables...                 NaN   
..                                                ...                 ...   
0                                                 NaN                2021   
0                                                 NaN                2021   
0                                                 NaN                2021   
0                                                 NaN                2021   
0                                                 NaN                2021   

                                    identifier.system identifier.value  \
0   https://bluebutton.cms.gov/resources/variables...  -10000000000066   
0   https://bluebutton.cms.gov/resources/variables...  -10000000000066   
0   https://bluebutton.cms.gov/resources/variables...  -10000000000066   
0   https://bluebutton.cms.gov/resources/variables...  -10000000000066   
0   https://bluebutton.cms.gov/resources/variables...  -10000000000066   
..                                                ...              ...   
0                      http://hl7.org/fhir/sid/us-mbi      1S00E00AA66   
0                      http://hl7.org/fhir/sid/us-mbi      1S00E00AA66   
0                      http://hl7.org/fhir/sid/us-mbi      1S00E00AA66   
0                      http://hl7.org/fhir/sid/us-mbi      1S00E00AA66   
0                      http://hl7.org/fhir/sid/us-mbi      1S00E00AA66   

                                 identifier.extension   name.family name.use  \
0                                                 NaN  Schneider199    usual   
0                                                 NaN  Schneider199    usual   
0                                                 NaN  Schneider199    usual   
0                                                 NaN  Schneider199    usual   
0                                                 NaN  Schneider199    usual   
..                                                ...           ...      ...   
0   [{'url': 'https://bluebutton.cms.gov/resources...  Schneider199    usual   
0   [{'url': 'https://bluebutton.cms.gov/resources...  Schneider199    usual   
0   [{'url': 'https://bluebutton.cms.gov/resources...  Schneider199    usual   
0   [{'url': 'https://bluebutton.cms.gov/resources...  Schneider199    usual   
0   [{'url': 'https://bluebutton.cms.gov/resources...  Schneider199    usual   

   name.given  
0   Werner409  
0   Werner409  
0   Werner409  
0   Werner409  
0   Werner409  
..        ...  
0   Werner409  
0   Werner409  
0   Werner409  
0   Werner409  
0   Werner409  

[20736 rows x 18 columns]
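A note on the row count: the sample has only 2 extension entries and 3 identifier entries, so 2 × 3 = 6 distinct combinations, yet the result has 20736 rows. A likely cause is the drop/join step for list columns: after the first explode every row carries the index label 0, so each later join pairs every left row with every right row. The count appears consistent with that (144 rows before name.given is joined back, and 144 × 144 = 20736). The toy frame below reproduces the effect and compares it with DataFrame.explode(..., ignore_index=True); the frame, its column names, and both helper functions are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"key": ["x"], "a": [[1, 2]], "b": [[10, 20, 30]]})

def explode_with_join(df, cols):
    # Same pattern as the list-column loop: drop the column,
    # then join its exploded form back on the (non-unique) index.
    for col in cols:
        df = df.drop(columns=[col]).join(df[col].explode().to_frame())
    return df

def explode_with_unique_index(df, cols):
    # explode() repeats the other columns itself; ignore_index=True
    # renumbers the rows so the index stays unique.
    for col in cols:
        df = df.explode(col, ignore_index=True)
    return df

joined = explode_with_join(toy, ["a", "b"])
exploded = explode_with_unique_index(toy, ["a", "b"])

print(len(joined))    # 12: the second join multiplies 2 rows by 6
print(len(exploded))  # 6: one row per (a, b) combination
```

Calling drop_duplicates() on the output of the original function is another option once no list-valued cells remain, but keeping the index unique avoids building the large intermediate frame in the first place.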


 