I am trying to convert this JSON into a pandas DataFrame. So far I have been using a JSON converter to turn it into CSV. It looks like the column names are the keys before each colon. Should I just build a list of the column names? My end goal is to streamline the data import process. Below is a sample of the JSON data. I keep reading answers where people convert data like this. I don't need all of the columns; I just need help getting started.
data = {
"address": [
{
"state": "22"
}
],
"birthDate": "1952-11-17",
"extension": [
{
"url": "https://bluebutton.cms.gov/resources/variables/race",
"valueCoding": {
"code": "1",
"display": "White",
"system": "https://bluebutton.cms.gov/resources/variables/race"
}
},
{
"url": "https://bluebutton.cms.gov/resources/variables/rfrnc_yr",
"valueDate": "2021"
}
],
"gender": "male",
"id": "-10000000000066",
"identifier": [
{
"system": "https://bluebutton.cms.gov/resources/variables/bene_id",
"value": "-10000000000066"
},
{
"system": "https://bluebutton.cms.gov/resources/identifier/mbi-hash",
"value": "0e239e4895a76a2aff678507b1626a7cd08d23db07280e7efa228c8b0c156d23"
},
{
"extension": [
{
"url": "https://bluebutton.cms.gov/resources/codesystem/identifier-currency",
"valueCoding": {
"code": "current",
"display": "Current",
"system": "https://bluebutton.cms.gov/resources/codesystem/identifier-currency"
}
}
],
"system": "http://hl7.org/fhir/sid/us-mbi",
"value": "1S00E00AA66"
}
],
"meta": {
"lastUpdated": "2021-08-17T13:43:00.037-04:00"
},
"name": [
{
"family": "Schneider199",
"given": [
"Werner409"
],
"use": "usual"
}
],
"resourceType": "Patient"
}
To get started, you can reshape your JSON into a format like this:
data = {
'columnname1': [entry1, entry2, entry3, ...],
'columnname2': [entry1, entry2, entry3, ...],
...
}
or, if you want multiple index levels,
data = {
('level1', 'level2', 'columnname1'): [entry1, entry2, entry3, ...],
('level1', 'level2', 'columnname2'): [entry1, entry2, entry3, ...],
...
}
Also, make sure all lists contain the same number of entries.
In either format, you can convert it into a DataFrame with
pd.DataFrame(data)
so that your keys become the column names, and the list of entries for each key becomes the values of the corresponding column.
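As a small illustration of both layouts, the sketch below builds one frame from plain string keys and one from tuple keys. The column names and values are made up for the example and are not meant as the final layout for the question's data.

```python
import pandas as pd

# Hypothetical column names and values, purely for illustration.
flat = {
    "gender": ["male", "female"],
    "birthDate": ["1952-11-17", "1960-01-01"],
}
flat_df = pd.DataFrame(flat)

# Tuple keys are interpreted as the levels of a column MultiIndex.
nested = {
    ("patient", "name", "family"): ["Schneider199", "Doe"],
    ("patient", "name", "given"): ["Werner409", "Jane"],
}
multi_df = pd.DataFrame(nested)

print(flat_df.columns.tolist())
print(multi_df.columns.nlevels)
```

A column of the second frame is then selected with the full tuple, for example multi_df[("patient", "name", "family")].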
To convert your sample dict to a pd.DataFrame, you have to ensure that all arrays have the same length.
Your sample data mixes data types of different lengths: lists (for example, one of length 3), plain strings, and a nested dict. In this form you cannot pass it straight to pd.DataFrame because (1) the non-list values are not arrays, and (2) the lists are not all the same size.
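To see the problem concretely, here is a minimal sketch using a trimmed-down stand-in for the sample (the "value" entries are placeholders, not the real identifiers). It only records the error that pandas raises rather than trying to fix it.

```python
import pandas as pd

# Trimmed-down stand-in for the sample: lists of different lengths plus a string.
sample = {
    "address": [{"state": "22"}],
    "identifier": [{"value": "a"}, {"value": "b"}, {"value": "c"}],
    "gender": "male",
}

try:
    pd.DataFrame(sample)
    error_message = None
except ValueError as exc:
    # pandas refuses to build columns from lists of unequal length
    error_message = str(exc)

print(error_message)
```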
Here is a possible solution:
import pandas as pd
# Turn all values in dict to arrays
for x, y in data.items():
    data[x] = [y]
# Convert dict to dataframe
df = pd.DataFrame.from_dict(data)
However, I suspect this is not the final output you want. Perhaps you could share a sample of the desired output (an Excel table, a screenshot, etc.) so that we can suggest a better solution.
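If the goal turns out to be one row per entry of a particular nested list, pd.json_normalize with record_path and meta may be a more direct starting point. The sketch below assumes the "identifier" list is the one of interest and uses a trimmed copy of the sample record; which list and which top-level fields to keep are assumptions, not something the question states.

```python
import pandas as pd

# Trimmed copy of the sample record; only a few fields are kept.
patient = {
    "id": "-10000000000066",
    "gender": "male",
    "birthDate": "1952-11-17",
    "meta": {"lastUpdated": "2021-08-17T13:43:00.037-04:00"},
    "identifier": [
        {"system": "https://bluebutton.cms.gov/resources/variables/bene_id",
         "value": "-10000000000066"},
        {"system": "http://hl7.org/fhir/sid/us-mbi", "value": "1S00E00AA66"},
    ],
}

# One row per entry of "identifier"; the meta fields are repeated on every row.
identifiers = pd.json_normalize(
    patient,
    record_path="identifier",
    meta=["id", "gender", "birthDate", ["meta", "lastUpdated"]],
)
print(identifiers)
```

The nested meta path ["meta", "lastUpdated"] ends up as a column named meta.lastUpdated. Other lists such as "extension" or "name" could be normalized the same way into their own tables.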
Another way to do this is the following. I define a function that allows me to flatten any JSON file:
import json
import pandas as pd

def flatten_nested_json_df(df):
    df = df.reset_index()
    # Find columns whose cells are all lists or all dicts.
    # Note: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()
    # Keep flattening until no list or dict columns remain
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []
        for col in dict_columns:
            # Expand each dict column horizontally into prefixed columns
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)  # inplace
        for col in list_columns:
            # Expand each list column vertically into extra rows
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)
        # Only the newly created columns can still contain lists or dicts
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()
        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df
Then simply do the following:
df = pd.json_normalize(data)  # json_normalize already returns a DataFrame
outdf = flatten_nested_json_df(df)
which returns:
index birthDate gender id resourceType \
0 0 1952-11-17 male -10000000000066 Patient
0 0 1952-11-17 male -10000000000066 Patient
0 0 1952-11-17 male -10000000000066 Patient
0 0 1952-11-17 male -10000000000066 Patient
0 0 1952-11-17 male -10000000000066 Patient
.. ... ... ... ... ...
0 0 1952-11-17 male -10000000000066 Patient
0 0 1952-11-17 male -10000000000066 Patient
0 0 1952-11-17 male -10000000000066 Patient
0 0 1952-11-17 male -10000000000066 Patient
0 0 1952-11-17 male -10000000000066 Patient
meta.lastUpdated address.state \
0 2021-08-17T13:43:00.037-04:00 22
0 2021-08-17T13:43:00.037-04:00 22
0 2021-08-17T13:43:00.037-04:00 22
0 2021-08-17T13:43:00.037-04:00 22
0 2021-08-17T13:43:00.037-04:00 22
.. ... ...
0 2021-08-17T13:43:00.037-04:00 22
0 2021-08-17T13:43:00.037-04:00 22
0 2021-08-17T13:43:00.037-04:00 22
0 2021-08-17T13:43:00.037-04:00 22
0 2021-08-17T13:43:00.037-04:00 22
extension.url \
0 https://bluebutton.cms.gov/resources/variables...
0 https://bluebutton.cms.gov/resources/variables...
0 https://bluebutton.cms.gov/resources/variables...
0 https://bluebutton.cms.gov/resources/variables...
0 https://bluebutton.cms.gov/resources/variables...
.. ...
0 https://bluebutton.cms.gov/resources/variables...
0 https://bluebutton.cms.gov/resources/variables...
0 https://bluebutton.cms.gov/resources/variables...
0 https://bluebutton.cms.gov/resources/variables...
0 https://bluebutton.cms.gov/resources/variables...
extension.valueCoding.code extension.valueCoding.display \
0 1 White
0 1 White
0 1 White
0 1 White
0 1 White
.. ... ...
0 NaN NaN
0 NaN NaN
0 NaN NaN
0 NaN NaN
0 NaN NaN
extension.valueCoding.system extension.valueDate \
0 https://bluebutton.cms.gov/resources/variables... NaN
0 https://bluebutton.cms.gov/resources/variables... NaN
0 https://bluebutton.cms.gov/resources/variables... NaN
0 https://bluebutton.cms.gov/resources/variables... NaN
0 https://bluebutton.cms.gov/resources/variables... NaN
.. ... ...
0 NaN 2021
0 NaN 2021
0 NaN 2021
0 NaN 2021
0 NaN 2021
identifier.system identifier.value \
0 https://bluebutton.cms.gov/resources/variables... -10000000000066
0 https://bluebutton.cms.gov/resources/variables... -10000000000066
0 https://bluebutton.cms.gov/resources/variables... -10000000000066
0 https://bluebutton.cms.gov/resources/variables... -10000000000066
0 https://bluebutton.cms.gov/resources/variables... -10000000000066
.. ... ...
0 http://hl7.org/fhir/sid/us-mbi 1S00E00AA66
0 http://hl7.org/fhir/sid/us-mbi 1S00E00AA66
0 http://hl7.org/fhir/sid/us-mbi 1S00E00AA66
0 http://hl7.org/fhir/sid/us-mbi 1S00E00AA66
0 http://hl7.org/fhir/sid/us-mbi 1S00E00AA66
identifier.extension name.family name.use \
0 NaN Schneider199 usual
0 NaN Schneider199 usual
0 NaN Schneider199 usual
0 NaN Schneider199 usual
0 NaN Schneider199 usual
.. ... ... ...
0 [{'url': 'https://bluebutton.cms.gov/resources... Schneider199 usual
0 [{'url': 'https://bluebutton.cms.gov/resources... Schneider199 usual
0 [{'url': 'https://bluebutton.cms.gov/resources... Schneider199 usual
0 [{'url': 'https://bluebutton.cms.gov/resources... Schneider199 usual
0 [{'url': 'https://bluebutton.cms.gov/resources... Schneider199 usual
name.given
0 Werner409
0 Werner409
0 Werner409
0 Werner409
0 Werner409
.. ...
0 Werner409
0 Werner409
0 Werner409
0 Werner409
0 Werner409
[20736 rows x 18 columns]
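The large row count for a single patient record is worth a note. A likely explanation is that each exploded column is joined back on an index that already contains duplicates, so every join multiplies the rows. The sketch below reproduces that pattern on a toy frame (not the question's data) and compares it with DataFrame.explode using ignore_index=True, which is offered here as one possible alternative rather than a tested fix for the function above.

```python
import pandas as pd

# Toy frame: one row, two independent list columns.
toy = pd.DataFrame({"a": [[1, 2]], "b": [[3, 4, 5]]})

# Same pattern as the loop above: explode a column, then join it back on the index.
joined = toy
for col in ["a", "b"]:
    joined = joined.drop(columns=[col]).join(joined[col].explode().to_frame())

# Alternative: explode in place and renumber the index after each step.
exploded = toy
for col in ["a", "b"]:
    exploded = exploded.explode(col, ignore_index=True)

print(len(joined), len(exploded))
```

On this toy input the join-based version yields 12 rows while the explode-based version yields 6, the plain cross product of the two lists. Even that cross product pairs unrelated list entries, so for data like the sample it may be cleaner to keep each nested list in its own table.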