
How to flatten nested json data using pandas

I want to flatten the nested JSON data in a pandas DataFrame into additional columns. I am using requests to fetch JSON from an API and build the DataFrame from it.

import requests
import pandas as pd
import json

url = 'https://api.openaq.org/v2/locations?coordinates=38.7437396,-9.2302436&radius=5000'
api_request = requests.get(url)
api_content = api_request.content
api_json = json.loads(api_content)
df = pd.json_normalize(api_json)
df

This outputs the following:

results meta.name   meta.license    meta.website    meta.page   meta.limit  meta.found
0   [{'id': 8209, 'city': 'Lisboa', 'name': 'PT031...   openaq-api  CC BY 4.0d  api.openaq.org  1   100 4

So I passed record_path to normalize the 'results' list instead:

df = pd.json_normalize(api_json, record_path=["results"])

This gives a nicer result:

    id  city    name    entity  country sources isMobile    isAnalysis  parameters  sensorType  lastUpdated firstUpdated    measurements    bounds  manufacturers   coordinates.latitude    coordinates.longitude
0   8209    Lisboa  PT03100 government  PT  [{'url': 'http://www.eea.europa.eu/themes/air/...   False   False   [{'id': 25414, 'unit': 'µg/m³', 'count': 10005...   reference grade 2022-08-27T16:00:00+00:00   2017-09-22T11:00:00+00:00   246216  None    None    38.748056   -9.202500
1   8211    Lisboa  PT03084 government  PT  [{'url': 'http://www.eea.europa.eu/themes/air/...   False   False   [{'id': 25397, 'unit': 'µg/m³', 'count': 93281...   reference grade 2022-08-27T16:00:00+00:00   2017-09-22T10:00:00+00:00   239733  None    None    38.754167   -9.230833
2   7199    Lisboa  PT03087 government  PT  [{'url': 'http://www.eea.europa.eu/themes/air/...   False   False   [{'id': 24800, 'unit': 'µg/m³', 'count': 97702...   reference grade 2022-08-27T16:00:00+00:00   2017-09-22T13:00:00+00:00   235512  None    None    38.705000   -9.210278
3   7767    Lisboa  PT03082 government  PT  [{'url': 'http://www.eea.europa.eu/themes/air/...   False   False   [{'id': 25393, 'unit': 'µg/m³', 'count': 83437...   reference grade 2022-08-27T16:00:00+00:00   2018-02-27T01:00:00+00:00   304693  None    None    38.738889   -9.207500
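
To see why the first attempt packed everything into a single 'results' cell, here is a minimal sketch on a made-up payload (the payload dict below is a hypothetical stand-in shaped like the API response, not real data). By default json_normalize flattens nested dicts into dotted column names but leaves lists untouched; record_path tells it which list to turn into rows.

```python
import pandas as pd

# Hypothetical payload with the same outer shape as the API response
payload = {
    "meta": {"name": "openaq-api", "found": 2},
    "results": [
        {"id": 8209, "coordinates": {"latitude": 38.748056, "longitude": -9.2025}},
        {"id": 8211, "coordinates": {"latitude": 38.754167, "longitude": -9.230833}},
    ],
}

# Default call: a single row; nested dicts become dotted columns,
# while the "results" list stays packed inside one cell
whole = pd.json_normalize(payload)

# With record_path: one row per element of payload["results"]
rows = pd.json_normalize(payload, record_path=["results"])

print(whole.columns.tolist())
print(rows.columns.tolist())
```

The same rule explains why 'sources' and 'parameters' remain unflattened after this step: they are lists, not dicts.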

However, the 'sources' and 'parameters' columns still hold lists of dicts. How can I unpack these into new columns?

Also, can I, for instance, pull just the 'url' key out of the 'sources' data and make it its own column?

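You can do it like this. Explode both list columns first, so that every row holds a single source dict and a single parameter dict, then normalize those dicts and concatenate the results side by side. The prefixes keep the new 'id' and 'name' columns from colliding with the ones already in df. This produces a lot of columns, so you may want to prune a few before merging them into the main df.

# Assumes every location has at least one source and one parameter
df = df.explode('sources').explode('parameters').reset_index(drop=True)
df_sources = pd.json_normalize(df['sources'].tolist()).add_prefix('sources.')
df_params = pd.json_normalize(df['parameters'].tolist()).add_prefix('parameters.')

df = pd.concat([df.drop(columns=['sources', 'parameters']), df_sources, df_params], axis=1)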
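
For the second part of the question (pulling only 'url' out of 'sources'), one possible approach is a small helper applied to the column. This is a sketch under assumptions: the sample frame is made up, first_url and source_url are hypothetical names, and it only looks at the first source of each row.

```python
import pandas as pd

# Hypothetical sample shaped like the 'sources' column above
df = pd.DataFrame({
    "id": [8209, 8211],
    "sources": [
        [{"url": "http://www.eea.europa.eu/themes/air/air-quality",
          "name": "EEA", "id": "eea"}],
        [],  # a location with no sources, to show the fallback
    ],
})

def first_url(sources):
    """Return the 'url' of the first source dict, or None if there is none."""
    if isinstance(sources, list) and sources and isinstance(sources[0], dict):
        return sources[0].get("url")
    return None

# New column holding just the URL; the original 'sources' column is untouched
df["source_url"] = df["sources"].apply(first_url)
print(df[["id", "source_url"]])
```

If a location can list several sources and you need all of their URLs, exploding the column first (as in the answer above) is probably the better fit.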

This JSON is fairly complicated: it is really several dataframes nested in one.

We can differentiate between the inner frames and the metadata like so:

result1 = api_json['results'][0]
# It's a frame if it's a list
frames = [x for x, y in result1.items() if isinstance(y, list)]
# Otherwise it's just metadata
metadata = [x for x in result1 if x not in frames]
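
As a quick illustration of that split, here is the same logic run on a trimmed-down, made-up record (the values in result1 below are a hypothetical sample, not the full API response):

```python
# Hypothetical record with a subset of the keys seen in the API response
result1 = {
    "id": 8209,
    "city": "Lisboa",
    "sources": [{"url": "http://www.eea.europa.eu/themes/air/air-quality",
                 "name": "EEA", "id": "eea"}],
    "parameters": [{"id": 25414, "unit": "µg/m³"}],
    "coordinates": {"latitude": 38.748056, "longitude": -9.2025},
}

# Keys whose values are lists are treated as inner frames
frames = [key for key, value in result1.items() if isinstance(value, list)]
# Everything else (scalars and plain dicts) is treated as metadata
metadata = [key for key in result1 if key not in frames]

print(frames)    # ['sources', 'parameters']
print(metadata)  # ['id', 'city', 'coordinates']
```

Note that 'coordinates' is a dict rather than a list, so it lands in the metadata and still needs flattening later.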

Now build one dataframe per inner frame, dropping any columns that are entirely null:

dfs = [
    pd.json_normalize(
        api_json['results'],
        record_path=frame,
        meta=metadata,
        record_prefix=frame + '_',
    ).dropna(how='all', axis=1)
    for frame in frames
]
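
To make the roles of record_path, meta and record_prefix concrete, here is a minimal sketch on made-up data (the results list is a hypothetical sample). Each element of the 'parameters' list becomes a row, the meta keys are repeated on every row, and the prefix avoids a clash between the parameter 'id' and the location 'id'.

```python
import pandas as pd

# Hypothetical sample: two locations, three parameter records in total
results = [
    {"id": 8209, "city": "Lisboa",
     "parameters": [{"id": 25414, "unit": "µg/m³"},
                    {"id": 24099, "unit": "µg/m³"}]},
    {"id": 8211, "city": "Lisboa",
     "parameters": [{"id": 25397, "unit": "µg/m³"}]},
]

params = pd.json_normalize(
    results,
    record_path="parameters",     # the list to expand into rows
    meta=["id", "city"],          # parent fields copied onto each row
    record_prefix="parameters_",  # keeps 'parameters_id' apart from 'id'
)
print(params)
```

Without record_prefix, pandas would refuse to build this frame because the record key 'id' and the meta key 'id' have the same name.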

Expand the coordinates column, which still holds dicts, into latitude and longitude columns:

dfs = [pd.concat([df, df.coordinates.apply(pd.Series)], axis=1).drop('coordinates', axis=1) for df in dfs]
metadata += ['latitude', 'longitude']
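
A possible alternative to apply(pd.Series) for this step is to build the new columns straight from the list of dicts, which avoids constructing a Series per row. This is only a sketch on a made-up frame (sample values are hypothetical):

```python
import pandas as pd

# Hypothetical frame with a dict-valued 'coordinates' column
df = pd.DataFrame({
    "id": [8209, 8211],
    "coordinates": [
        {"latitude": 38.748056, "longitude": -9.2025},
        {"latitude": 38.754167, "longitude": -9.230833},
    ],
})

# Turn the dicts into columns, keeping the original row index for alignment
coords = pd.DataFrame(df["coordinates"].tolist(), index=df.index)

# Swap the dict column for the two new columns
flat = pd.concat([df.drop(columns="coordinates"), coords], axis=1)
print(flat)
```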

Finally, concatenate the dataframes side by side, using the metadata columns as the index:

df = pd.concat([df.set_index([x for x in df.columns if x in metadata]) for df in dfs], axis=1)
print(df)

# Output:
                                                                                                                                                                                                     sources_url sources_name sources_id  parameters_id parameters_unit  parameters_count  parameters_average  parameters_lastValue parameters_parameter parameters_displayName    parameters_lastUpdated  parameters_parameterId   parameters_firstUpdated
id   city   name    entity     country isMobile isAnalysis sensorType      lastUpdated               firstUpdated              measurements latitude  longitude
7199 Lisboa PT03087 government PT      False    False      reference grade 2022-08-27 18:00:00+00:00 2017-09-22 13:00:00+00:00 235524       38.705000 -9.210278  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          20728           µg/m³             38829            0.000000                   0.0                 pm10                   PM10 2019-12-24 16:00:00+00:00                       1 2018-03-22 16:00:00+00:00
                                                                                                                                                      -9.210278  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          24802           µg/m³             98987           53.124865                  58.0                   o3                O₃ mass 2022-08-27 18:00:00+00:00                       3 2017-09-22 13:00:00+00:00
                                                                                                                                                      -9.210278  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          24800           µg/m³             97708           18.870253                   5.0                  no2               NO₂ mass 2022-08-27 18:00:00+00:00                       5 2017-09-22 13:00:00+00:00
7767 Lisboa PT03082 government PT      False    False      reference grade 2022-08-27 18:00:00+00:00 2018-02-27 01:00:00+00:00 304711       38.738889 -9.207500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25392           µg/m³             85382           24.400488                  12.7                  no2               NO₂ mass 2022-08-27 18:00:00+00:00                       5 2018-03-22 12:00:00+00:00
                                                                                                                                                      -9.207500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          22595           µg/m³             68071            3.050738                   5.0                  so2               SO₂ mass 2022-08-27 18:00:00+00:00                       6 2018-03-22 12:00:00+00:00
                                                                                                                                                      -9.207500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25391           µg/m³             67821         3528.784949                1221.0                   co                CO mass 2022-08-27 18:00:00+00:00                       4 2018-02-27 01:00:00+00:00
                                                                                                                                                      -9.207500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25393           µg/m³             83437           44.678977                  67.4                   o3                O₃ mass 2022-07-25 14:00:00+00:00                       3 2018-03-06 18:00:00+00:00
8209 Lisboa PT03100 government PT      False    False      reference grade 2022-08-27 18:00:00+00:00 2017-09-22 11:00:00+00:00 246228       38.748056 -9.202500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          23940           µg/m³             47279            6.259864                   8.8                 pm10                   PM10 2022-08-27 12:00:00+00:00                       1 2017-09-22 11:00:00+00:00
                                                                                                                                                      -9.202500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          24099           µg/m³             98890          298.816190                 197.0                   co                CO mass 2022-08-27 18:00:00+00:00                       4 2017-09-22 11:00:00+00:00
                                                                                                                                                      -9.202500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25414           µg/m³            100059           32.176792                  29.6                  no2               NO₂ mass 2022-08-27 18:00:00+00:00                       5 2017-09-22 11:00:00+00:00
8211 Lisboa PT03084 government PT      False    False      reference grade 2022-08-27 18:00:00+00:00 2017-09-22 10:00:00+00:00 239745       38.754167 -9.230833  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25398           µg/m³             99881           57.904319                  94.0                   o3                O₃ mass 2022-08-27 18:00:00+00:00                       3 2017-09-22 11:00:00+00:00
                                                                                                                                                      -9.230833  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          23942           µg/m³             46577            3.805825                  10.0                 pm10                   PM10 2022-08-27 14:00:00+00:00                       1 2017-09-22 10:00:00+00:00
                                                                                                                                                      -9.230833  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25397           µg/m³             93287           16.333777                  10.9                  no2               NO₂ mass 2022-08-27 18:00:00+00:00                       5 2017-09-22 11:00:00+00:00
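
If you would rather keep the metadata as ordinary columns than as a MultiIndex, a key-based merge is one alternative to the index-aligned concat. The sketch below swaps in DataFrame.merge and uses two small made-up frames (sources and params are hypothetical stand-ins for the entries of dfs), joined on the location 'id':

```python
import pandas as pd

# Hypothetical stand-ins for the per-frame dataframes built earlier
sources = pd.DataFrame({
    "id": [8209, 8211],
    "sources_name": ["EEA", "EEA"],
})
params = pd.DataFrame({
    "id": [8209, 8209, 8211],
    "parameters_parameter": ["pm10", "co", "no2"],
})

# Left-join each parameter row to the source info of its location
merged = params.merge(sources, on="id", how="left")
print(merged)
```

In the real data you would merge on the full list of shared metadata columns instead of 'id' alone.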


 