简体   繁体   中英

Python - Parsing Nested Json

I have json file i am trying to parse out and get just two values out. 'timestamps' and 'values'. Here is the sample of my json file:

    {'totalCount': 1,
 'nextPageKey': None,
 'result': [{'metricId': 'builtin:synthetic.browser.event.visuallyComplete.load',
   'data': [{'dimensions': ['SYNTHETIC_TEST_STEP-123456'],
     'dimensionMap': {'dt.entity.synthetic_test_step': 'SYNTHETIC_TEST_STEP-123456'},
     'timestamps': [
      1596326400000,
      1616976000000,
      1617062400000,
      1617148800000,
      1617235200000],
     'values': [
      3880.834490740741,
      3879.6458333333335,
      3826.3645833333335,
      3890.871527777778,
      3876.8199643493763]}]}]}

I tried using approach with pandas:

    import pandas as pd
    pdp = pd.json_normalize(PDPjson['result'],['data'])

so I can get it into data frame but I am getting column with list of dates and column with list of values. 在此处输入图像描述

What I am trying to get is just a dataframe with two columns, Timestamp and Values.

How do I do this from here?

You can apply Series.explode on the timestamps and values :

pdp = pdp[['timestamps', 'values']].apply(pd.Series.explode).reset_index(drop=True)

#       timestamps       values
# 0  1596326400000  3880.834491
# 1  1616976000000  3879.645833
# 2  1617062400000  3826.364583
# 3  1617148800000  3890.871528
# 4  1617235200000  3876.819964

I created your JSON object as follows:

txt = '''\
{ "totalCount": 1,
  "nextPageKey": "None",
  "result": [{"metricId": "builtin:synthetic.browser.event.visuallyComplete.load",
    "data": [{"dimensions": ["SYNTHETIC_TEST_STEP-123456"],
      "dimensionMap": {"dt.entity.synthetic_test_step": "SYNTHETIC_TEST_STEP-123456"},
      "timestamps": [ 1596326400000, 1616976000000, 1617062400000,
        1617148800000, 1617235200000],
      "values": [ 3880.834490740741, 3879.6458333333335, 3826.3645833333335,
        3890.871527777778, 3876.8199643493763]}]}]}'''
PDPjson = json.loads(txt)

Then I started from json_normalize just as you did:

pdp = pd.json_normalize(PDPjson['result'], 'data')

The missing step is:

result = pdp[['timestamps', 'values']].apply(lambda col: col.explode())\
    .reset_index(drop=True)

The result is:

      timestamps   values
0  1596326400000  3880.83
1  1616976000000  3879.65
2  1617062400000  3826.36
3  1617148800000  3890.87
4  1617235200000  3876.82

Don't worry about smaller precision of values column, it is only the way how Jupyter Notebook presents the data.

When you run eg result.iloc[0,1] you will get:

3880.834490740741

with full precision.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM