Python Pandas - Flatten Nested JSON

I am working with nested JSON data that I am trying to transform into a Pandas DataFrame. The json_normalize function offers a way to accomplish this.

{'locations': [{'accuracy': 17,
                'activity': [{'activity': [{'confidence': 100,
                                            'type': 'STILL'}],
                              'timestampMs': '1542652'}],
                'altitude': -10,
                'latitudeE7': 3777321,
                'longitudeE7': -122423125,
                'timestampMs': '1542654',
                'verticalAccuracy': 2}]}

I used the function to normalize locations; however, the nested 'activity' part is not flattened.

Here's my attempt:

activity_data = json_normalize(d, 'locations', ['activity','type', 'confidence'], 
                               meta_prefix='Prefix.',
                               errors='ignore') 

DataFrame:

[{u'activity': [{u'confidence': 100, u'type': ...   -10.0   NaN 377777377   -1224229340 1542652023196   

The activity column still has nested elements, which I need unpacked into their own columns.

Any suggestions or tips would be much appreciated.

Use recursion to flatten the nested dicts

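def flatten_json(nested_json: dict, exclude: list = None) -> dict:
    """
    Flatten a nested dict (which may contain lists) into a single-level dict.
    Keys listed in exclude are skipped at every nesting level.
    """
    exclude = exclude or []  # avoid a mutable default argument
    out = dict()

    def flatten(x, name: str = ''):
        if isinstance(x, dict):
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}_')
        elif isinstance(x, list):
            # List items are keyed by their position
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}_')
        else:
            # Leaf value: store it under the accumulated key, minus the trailing '_'
            out[name[:-1]] = x

    flatten(nested_json)
    return out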
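
To make the key naming concrete, here is a minimal standalone sketch that traces the recursion on a single record. The name flatten_record is hypothetical: it is a condensed restatement of the recursion above, included only so the snippet runs on its own. Dict keys and list positions are joined with underscores, and any key listed in exclude is dropped at every nesting level, not just at the top.

```python
def flatten_record(nested, exclude=()):
    """Condensed, standalone restatement of the recursive flattening idea."""
    out = {}

    def walk(value, prefix=''):
        if isinstance(value, dict):
            for key, child in value.items():
                if key not in exclude:
                    walk(child, f'{prefix}{key}_')
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(child, f'{prefix}{index}_')
        else:
            out[prefix[:-1]] = value  # drop the trailing underscore

    walk(nested)
    return out


# A trimmed-down version of one record from the question
record = {'accuracy': 17,
          'activity': [{'activity': [{'confidence': 100, 'type': 'STILL'}],
                        'timestampMs': '1542652'}],
          'timestampMs': '1542654'}

flat = flatten_record(record)
for key, value in flat.items():
    print(key, '=', value)

# Excluding a key removes it wherever it appears, nested or not
trimmed = flatten_record(record, exclude=['timestampMs'])
print(sorted(trimmed))
```

The nested list of lists is why the column names contain two index segments, as in activity_0_activity_0_type.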

Data:

  • To create the dataset, I repeated the given record three times.
  • data is a dict parsed from the JSON.
data = {'locations': [{'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2},
                      {'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2},
                      {'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2}]}

Using flatten_json:

import pandas as pd
df = pd.DataFrame([flatten_json(x) for x in data['locations']])

Output:

 accuracy  activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs  altitude  latitudeE7  longitudeE7 timestampMs  verticalAccuracy
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
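
For comparison, json_normalize can also reach the innermost records directly if record_path is given as the full list of keys leading to them. The sketch below assumes pandas 1.0 or later (where the function is exposed as pd.json_normalize) and assumes that one row per innermost activity entry is acceptable. The choice of meta fields is only illustrative; unlike the recursive approach, each outer field to carry along has to be listed explicitly.

```python
import pandas as pd

# One record from the question, used as sample input
data = {'locations': [{'accuracy': 17,
                       'activity': [{'activity': [{'confidence': 100,
                                                   'type': 'STILL'}],
                                     'timestampMs': '1542652'}],
                       'altitude': -10,
                       'latitudeE7': 3777321,
                       'longitudeE7': -122423125,
                       'timestampMs': '1542654',
                       'verticalAccuracy': 2}]}

df = pd.json_normalize(
    data,
    # Path of keys down to the innermost list of records
    record_path=['locations', 'activity', 'activity'],
    # Outer fields to repeat on each row, given as paths from the top level
    meta=[['locations', 'accuracy'],
          ['locations', 'timestampMs'],
          ['locations', 'activity', 'timestampMs']],
)
print(df.columns.tolist())
print(df)
```

Meta columns are named by joining their path with a dot, so the two timestampMs values at different levels do not collide.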
