
Pandas Data Frame from nested JSON

Given the following JSON dataset snapshot, what is the best way to turn it into a pandas Data Frame?

Reading the file straight into a data frame leaves each nested object in a single column, which is not exactly usable.

I am currently using json_normalize to turn location and sensor into separate Data Frames,

but trying the same approach with sensordatavalues raises an error. Is this because sensordatavalues is an array?

To make things worse, the id key is missing in some sensordatavalues records.

Just to make it a bit more challenging for you pandas gurus: is there a way to do all of the above in the same Data Frame?

        "location": {
            "indoor": 0,
            "exact_location": 0,
            "latitude": "37.36",
            "altitude": "17.0",
            "id": 13487,
            "country": "GL",
            "longitude": "26.962"
        },
        "sampling_rate": null,
        "id": 105462750,
        "sensordatavalues": [
            {
                "value_type": "temperature",
                "value": "18.70",
                "id": 226552256
            },
            {
                "value_type": "humidity",
                "value": "99.90",
                "id": 226552257
            }
        ],
        "sensor": {
            "id": 25666,
            "sensor_type": {
                "name": "DHT22",
                "id": 9,
                "manufacturer": "various"
            },
            "pin": "7"
        },
        "timestamp": "2020-01-19 19:10:38"
    },

Use pd.Series.explode to unpack each list into individual rows.

exploded = df['sensordatavalues'].explode()
exploded
# 0    {'value_type': 'temperature', 'value': '18.70'...
# 0    {'value_type': 'humidity', 'value': '99.90', '...
# Name: sensordatavalues, dtype: object

The original index is kept, with duplicated labels, so the exploded rows can easily be joined back to the original data.

joined = df.drop(columns='sensordatavalues').join(exploded)
joined
#                                             location sampling_rate         id                                             sensor            timestamp                                   sensordatavalues
# 0  {'indoor': 0, 'exact_location': 0, 'latitude':...          None  105462750  {'id': 25666, 'sensor_type': {'name': 'DHT22',...  2020-01-19 19:10:38  {'value_type': 'temperature', 'value': '18.70'...
# 0  {'indoor': 0, 'exact_location': 0, 'latitude':...          None  105462750  {'id': 25666, 'sensor_type': {'name': 'DHT22',...  2020-01-19 19:10:38  {'value_type': 'humidity', 'value': '99.90', '...
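The two steps above assume a df that already holds the raw records. A minimal self-contained sketch follows; the records list is a hypothetical sample modelled on the snapshot in the question, and its second record (including its id 105462751) is invented purely to show a sensordatavalues entry without an id key.

```python
import pandas as pd

# Hypothetical sample modelled on the question's snapshot. The second record
# is invented to show a sensordatavalues entry that lacks the "id" key.
records = [
    {
        "id": 105462750,
        "timestamp": "2020-01-19 19:10:38",
        "location": {"id": 13487, "country": "GL", "latitude": "37.36"},
        "sensor": {"id": 25666, "pin": "7", "sensor_type": {"name": "DHT22"}},
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.70", "id": 226552256},
            {"value_type": "humidity", "value": "99.90", "id": 226552257},
        ],
    },
    {
        "id": 105462751,
        "timestamp": "2020-01-19 19:13:08",
        "location": {"id": 13487, "country": "GL", "latitude": "37.36"},
        "sensor": {"id": 25666, "pin": "7", "sensor_type": {"name": "DHT22"}},
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.60"},
        ],
    },
]

# Nested dicts and lists are stored as plain Python objects in each column.
df = pd.DataFrame(records)

# One row per list element; the parent row's index label is repeated.
exploded = df["sensordatavalues"].explode()

# join aligns on the index, so every exploded dict picks up its parent row.
joined = df.drop(columns="sensordatavalues").join(exploded)

print(joined[["id", "sensordatavalues"]])
```

Because explode only splits the lists and does not look inside the dicts, the entry without an id passes through unchanged at this stage.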

Then you can use json_normalize as before.

# json_normalize is available as pd.json_normalize in pandas 1.0 and later
pd.json_normalize(joined['sensordatavalues'].tolist())
#     value_type  value         id
# 0  temperature  18.70  226552256
# 1     humidity  99.90  226552257
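For the "everything in the same Data Frame" part of the question, one possible alternative to explode is a single pd.json_normalize call with record_path and meta. This is a sketch rather than part of the answer above. It assumes the raw data is a list of dicts shaped like the snapshot; the records sample, the invented second record, and the sdv_ prefix are illustrative choices only.

```python
import pandas as pd

# Hypothetical sample modelled on the question's snapshot. The second record
# is invented to show a sensordatavalues entry that lacks the "id" key.
records = [
    {
        "id": 105462750,
        "timestamp": "2020-01-19 19:10:38",
        "location": {"id": 13487, "country": "GL", "latitude": "37.36"},
        "sensor": {"id": 25666, "pin": "7", "sensor_type": {"name": "DHT22"}},
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.70", "id": 226552256},
            {"value_type": "humidity", "value": "99.90", "id": 226552257},
        ],
    },
    {
        "id": 105462751,
        "timestamp": "2020-01-19 19:13:08",
        "location": {"id": 13487, "country": "GL", "latitude": "37.36"},
        "sensor": {"id": 25666, "pin": "7", "sensor_type": {"name": "DHT22"}},
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.60"},
        ],
    },
]

flat = pd.json_normalize(
    records,
    # Each element of this list becomes one output row.
    record_path="sensordatavalues",
    # Parent fields copied onto every row; nested fields are given as paths.
    meta=[
        "id",
        "timestamp",
        ["location", "latitude"],
        ["sensor", "sensor_type", "name"],
    ],
    # Both the parent and the records have an "id" key, so the record
    # columns need a prefix to avoid a name conflict.
    record_prefix="sdv_",
)

print(flat)
```

Records that lack an id simply get a missing value in sdv_id, which makes that column floating point. The value column stays a string, so pd.to_numeric could be applied afterwards if numbers are needed. If a field listed in meta can itself be absent, passing errors="ignore" avoids the KeyError.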

Technical posts on this site are published under the CC BY-SA 4.0 license; if you republish one, please credit this site's URL or the address of the original post.

 