Pandas DataFrame from nested JSON

Question

Given the following JSON dataset snapshot, what is the best way to turn it into a pandas DataFrame?

Reading the file straight into a DataFrame gives a frame whose nested fields are still dicts and lists, which is not exactly usable.

I am currently using json_normalize to turn location and sensor into separate DataFrames.
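
For reference, a minimal sketch of that step with `pd.json_normalize` (pandas 1.0 or later). It assumes `records` is the parsed list of JSON objects, trimmed here to a single record shaped like the snapshot; the variable names are illustrative, not the asker's actual code.

```python
import pandas as pd

# Hypothetical input: one record shaped like the snapshot (trimmed).
records = [
    {
        "id": 105462750,
        "timestamp": "2020-01-19 19:10:38",
        "location": {"id": 13487, "latitude": "37.36", "longitude": "26.962", "country": "GL"},
        "sensor": {
            "id": 25666,
            "pin": "7",
            "sensor_type": {"id": 9, "name": "DHT22", "manufacturer": "various"},
        },
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.70", "id": 226552256},
            {"value_type": "humidity", "value": "99.90", "id": 226552257},
        ],
    }
]

df = pd.DataFrame(records)

# Each "location" / "sensor" cell holds a single dict, so json_normalize can
# flatten the column directly; nested dicts become dotted column names.
location = pd.json_normalize(df["location"].tolist())
sensor = pd.json_normalize(df["sensor"].tolist())

print(location)
print(sensor)
```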

But trying the same approach with sensordatavalues gives me an error. Is this because sensordatavalues is an array?
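
Most likely, yes: each `location` or `sensor` cell holds a single dict, whereas each `sensordatavalues` cell holds a list of dicts, which `json_normalize` does not unpack on its own. Its `record_path` argument tells it to descend into that list and emit one row per element. A sketch on a trimmed, hypothetical `records` list:

```python
import pandas as pd

# Hypothetical input: "sensordatavalues" is a list of dicts, as in the snapshot.
records = [
    {
        "id": 105462750,
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.70", "id": 226552256},
            {"value_type": "humidity", "value": "99.90", "id": 226552257},
        ],
    }
]

# record_path makes json_normalize walk into the list and produce
# one row per reading instead of treating the list as a single value.
values = pd.json_normalize(records, record_path="sensordatavalues")
print(values)
```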

To make things worse, the id key is missing from some sensordatavalues records.
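
When a key such as `id` is absent from some readings, building the frame from the list of dicts fills that cell with `NaN`, which also turns the integer column into `float64`. A sketch with a hypothetical reading that lacks `id`; the final `Int64` conversion is optional and assumes you want integer ids with missing values shown as `<NA>`:

```python
import pandas as pd

# Hypothetical input: the second reading has no "id" key.
records = [
    {
        "id": 105462750,
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.70", "id": 226552256},
            {"value_type": "humidity", "value": "99.90"},  # "id" missing
        ],
    }
]

values = pd.json_normalize(records, record_path="sensordatavalues")

# The missing key becomes NaN, so the "id" column is upcast to float64.
print(values["id"].dtype)

# Optional: nullable integer dtype keeps ids as integers alongside missing values.
values["id"] = values["id"].astype("Int64")
print(values)
```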

Just to make it a bit more challenging for you pandas gurus: is there a way to do all of the above in the same DataFrame?
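
One possible way to get everything into a single frame, as an alternative to exploding the column, is `json_normalize` with `record_path` plus `meta`, which repeats selected parent fields on every reading row. The chosen `meta` fields and the `reading.` prefix are illustrative assumptions; a prefix is needed because both the readings and the parent record have an `id` key, and `json_normalize` raises a `ValueError` on that name clash.

```python
import pandas as pd

# Hypothetical input: one record shaped like the snapshot (trimmed).
records = [
    {
        "id": 105462750,
        "timestamp": "2020-01-19 19:10:38",
        "location": {"id": 13487, "latitude": "37.36", "longitude": "26.962"},
        "sensor": {"id": 25666, "sensor_type": {"id": 9, "name": "DHT22"}},
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.70", "id": 226552256},
            {"value_type": "humidity", "value": "99.90", "id": 226552257},
        ],
    }
]

flat = pd.json_normalize(
    records,
    record_path="sensordatavalues",
    # Parent fields copied onto every reading; nested paths are lists of keys.
    meta=[
        "id",
        "timestamp",
        ["location", "latitude"],
        ["location", "longitude"],
        ["sensor", "id"],
        ["sensor", "sensor_type", "name"],
    ],
    # Prefix the reading columns so "reading.id" does not clash with the parent "id".
    record_prefix="reading.",
)
print(flat)
```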

        "location": {
            "indoor": 0,
            "exact_location": 0,
            "latitude": "37.36",
            "altitude": "17.0",
            "id": 13487,
            "country": "GL",
            "longitude": "26.962"
        },
        "sampling_rate": null,
        "id": 105462750,
        "sensordatavalues": [
            {
                "value_type": "temperature",
                "value": "18.70",
                "id": 226552256
            },
            {
                "value_type": "humidity",
                "value": "99.90",
                "id": 226552257
            }
        ],
        "sensor": {
            "id": 25666,
            "sensor_type": {
                "name": "DHT22",
                "id": 9,
                "manufacturer": "various"
            },
            "pin": "7"
        },
        "timestamp": "2020-01-19 19:10:38"
    },

Answer 1

Use pd.Series.explode to unpack the list into individual rows.

# df: the DataFrame obtained by reading the JSON file (one row per record)
exploded = df['sensordatavalues'].explode()
exploded
# 0    {'value_type': 'temperature', 'value': '18.70'...
# 0    {'value_type': 'humidity', 'value': '99.90', '...
# Name: sensordatavalues, dtype: object

The index is preserved (with duplicate labels), so the exploded Series can easily be joined back to the original data.
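
To see the alignment with more than one record, here is a sketch on two hypothetical records with different numbers of readings; `join` matches rows on index labels, so each reading ends up next to its own parent row.

```python
import pandas as pd

# Hypothetical input: two records with two and one readings respectively.
df = pd.DataFrame(
    [
        {"id": 1, "sensordatavalues": [{"value_type": "temperature", "value": "18.70"},
                                       {"value_type": "humidity", "value": "99.90"}]},
        {"id": 2, "sensordatavalues": [{"value_type": "temperature", "value": "21.30"}]},
    ]
)

exploded = df["sensordatavalues"].explode()
# Each parent label is repeated once per reading: [0, 0, 1]
print(exploded.index.tolist())

# join() aligns on the index, duplicating parent columns for each reading.
joined = df.drop(columns="sensordatavalues").join(exploded)
print(joined)
```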

joined = df.drop(columns='sensordatavalues').join(exploded)
joined
#                                             location sampling_rate         id                                             sensor            timestamp                                   sensordatavalues
# 0  {'indoor': 0, 'exact_location': 0, 'latitude':...          None  105462750  {'id': 25666, 'sensor_type': {'name': 'DHT22',...  2020-01-19 19:10:38  {'value_type': 'temperature', 'value': '18.70'...
# 0  {'indoor': 0, 'exact_location': 0, 'latitude':...          None  105462750  {'id': 25666, 'sensor_type': {'name': 'DHT22',...  2020-01-19 19:10:38  {'value_type': 'humidity', 'value': '99.90', '...

Then you can use json_normalize as before.

pd.json_normalize(joined['sensordatavalues'])
#     value_type  value         id
# 0  temperature  18.70  226552256
# 1     humidity  99.90  226552257
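
Note that `json_normalize` returns a fresh `0..n-1` index, so the duplicated labels from `explode` do not carry over to its output. A minimal end-to-end sketch that works around this by resetting the index right after exploding, then flattening each nested column and concatenating the pieces into one frame. It assumes `records` is the parsed list of JSON objects (trimmed here to one record); the `reading.`, `location.` and `sensor.` prefixes are illustrative and only keep the several `id` fields apart.

```python
import pandas as pd

# Hypothetical input: one record shaped like the snapshot (trimmed).
records = [
    {
        "location": {"id": 13487, "latitude": "37.36", "longitude": "26.962", "country": "GL"},
        "sampling_rate": None,
        "id": 105462750,
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.70", "id": 226552256},
            {"value_type": "humidity", "value": "99.90", "id": 226552257},
        ],
        "sensor": {"id": 25666, "pin": "7",
                   "sensor_type": {"id": 9, "name": "DHT22", "manufacturer": "various"}},
        "timestamp": "2020-01-19 19:10:38",
    }
]

df = pd.DataFrame(records)

# 1. One row per reading; reset_index gives the unique 0..n-1 labels
#    that json_normalize will also produce, so the pieces line up.
long = df.explode("sensordatavalues").reset_index(drop=True)

# 2. Flatten each nested column; prefixes keep the three "id" fields distinct.
readings = pd.json_normalize(long["sensordatavalues"].tolist()).add_prefix("reading.")
location = pd.json_normalize(long["location"].tolist()).add_prefix("location.")
sensor = pd.json_normalize(long["sensor"].tolist()).add_prefix("sensor.")

# 3. Put the scalar columns and the flattened pieces side by side.
flat = pd.concat(
    [long.drop(columns=["sensordatavalues", "location", "sensor"]), location, sensor, readings],
    axis=1,
)
print(flat)
```

`DataFrame.explode(column)` needs pandas 0.25 or later, and the top-level `pd.json_normalize` needs pandas 1.0 or later.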

Answer 1: accepted, 2020-04-19 21:33:26
