Pandas DataFrame from nested JSON
Given the following JSON dataset snapshot, what is the best way to turn it into a pandas DataFrame?
Reading the file straight into a DataFrame gives a result that is not exactly usable.
I am currently using json_normalize to turn location and sensor into separate DataFrames, but trying the same approach with sensordatavalues raises an error. Is this because sensordatavalues is an array?
To make things worse, in some sensordatavalues records the id key is missing.
Just to make it a bit more challenging for you pandas gurus: is there a way to do all of the above in the same DataFrame?
{
    "location": {
        "indoor": 0,
        "exact_location": 0,
        "latitude": "37.36",
        "altitude": "17.0",
        "id": 13487,
        "country": "GL",
        "longitude": "26.962"
    },
    "sampling_rate": null,
    "id": 105462750,
    "sensordatavalues": [
        {
            "value_type": "temperature",
            "value": "18.70",
            "id": 226552256
        },
        {
            "value_type": "humidity",
            "value": "99.90",
            "id": 226552257
        }
    ],
    "sensor": {
        "id": 25666,
        "sensor_type": {
            "name": "DHT22",
            "id": 9,
            "manufacturer": "various"
        },
        "pin": "7"
    },
    "timestamp": "2020-01-19 19:10:38"
},
Use pd.Series.explode to unpack each list into individual rows.
exploded = df['sensordatavalues'].explode()
exploded
# 0 {'value_type': 'temperature', 'value': '18.70'...
# 0 {'value_type': 'humidity', 'value': '99.90', '...
# Name: sensordatavalues, dtype: object
The original index is preserved (with duplicated labels), so the exploded Series can easily be joined back to the original data.
joined = df.drop(columns='sensordatavalues').join(exploded)
joined
# location sampling_rate id sensor timestamp sensordatavalues
# 0 {'indoor': 0, 'exact_location': 0, 'latitude':... None 105462750 {'id': 25666, 'sensor_type': {'name': 'DHT22',... 2020-01-19 19:10:38 {'value_type': 'temperature', 'value': '18.70'...
# 0 {'indoor': 0, 'exact_location': 0, 'latitude':... None 105462750 {'id': 25666, 'sensor_type': {'name': 'DHT22',... 2020-01-19 19:10:38 {'value_type': 'humidity', 'value': '99.90', '...
Then you can use json_normalize as before.
json_normalize(joined['sensordatavalues'])
# value_type value id
# 0 temperature 18.70 226552256
# 1 humidity 99.90 226552257
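To get everything into a single DataFrame in one pass, one option (not part of the answer above) is to let pd.json_normalize do the exploding itself through its record_path and meta arguments. The following is a minimal sketch, assuming the file has already been parsed into a list of dicts shaped like the snapshot in the question. The second record, the `sdv_` prefix, and the choice of meta fields are made up for illustration; the second record has one entry without an id to mimic the missing-key case.

```python
import pandas as pd

# First record is trimmed from the question's snapshot; the second record is
# hypothetical and its humidity entry deliberately lacks the "id" key.
records = [
    {
        "location": {"id": 13487, "country": "GL", "latitude": "37.36", "longitude": "26.962"},
        "sampling_rate": None,
        "id": 105462750,
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.70", "id": 226552256},
            {"value_type": "humidity", "value": "99.90", "id": 226552257},
        ],
        "sensor": {"id": 25666, "sensor_type": {"name": "DHT22", "id": 9}, "pin": "7"},
        "timestamp": "2020-01-19 19:10:38",
    },
    {
        "location": {"id": 13487, "country": "GL", "latitude": "37.36", "longitude": "26.962"},
        "sampling_rate": None,
        "id": 105462751,  # hypothetical
        "sensordatavalues": [
            {"value_type": "temperature", "value": "18.60", "id": 226552258},
            {"value_type": "humidity", "value": "99.80"},  # no "id" key
        ],
        "sensor": {"id": 25666, "sensor_type": {"name": "DHT22", "id": 9}, "pin": "7"},
        "timestamp": "2020-01-19 19:13:05",
    },
]

flat = pd.json_normalize(
    records,
    record_path="sensordatavalues",  # one output row per list entry
    meta=[                           # parent fields repeated on every row
        "id",
        "timestamp",
        ["location", "id"],
        ["location", "country"],
        ["sensor", "id"],
        ["sensor", "sensor_type", "name"],
    ],
    record_prefix="sdv_",            # avoids the clash between the two "id" keys
)

# The values arrive as strings; convert them if numeric work is needed.
flat["sdv_value"] = pd.to_numeric(flat["sdv_value"])
print(flat)
```

Entries that lack id simply get NaN in the sdv_id column. A prefix is needed here because both the parent record and the list entries have an id key, and json_normalize raises a ValueError on conflicting column names otherwise.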