Extract data from dictionary with nested dictionaries that contain lists that contain dictionaries
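I have a response from an API containing a pool of data from a heating system, structured as a dictionary with nested dictionaries that contain lists that contain dictionaries.
For example: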
sample = {"zoneType": "HEATING",
"interval": {"from": "2020-10-23T22:45:00.000Z", "to": "2020-10-24T23:15:00.000Z"},
"hoursInDay": 24,
"measuredData": {
"measuringDeviceConnected": {
"timeSeriesType": "dataIntervals",
"valueType": "boolean",
"dataIntervals": [{
"from": "2020-10-23T22:45:00.000Z", "to": "2020-10-24T23:15:00.000Z", "value": True}]
},
"insideTemperature": {
"timeSeriesType": "dataPoints",
"valueType": "temperature",
"min": {
"celsius": 19.34,
"fahrenheit": 66.81},
"max": {
"celsius": 20.6,
"fahrenheit": 69.08},
"dataPoints": [
{"timestamp": "2020-10-23T22:45:00.000Z", "value": {"celsius": 20.6, "fahrenheit": 69.08}},
{"timestamp": "2020-10-23T23:00:00.000Z", "value": {"celsius": 20.55, "fahrenheit": 68.99}},
{"timestamp": "2020-10-23T23:15:00.000Z", "value": {"celsius": 20.53, "fahrenheit": 68.95}},
{"timestamp": "2020-10-23T23:30:00.000Z", "value": {"celsius": 20.51, "fahrenheit": 68.92}},
{"timestamp": "2020-10-23T23:45:00.000Z", "value": {"celsius": 20.48, "fahrenheit": 68.86}},
{"timestamp": "2020-10-24T00:00:00.000Z", "value": {"celsius": 20.48, "fahrenheit": 68.86}},
{"timestamp": "2020-10-24T00:15:00.000Z", "value": {"celsius": 20.44, "fahrenheit": 68.79}}]
},
"humidity": {
"timeSeriesType": "dataPoints",
"valueType": "percentage",
"percentageUnit": "UNIT_INTERVAL",
"min": 0.615,
"max": 0.627,
"dataPoints": [
{"timestamp": "2020-10-23T22:45:00.000Z", "value": 0.615},
{"timestamp": "2020-10-23T23:00:00.000Z", "value": 0.615},
{"timestamp": "2020-10-23T23:15:00.000Z", "value": 0.619},
{"timestamp": "2020-10-23T23:30:00.000Z", "value": 0.620},
{"timestamp": "2020-10-23T23:45:00.000Z", "value": 0.621},
{"timestamp": "2020-10-24T00:00:00.000Z", "value": 0.623},
{"timestamp": "2020-10-24T00:15:00.000Z", "value": 0.627}]
}
}}
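From the above, I want to extract the ['insideTemperature']['dataPoints'] timestamps and celsius values (the real data spans many more time periods) and put them as columns in a new pd.DataFrame, along with the data from the "humidity" key. Eventually I want to merge this with data from a separate API call that has a similar structure but may not have consistent timestamp values.
Many of the top-level dictionaries contain summary data (e.g. min and max values) and can be ignored. Likewise, converting from celsius to fahrenheit etc. is easy if needed, so I don't want to extract that data.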
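Dropping fahrenheit loses nothing: in the sample, every `fahrenheit` value equals `celsius * 9/5 + 32` rounded to two decimals. A small sketch of recomputing it, using celsius/fahrenheit pairs copied from the sample (`inside_temp_c`, `fahrenheit_from_api` and `inside_temp_f` are illustrative column names, not from the API):

```python
import pandas as pd

# Celsius/fahrenheit pairs copied from insideTemperature.dataPoints in the sample
df = pd.DataFrame({
    "inside_temp_c": [20.6, 20.55, 20.53, 20.44],
    "fahrenheit_from_api": [69.08, 68.99, 68.95, 68.79],
})

# Recompute fahrenheit from celsius, rounded to the API's two decimals
df["inside_temp_f"] = (df["inside_temp_c"] * 9 / 5 + 32).round(2)

# Largest difference between the recomputed and the API-supplied values
print((df["inside_temp_f"] - df["fahrenheit_from_api"]).abs().max())
```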
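What is the best way to cleanly create a DataFrame listing the timestamp, celsius temperature and humidity from this query, which I can then join with the output of other queries?
So far I have been using the following: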
import pandas as pd
df = pd.DataFrame(sample['measuredData']['insideTemperature']['dataPoints'])
## remove column that contains dictionary data, leaving time data
df.drop(labels='value', axis=1, inplace=True)
## get temp data into new column
input_data_point = sample['measuredData']['insideTemperature']['dataPoints']
temps = []
for i in input_data_point:
    temps.append(i['value']['celsius'])
df['inside_temp_c'] = pd.DataFrame(temps)
## repeat for humidity
input_data_point = sample['measuredData']['humidity']['dataPoints']
temps = []
for i in input_data_point:
    temps.append(i['value'])
df['humidity_pct'] = pd.DataFrame(temps)
As I'm new to Python, I'd like to know whether there is a quicker way to get the data from the raw download directly into a clean pandas DataFrame. Any advice is appreciated.
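Staying close to the loop-based approach above, a more compact sketch builds each column with a list comprehension instead of dropping the dict column and appending in loops. The trimmed `sample` below is a hypothetical excerpt of the response. Like the original code, this pairs temperature and humidity by row position, so it assumes both lists share the same timestamps in the same order:

```python
import pandas as pd

# Hypothetical trimmed excerpt of the API response shown above
sample = {
    "measuredData": {
        "insideTemperature": {"dataPoints": [
            {"timestamp": "2020-10-23T22:45:00.000Z", "value": {"celsius": 20.6, "fahrenheit": 69.08}},
            {"timestamp": "2020-10-23T23:00:00.000Z", "value": {"celsius": 20.55, "fahrenheit": 68.99}},
        ]},
        "humidity": {"dataPoints": [
            {"timestamp": "2020-10-23T22:45:00.000Z", "value": 0.615},
            {"timestamp": "2020-10-23T23:00:00.000Z", "value": 0.615},
        ]},
    }
}

temp_points = sample["measuredData"]["insideTemperature"]["dataPoints"]
hum_points = sample["measuredData"]["humidity"]["dataPoints"]

# Columns are aligned by position: assumes both series list identical timestamps
df = pd.DataFrame({
    "timestamp": [p["timestamp"] for p in temp_points],
    "inside_temp_c": [p["value"]["celsius"] for p in temp_points],
    "humidity_pct": [p["value"] for p in hum_points],
})
print(df)
```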
You can use json_normalize to get the data:
df1 = pd.json_normalize(sample,
record_path=['measuredData', 'insideTemperature', 'dataPoints'],
meta=['zoneType'])
print(df1)
df2 = pd.json_normalize(sample,
record_path=['measuredData', 'humidity', 'dataPoints'],
meta=['zoneType'])
print(df2)
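To get the single frame the question asks for, one option is to keep only the needed columns from each `json_normalize` result and merge them on `timestamp`. Unlike positional assignment, this aligns rows by their actual timestamps. This is a sketch on a hypothetical trimmed `sample`; the column names `inside_temp_c` and `humidity_pct` are taken from the question's code:

```python
import pandas as pd

# Hypothetical trimmed excerpt of the response shown in the question
sample = {
    "measuredData": {
        "insideTemperature": {"dataPoints": [
            {"timestamp": "2020-10-23T22:45:00.000Z", "value": {"celsius": 20.6, "fahrenheit": 69.08}},
            {"timestamp": "2020-10-23T23:00:00.000Z", "value": {"celsius": 20.55, "fahrenheit": 68.99}},
        ]},
        "humidity": {"dataPoints": [
            {"timestamp": "2020-10-23T22:45:00.000Z", "value": 0.615},
            {"timestamp": "2020-10-23T23:00:00.000Z", "value": 0.615},
        ]},
    }
}

temp = pd.json_normalize(sample, record_path=["measuredData", "insideTemperature", "dataPoints"])
hum = pd.json_normalize(sample, record_path=["measuredData", "humidity", "dataPoints"])

# Keep timestamp and celsius only; nested "value" dicts become "value.celsius" etc.
temp = temp[["timestamp", "value.celsius"]].rename(columns={"value.celsius": "inside_temp_c"})
hum = hum.rename(columns={"value": "humidity_pct"})

# Outer merge keeps timestamps that appear in only one of the two series
df = temp.merge(hum, on="timestamp", how="outer")
df["timestamp"] = pd.to_datetime(df["timestamp"])  # "Z" suffix gives UTC-aware times
print(df)
```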
df1:
timestamp value.celsius value.fahrenheit zoneType
0 2020-10-23T22:45:00.000Z 20.60 69.08 HEATING
1 2020-10-23T23:00:00.000Z 20.55 68.99 HEATING
2 2020-10-23T23:15:00.000Z 20.53 68.95 HEATING
3 2020-10-23T23:30:00.000Z 20.51 68.92 HEATING
4 2020-10-23T23:45:00.000Z 20.48 68.86 HEATING
5 2020-10-24T00:00:00.000Z 20.48 68.86 HEATING
6 2020-10-24T00:15:00.000Z 20.44 68.79 HEATING
df2:
timestamp value zoneType
0 2020-10-23T22:45:00.000Z 0.615 HEATING
1 2020-10-23T23:00:00.000Z 0.615 HEATING
2 2020-10-23T23:15:00.000Z 0.619 HEATING
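For the later step of joining with a separate API call whose timestamps do not line up exactly, `pd.merge_asof` is one option: it matches each row to the nearest key in the other frame, optionally within a tolerance. This sketch assumes a hypothetical second frame `other` with an `outside_temp_c` column; its timestamps and values are made up for illustration, and the 5-minute tolerance is an arbitrary choice:

```python
import pandas as pd

# Frame built from this query (values copied from the sample)
heating = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-23T22:45:00Z", "2020-10-23T23:00:00Z",
                                 "2020-10-23T23:15:00Z"]),
    "inside_temp_c": [20.6, 20.55, 20.53],
})

# Hypothetical second API call with its own, offset timestamps
other = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-23T22:47:00Z", "2020-10-23T23:13:00Z"]),
    "outside_temp_c": [9.1, 8.7],
})

# merge_asof requires both frames sorted on the key. Each heating row takes the
# nearest row of `other` within 5 minutes; rows with no match get NaN.
combined = pd.merge_asof(
    heating.sort_values("timestamp"),
    other.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("5min"),
)
print(combined)
```

If you would rather keep every timestamp from both calls, `heating.merge(other, on="timestamp", how="outer")` is the simpler alternative, leaving gaps as NaN.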