Extracting data from a dictionary with nested dictionaries that contain lists of dictionaries

Question

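I have a response from an API that contains a pool of data from a heating system. It is structured as a dictionary with nested dictionaries, some of which contain lists of dictionaries.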

For example:

    sample = {"zoneType": "HEATING",
              "interval": {"from": "2020-10-23T22:45:00.000Z", "to": "2020-10-24T23:15:00.000Z"},
              "hoursInDay": 24,
              "measuredData": {
                  "measuringDeviceConnected": {
                      "timeSeriesType": "dataIntervals",
                      "valueType": "boolean",
                      "dataIntervals": [{
                          "from": "2020-10-23T22:45:00.000Z", "to": "2020-10-24T23:15:00.000Z", "value": True}]
                          },
                  "insideTemperature": {
                      "timeSeriesType": "dataPoints",
                      "valueType": "temperature",
                      "min": {
                          "celsius": 19.34,
                          "fahrenheit": 66.81},
                      "max": {
                          "celsius": 20.6,
                          "fahrenheit": 69.08},
                      "dataPoints": [
                          {"timestamp": "2020-10-23T22:45:00.000Z", "value": {"celsius": 20.6, "fahrenheit": 69.08}},
                          {"timestamp": "2020-10-23T23:00:00.000Z", "value": {"celsius": 20.55, "fahrenheit": 68.99}},
                          {"timestamp": "2020-10-23T23:15:00.000Z", "value": {"celsius": 20.53, "fahrenheit": 68.95}},
                          {"timestamp": "2020-10-23T23:30:00.000Z", "value": {"celsius": 20.51, "fahrenheit": 68.92}},
                          {"timestamp": "2020-10-23T23:45:00.000Z", "value": {"celsius": 20.48, "fahrenheit": 68.86}},
                          {"timestamp": "2020-10-24T00:00:00.000Z", "value": {"celsius": 20.48, "fahrenheit": 68.86}},
                          {"timestamp": "2020-10-24T00:15:00.000Z", "value": {"celsius": 20.44, "fahrenheit": 68.79}}]
                  },
                  "humidity": {
                      "timeSeriesType": "dataPoints",
                      "valueType": "percentage",
                      "percentageUnit": "UNIT_INTERVAL",
                      "min": 0.615,
                      "max": 0.627,
                      "dataPoints": [
                          {"timestamp": "2020-10-23T22:45:00.000Z", "value": 0.615},
                          {"timestamp": "2020-10-23T23:00:00.000Z", "value": 0.615},
                          {"timestamp": "2020-10-23T23:15:00.000Z", "value": 0.619},
                          {"timestamp": "2020-10-23T23:30:00.000Z", "value": 0.620},
                          {"timestamp": "2020-10-23T23:45:00.000Z", "value": 0.621},
                          {"timestamp": "2020-10-24T00:00:00.000Z", "value": 0.623},
                          {"timestamp": "2020-10-24T00:15:00.000Z", "value": 0.627}]
                  }
              }}

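From the above, I want to extract the timestamp and Celsius values from ['insideTemperature']['dataPoints'] (the real data spans many more time periods) and put them as columns in a new pd.DataFrame, together with the data from the "humidity" key. In due course I want to merge this with data from a separate API call that has a similar structure but may not have consistent timestamp values.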
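For the later merge with a second API call whose timestamps may not line up, one possible approach (not something the question or the answer uses) is pandas' `merge_asof`, which pairs each row with the nearest key in the other frame. A minimal sketch, assuming both sides have already been reduced to a `timestamp` column plus values; the second frame `other`, its `other_value` column and the 8-minute tolerance are hypothetical:

```python
import pandas as pd

# Readings from this query (timestamps and values taken from the sample).
inside = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-23T22:45:00.000Z",
                                 "2020-10-23T23:00:00.000Z",
                                 "2020-10-23T23:15:00.000Z"]),
    "inside_temp_c": [20.6, 20.55, 20.53],
})

# Hypothetical second API call whose timestamps do not match exactly.
other = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-23T22:50:00.000Z",
                                 "2020-10-23T23:14:00.000Z"]),
    "other_value": [1.0, 2.0],
})

# merge_asof requires both frames to be sorted on the key. For each row of
# `inside` it takes the nearest row of `other`; matches further apart than
# the tolerance are left as NaN.
merged = pd.merge_asof(
    inside.sort_values("timestamp"),
    other.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("8min"),
)
print(merged)
```

Here the 23:00 reading gets NaN because its closest counterpart (22:50) is 10 minutes away, outside the assumed tolerance.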

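Many of the top-level dictionaries contain summary data (e.g. min and max), so they can be ignored. Likewise, converting from Celsius to Fahrenheit is easy if needed, so I don't want to extract the Fahrenheit values.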

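What is the best way to cleanly create a DataFrame that lists the timestamp, Celsius temperature and humidity from this query, which I can then join with the output of other queries?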

So far I have been using the following:

    import pandas as pd
    df = pd.DataFrame(sample['measuredData']['insideTemperature']['dataPoints'])

    ## remove column that contains dictionary data, leaving time data
    df.drop(labels='value', axis=1, inplace=True)

    ## get temp data into new column
    input_data_point = sample['measuredData']['insideTemperature']['dataPoints']

    temps = []

    for i in input_data_point:
        temps.append(i['value']['celsius'])

    df['inside_temp_c'] = pd.DataFrame(temps)

    ## repeat for humidity
    input_data_point = sample['measuredData']['humidity']['dataPoints']

    temps = []

    for i in input_data_point:
        temps.append(i['value'])

    df['humidity_pct'] = pd.DataFrame(temps)

Being new to Python, I'd like to know whether there is a faster way to get from the raw downloaded data straight to a clean pandas DataFrame. Any advice is appreciated.
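One way to avoid the `drop` and the two loops is to build one Series per measurement, keyed by timestamp, and let `pd.concat` align them. This is a sketch, not the accepted answer's method; it uses a trimmed copy of `sample` (two readings per series), and the column names `inside_temp_c` and `humidity_pct` simply follow the question's code:

```python
import pandas as pd

# Trimmed copy of the API response from the question.
sample = {
    "measuredData": {
        "insideTemperature": {"dataPoints": [
            {"timestamp": "2020-10-23T22:45:00.000Z", "value": {"celsius": 20.6, "fahrenheit": 69.08}},
            {"timestamp": "2020-10-23T23:00:00.000Z", "value": {"celsius": 20.55, "fahrenheit": 68.99}},
        ]},
        "humidity": {"dataPoints": [
            {"timestamp": "2020-10-23T22:45:00.000Z", "value": 0.615},
            {"timestamp": "2020-10-23T23:00:00.000Z", "value": 0.615},
        ]},
    }
}

measured = sample["measuredData"]

# One Series per measurement, indexed by timestamp; only Celsius is kept.
temp = pd.Series(
    {p["timestamp"]: p["value"]["celsius"] for p in measured["insideTemperature"]["dataPoints"]},
    name="inside_temp_c",
)
humidity = pd.Series(
    {p["timestamp"]: p["value"] for p in measured["humidity"]["dataPoints"]},
    name="humidity_pct",
)

# concat along axis=1 lines the two Series up on their timestamp index,
# so the rows match by time rather than by list position.
df = pd.concat([temp, humidity], axis=1)
df.index = pd.to_datetime(df.index)
df.index.name = "timestamp"
print(df)
```

Aligning on the index rather than on list position also means a missing reading in one series shows up as NaN instead of silently shifting the other column.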

Answer 1

You can use json_normalize to get the data:

    df1 = pd.json_normalize(sample,
                           record_path=['measuredData', 'insideTemperature', 'dataPoints'],
                           meta=['zoneType'])
    print(df1)
    df2 = pd.json_normalize(sample,
                           record_path=['measuredData', 'humidity', 'dataPoints'],
                           meta=['zoneType'])
    print(df2)

df1:

                 timestamp  value.celsius  value.fahrenheit zoneType
0  2020-10-23T22:45:00.000Z          20.60             69.08  HEATING
1  2020-10-23T23:00:00.000Z          20.55             68.99  HEATING
2  2020-10-23T23:15:00.000Z          20.53             68.95  HEATING
3  2020-10-23T23:30:00.000Z          20.51             68.92  HEATING
4  2020-10-23T23:45:00.000Z          20.48             68.86  HEATING
5  2020-10-24T00:00:00.000Z          20.48             68.86  HEATING
6  2020-10-24T00:15:00.000Z          20.44             68.79  HEATING

df2:

                  timestamp  value zoneType
0  2020-10-23T22:45:00.000Z  0.615  HEATING
1  2020-10-23T23:00:00.000Z  0.615  HEATING
2  2020-10-23T23:15:00.000Z  0.619  HEATING
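The two frames returned by json_normalize can then be reduced to the single table the question asks for (timestamp, Celsius temperature, humidity) by dropping the Fahrenheit column, renaming, and merging on `timestamp`. A minimal sketch on a trimmed copy of `sample`; the outer join and the column names are choices made here, not part of the original answer:

```python
import pandas as pd

# Trimmed copy of the question's `sample` (two readings per series).
sample = {
    "zoneType": "HEATING",
    "measuredData": {
        "insideTemperature": {"dataPoints": [
            {"timestamp": "2020-10-23T22:45:00.000Z", "value": {"celsius": 20.6, "fahrenheit": 69.08}},
            {"timestamp": "2020-10-23T23:00:00.000Z", "value": {"celsius": 20.55, "fahrenheit": 68.99}},
        ]},
        "humidity": {"dataPoints": [
            {"timestamp": "2020-10-23T22:45:00.000Z", "value": 0.615},
            {"timestamp": "2020-10-23T23:00:00.000Z", "value": 0.615},
        ]},
    },
}

temp = pd.json_normalize(sample, record_path=["measuredData", "insideTemperature", "dataPoints"])
hum = pd.json_normalize(sample, record_path=["measuredData", "humidity", "dataPoints"])

# Keep only the Celsius column and give each value column a distinct name.
temp = temp[["timestamp", "value.celsius"]].rename(columns={"value.celsius": "inside_temp_c"})
hum = hum.rename(columns={"value": "humidity_pct"})

# Outer join on timestamp: a reading present in only one series is kept,
# with NaN in the other column.
df = temp.merge(hum, on="timestamp", how="outer")
df["timestamp"] = pd.to_datetime(df["timestamp"])
print(df)
```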

Answer 1 was accepted (score 0, 2020-10-27 20:36:43).
