How do I merge multiple JSON objects into a Python DataFrame based on timestamps?

Question

I have a GraphQL query that returns a JSON-formatted string containing 3 separate JSON objects. It looks like this:

{
  "data": {
    "readingsList": [
      {
        "value": 137,
        "millis": 1651449224000
      },
      {
        "value": 141,
        "millis": 1651448924000
      }
    ],
    "messagesList": [
      {
        "value": 138,
        "dateMillis": 1651445346000,
        "text": "foo",
        "type": "bar",
        "field1": False
      }
    ]
    "userList": [
      {
        "userTimezone": "America/Los_Angeles"
      }
    ]
  }
}

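What I want to do is:

1. Merge the first two objects (readingsList and messagesList) into a single DataFrame based on time (millis and dateMillis)
2. Convert that time to a UTC datetime value (e.g. 1651449224000 becomes 2022-05-01 18:53:44)
3. Convert the UTC datetime value to the user's local time, based on the user time zone in userList

Desired output: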

df.head(3)

    datetime             value   text   type   field1   ...
    2022-05-01 18:53:44  137     NA     NA     NA
    2022-05-01 18:48:44  141     NA     NA     NA
    2022-05-01 17:49:06  138     foo    bar    False

I can do steps 2 and 3, but I don't know how to do step 1.

If I convert the string with json.loads() and pd.read_json(), I get the following output:

import json
import pandas as pd

json_str = load_data_gql(...)
j = json.loads(json_str)
df = pd.read_json(j)

df.head()

                  data
    groupsList    [{'userTimezone': 'America/Los_Angeles'}]
    messagesList  [{'value': 138, 'dateMillis': 1651445346000, ...
    readingsList  [{'value': 137, 'millis': 1651449224000}, {'value'...

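I now suspect the answer has something to do with json_normalize(), but I'm struggling to apply what I've read in its documentation to navigate my JSON object correctly.

Any suggestions or help would be greatly appreciated. Thanks in advance.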

Answer 1

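Suggested solution:

In this case the DataFrames can be combined with pandas.concat([df_1, df_2]), which stacks their rows and fills columns that exist in only one frame with missing values.

Here is the code I used: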

import json
import pandas as pd

# if reading from a file
with open('json_str_file.json', 'r') as f:
    json_obj = json.load(f)
# json_obj = json.loads(json_str)  # if reading from a string

# create a separate frame from each nested list of records
df_1 = pd.DataFrame.from_dict(json_obj['data']['messagesList'])
df_2 = pd.DataFrame.from_dict(json_obj['data']['readingsList'])

# set the index to the timestamp column you want to align them on
df_1.set_index('dateMillis', inplace=True)
df_2.set_index('millis', inplace=True)

# use pd.concat to stack the dataframes together
df_merged = pd.concat([df_1, df_2])

# make field1 a nullable boolean; a plain astype(bool) would turn the missing values into True
df_merged['field1'] = df_merged['field1'].astype('boolean')

# inspect the result
print(df_merged)

Output

               value text type  field1
1651445346000    138  foo  bar   False
1651449224000    137  NaN  NaN    <NA>
1651448924000    141  NaN  NaN    <NA>

From here you should be able to carry out steps 2 and 3 from your post.
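For completeness, here is a minimal sketch of how steps 2 and 3 might look, assuming the merged frame is indexed by epoch milliseconds as above. The small df_merged stand-in and the user_tz variable are illustrative only; in the real data the time zone would come from json_obj['data']['userList'][0]['userTimezone']. Note that 1651449224000 ms is 2022-05-01 23:53:44 in UTC, so the 18:53:44 shown in the question appears to already include a five-hour offset.

```python
import pandas as pd

# Stand-in for df_merged: a frame indexed by epoch milliseconds (illustrative data).
df_merged = pd.DataFrame(
    {"value": [138, 137, 141]},
    index=[1651445346000, 1651449224000, 1651448924000],
)
# Assumed to have been read from the userList entry of the JSON.
user_tz = "America/Los_Angeles"

# Step 2: interpret the integer index as milliseconds since the epoch, in UTC.
utc_index = pd.to_datetime(df_merged.index, unit="ms", utc=True)

# Step 3: convert the UTC timestamps to the user's local time zone.
df_local = df_merged.copy()
df_local.index = utc_index.tz_convert(user_tz)
df_local.index.name = "datetime"

# Sort newest first, as in the desired output.
df_local = df_local.sort_index(ascending=False)

print(df_local)
```

If naive timestamps without the offset suffix are preferred, calling tz_localize(None) on the converted index drops the time zone information while keeping the local wall-clock time.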

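Problems with the JSON

The example you provided has some formatting issues that may be causing some of the confusion. For me, messagesList and userList needed to be separated by a ",". In my example, json.load() also did not accept the value False, because JSON only recognizes the lowercase literal false.

Here is the reformatted JSON: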

{
  "data": {
    "readingsList": [
      {
        "value": 137,
        "millis": 1651449224000
      },
      {
        "value": 141,
        "millis": 1651448924000
      }
    ],
    "messagesList": [
      {
        "value": 138,
        "dateMillis": 1651445346000,
        "text": "foo",
        "type": "bar",
        "field1": 0
      }
    ],
    "userList": [
      {
        "userTimezone": "America/Los_Angeles"
      }
    ]
  }
}
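A quick way to check whether a string is valid JSON before building any DataFrames is to catch json.JSONDecodeError. The sketch below uses two shortened fragments (not the full payload) to illustrate both problems; the try_parse helper is a hypothetical name introduced only for this example, and it uses the JSON literal false where the reformatted data above uses 0.

```python
import json

# Shortened version of the original snippet: Python-style False and
# no comma between the "messagesList" and "userList" members.
broken = '{"messagesList": [{"field1": False}] "userList": []}'

# The same fragment as valid JSON: lowercase false and a separating comma.
fixed = '{"messagesList": [{"field1": false}], "userList": []}'


def try_parse(text):
    """Return the parsed object, or an error message if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})"


print(try_parse(broken))
print(try_parse(fixed))
```

The reported line and column point at the first offending character, which usually makes a missing comma or a stray Python literal easy to find.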

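Potential sources of confusion:

The JSON string may be malformed.
json.loads() returns an object of type dict with nested elements.
pd.read_json() expects a JSON string (or a path or file-like object), not a dict.
pd.DataFrame.from_dict() works with dict objects and lets you address nested components like this: j['data']['messagesList']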
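Since the question mentions json_normalize(), here is a small sketch of the same idea using it on the already-parsed dict. It is a variation on the approach above, not a replacement: instead of using the timestamps as the index, both time columns are renamed to a common column. The column name timestamp_ms and the shortened json_str are assumptions made for this example.

```python
import json
import pandas as pd

# Shortened stand-in for the string returned by the GraphQL query.
json_str = (
    '{"data": {'
    '"readingsList": [{"value": 137, "millis": 1651449224000}], '
    '"messagesList": [{"value": 138, "dateMillis": 1651445346000, "text": "foo"}]'
    '}}'
)

# json.loads() turns the string into a nested dict; pass its nested lists,
# not the dict itself, to the DataFrame constructors.
j = json.loads(json_str)

# Build one frame per list and give the time columns a shared (hypothetical) name.
readings = pd.json_normalize(j["data"]["readingsList"]).rename(
    columns={"millis": "timestamp_ms"}
)
messages = pd.json_normalize(j["data"]["messagesList"]).rename(
    columns={"dateMillis": "timestamp_ms"}
)

# Stack the rows; columns missing from one frame are filled with NaN.
combined = pd.concat([readings, messages], ignore_index=True)

print(combined)
```

Keeping the timestamp as an ordinary column can be convenient when the next step is a pd.to_datetime() conversion on that column.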

Solution 1
2  Accepted  2022-06-01 20:12:26