
How to merge multiple JSON objects into a Python DataFrame based on timestamp?

I have a GraphQL query that returns a string of JSON-formatted data, with 3 separate JSON objects inside. It looks like this:

{
  "data": {
    "readingsList": [
      {
        "value": 137,
        "millis": 1651449224000
      },
      {
        "value": 141,
        "millis": 1651448924000
      }
    ],
    "messagesList": [
      {
        "value": 138,
        "dateMillis": 1651445346000,
        "text": "foo",
        "type": "bar",
        "field1": False
      }
    ]
    "userList": [
      {
        "userTimezone": "America/Los_Angeles"
      }
    ]
  }
}

What I'm trying to do is:

  1. Merge the first two objects (readingsList and messagesList) into a dataframe based on the time (millis and dateMillis)
  2. Convert that time into a UTC datetime value (e.g. 1651449224000 becomes 2022-05-01 18:53:44)
  3. Convert the UTC datetime value into local time for the user, based on the user's timezone from userList

Desired output:

df.head(3)

    datetime             value   text   type   field1   ...
    2022-05-01 18:53:44  137     NA     NA     NA
    2022-05-01 18:48:44  141     NA     NA     NA
    2022-05-01 17:49:06  138     foo    bar    False

I can do steps 2 and 3, but I don't know how to do step 1.

If I convert the string using json.loads() and pd.read_json(), I get the following output:

import json
import pandas as pd

json_str = load_data_gql(...)
j = json.loads(json_str)
df = pd.read_json(j)

df.head()

                  data
    groupsList    [{'userTimezone': 'America/Los_Angeles'}]
    messagesList  [{'value': 138, 'dateMillis': 1651445346000, ...
    readingsList  [{'value': 137, 'millis': 1651449224000}, {'value'...

I now suspect that the answer has something to do with json_normalize(), but I'm having difficulty applying what I read in its documentation to navigate my JSON objects properly.

Any advice or help would be greatly appreciated. Thank you so much in advance.

Proposed Solution:

Merging the dataframes in this case can be done with pandas.concat([df_1, df_2]).

Here's the code I used:

import json
import pandas as pd

# if reading from a file (the with-block closes the file afterwards)
with open('json_str_file.json', 'r') as f:
    json_obj = json.load(f)
# json_obj = json.loads(json_str) # if reading from a string

# create two separate frames from each nested dictionary object
df_1 = pd.DataFrame.from_dict(json_obj['data']['messagesList'])
df_2 = pd.DataFrame.from_dict(json_obj['data']['readingsList'])

# set the index to the column you want to merge them on
df_1.set_index('dateMillis', inplace=True)
df_2.set_index('millis', inplace=True)

# use pd.concat to stack the dataframes together
df_merged = pd.concat([df_1,df_2])

# make field1 a nullable boolean so rows without a value stay missing
# (plain astype(bool) would turn the NaN rows into True)
df_merged['field1'] = df_merged['field1'].astype('boolean')

# confirm the result matches the target
print(df_merged)

Output

               value text type  field1
1651445346000    138  foo  bar   False
1651449224000    137  NaN  NaN    <NA>
1651448924000    141  NaN  NaN    <NA>

From here you should be able to do steps 2 and 3 from your post.
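For completeness, here is a minimal sketch of steps 2 and 3, assuming the merged frame is indexed by epoch milliseconds as above. The small df_merged stand-in, the user_tz variable and the df_local name are only there to keep the example self-contained; in the real data the timezone would be read from json_obj['data']['userList'][0]['userTimezone'].

```python
import pandas as pd

# Stand-in for df_merged from the code above (index holds epoch milliseconds)
df_merged = pd.DataFrame(
    {"value": [138, 137, 141]},
    index=[1651445346000, 1651449224000, 1651448924000],
)

# Assumed to come from json_obj['data']['userList'][0]['userTimezone']
user_tz = "America/Los_Angeles"

# Step 2: interpret the integer index as milliseconds since the epoch, in UTC
utc_index = pd.to_datetime(df_merged.index, unit="ms", utc=True)

# Step 3: convert the timezone-aware index to the user's local time
df_local = df_merged.set_index(utc_index.tz_convert(user_tz))
df_local.index.name = "datetime"

# Sort newest first, as in the desired output
df_local = df_local.sort_index(ascending=False)
print(df_local)
```

Note that 1651449224000 is 2022-05-01 23:53:44 in UTC and 16:53:44 in America/Los_Angeles, so the times shown in the question appear to have been rendered with a different offset.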

Issues with the JSON

The example you gave had some formatting issues that might cause some confusion. For me, messagesList and userList needed to be separated by a ','. Also, json.load() rejected the value False in the example, because JSON booleans are written in lowercase (true/false); in the reformatted version it is replaced with 0.

Here is the reformatted JSON:

{
  "data": {
    "readingsList": [
      {
        "value": 137,
        "millis": 1651449224000
      },
      {
        "value": 141,
        "millis": 1651448924000
      }
    ],
    "messagesList": [
      {
        "value": 138,
        "dateMillis": 1651445346000,
        "text": "foo",
        "type": "bar",
        "field1": 0
      }
    ],
    "userList": [
      {
        "userTimezone": "America/Los_Angeles"
      }
    ]
  }
}
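The two problems described above can be reproduced in isolation. This is only a small illustration with made-up one-line strings (bad, good and the flag variables are names chosen for the example), not the original payload. It also shows that keeping a lowercase false would be an alternative to replacing the value with 0.

```python
import json

# Python-style boolean literal, as in the question: not valid JSON
bad = '{"field1": False}'
try:
    json.loads(bad)
    bad_parsed = True
except json.JSONDecodeError as err:
    bad_parsed = False
    print("rejected:", err)

# JSON booleans are lowercase and are parsed into Python's bool
good = json.loads('{"field1": false}')
print(good)

# A missing comma between two keys is rejected as well
try:
    json.loads('{"a": [1] "b": [2]}')
    missing_comma_parsed = True
except json.JSONDecodeError as err:
    missing_comma_parsed = False
    print("rejected:", err)
```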

Potential Confusion:

  • The JSON string could be poorly formatted
  • json.loads() returns an object of type dict with nested elements
  • pd.read_json() expects an object of type str
  • pd.DataFrame.from_dict() works with dict objects and allows you to address the nested components like this: j['data']['messagesList']
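Since the question mentions json_normalize(), here is a rough sketch of that route as an alternative to from_dict(). It assumes the cleaned-up JSON (with a lowercase false) is available as a string, and it renames dateMillis to millis so both frames share one timestamp column; the names readings, messages and combined are just for illustration.

```python
import json
import pandas as pd

json_str = """
{"data": {
  "readingsList": [{"value": 137, "millis": 1651449224000},
                   {"value": 141, "millis": 1651448924000}],
  "messagesList": [{"value": 138, "dateMillis": 1651445346000,
                    "text": "foo", "type": "bar", "field1": false}],
  "userList": [{"userTimezone": "America/Los_Angeles"}]
}}
"""

# json.loads() gives a nested dict, not a string
j = json.loads(json_str)
print(type(j))

# Flatten each list of records into its own frame
readings = pd.json_normalize(j["data"], record_path="readingsList")
messages = pd.json_normalize(j["data"], record_path="messagesList")

# Give both frames the same timestamp column name, then stack them
messages = messages.rename(columns={"dateMillis": "millis"})
combined = pd.concat([readings, messages], ignore_index=True)
print(combined)
```

Keeping the timestamp as a regular column instead of the index is a matter of taste; either form can be converted to datetimes afterwards.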
