How to merge multiple JSON objects into a Python DataFrame based on timestamp?
I have a GraphQL query that returns a string of JSON-formatted data, with 3 separate JSON objects inside. It looks like this:
{
"data": {
"readingsList": [
{
"value": 137,
"millis": 1651449224000
},
{
"value": 141,
"millis": 1651448924000
}
],
"messagesList": [
{
"value": 138,
"dateMillis": 1651445346000,
"text": "foo",
"type": "bar",
"field1": False
}
]
"userList": [
{
"userTimezone": "America/Los_Angeles"
}
]
}
}
What I'm trying to do is:
1. Merge the first two objects (readingsList and messagesList) into a dataframe based on the time (millis and dateMillis)
2. Convert the millisecond timestamps to UTC datetime values
3. Use the user timezone in userList to convert the UTC datetime values to the user's local time
Desired output:
df.head(3)
datetime value text type field1 ...
2022-05-01 18:53:44 137 NA NA NA
2022-05-01 18:48:44 141 NA NA NA
2022-05-01 17:49:06 138 foo bar False
I can do steps 2 and 3, but I don't know how to do step 1.
If I convert the string using json.loads() and pd.read_json(), I get the following output:
import json
import pandas as pd
json_str = load_data_gql(...)
j = json.loads(json_str)
df = pd.read_json(j)
df.head()
data
groupsList [{'userTimezone': 'America/Los_Angeles'}]
messagesList [{'value': 138, 'dateMillis': 1651445346000, ...
readingsList [{'value': 137, 'millis': 1651449224000}, {'value'...
I now suspect that the answer has something to do with json_normalize(), but I'm having difficulty applying what I read in that documentation to navigate my JSON objects properly.
Any advice or help would be greatly appreciated. Thank you so much in advance.
Merging the dataframes in this case can be done with pandas.concat([df_1, df_2]).
Here's the code I used:
import json
import pandas as pd
with open('json_str_file.json', 'r') as f:  # if reading from a file
    json_obj = json.load(f)
# json_obj = json.loads(json_str)  # if reading from a string
# create two separate frames from each nested dictionary object
df_1 = pd.DataFrame.from_dict(json_obj['data']['messagesList'])
df_2 = pd.DataFrame.from_dict(json_obj['data']['readingsList'])
# set the index to the column you want to merge them on
df_1.set_index('dateMillis', inplace=True)
df_2.set_index('millis', inplace=True)
# use pd.concat to stack the dataframes together
df_merged = pd.concat([df_1,df_2])
# cast field1 to a boolean field (note: astype(bool) turns the NaN rows into True)
df_merged['field1'] = df_merged['field1'].astype(bool)
# confirm the result matches the target
print(df_merged)
value text type field1
1651445346000 138 foo bar False
1651449224000 137 NaN NaN True
1651448924000 141 NaN NaN True
From here you should be able to do steps 2 and 3 from your post.
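For completeness, here is a minimal sketch of all three steps together. It assumes the data has already been parsed into a dict shaped like the example in the question; the shared column name "ms" and the variable names are my own choices for illustration, not anything from the original post. Renaming both time columns to one name before concatenating keeps the timestamps in a single column instead of the index.

```python
import pandas as pd

# Data copied from the example JSON in the question (already parsed to a dict).
data = {
    "readingsList": [
        {"value": 137, "millis": 1651449224000},
        {"value": 141, "millis": 1651448924000},
    ],
    "messagesList": [
        {"value": 138, "dateMillis": 1651445346000,
         "text": "foo", "type": "bar", "field1": False},
    ],
    "userList": [{"userTimezone": "America/Los_Angeles"}],
}

# Step 1: give both frames the same time column name ("ms" is an arbitrary
# name chosen for this sketch), then stack them.
readings = pd.DataFrame(data["readingsList"]).rename(columns={"millis": "ms"})
messages = pd.DataFrame(data["messagesList"]).rename(columns={"dateMillis": "ms"})
df = pd.concat([readings, messages], ignore_index=True)

# Step 2: epoch milliseconds -> timezone-aware UTC datetimes.
df["datetime"] = pd.to_datetime(df["ms"], unit="ms", utc=True)

# Step 3: convert UTC to the user's local time zone.
user_tz = data["userList"][0]["userTimezone"]
df["datetime"] = df["datetime"].dt.tz_convert(user_tz)

# Newest first, as in the desired output.
df = df.drop(columns="ms").sort_values("datetime", ascending=False)
print(df)
```

Rows that come from readingsList have no text, type, or field1, so those cells stay NaN rather than being coerced to a boolean.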
The example you gave had some formatting issues that might cause some confusion.
The messagesList and userList entries needed to be separated by a ',' for me. Also, json.load() didn't like the value of False in my example: JSON only accepts the lowercase literal false, so I replaced it with 0.
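A small sketch of why the parser complains, using made-up one-line strings rather than the full payload: the standard json module rejects both a Python-style False and a missing comma between members, and accepts the lowercase JSON literal.

```python
import json

python_style = '{"field1": False}'        # Python literal, not valid JSON
missing_comma = '{"a": [1] "b": [2]}'     # no ',' between the two members
valid_json = '{"field1": false, "a": [1], "b": [2]}'

errors = []
for text in (python_style, missing_comma):
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # Both malformed strings end up here.
        errors.append(str(err))
        print("rejected:", err)

result = json.loads(valid_json)
print(result)  # JSON false is parsed into Python's False
```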
Here is the reformatted JSON:
{
"data": {
"readingsList": [
{
"value": 137,
"millis": 1651449224000
},
{
"value": 141,
"millis": 1651448924000
}
],
"messagesList": [
{
"value": 138,
"dateMillis": 1651445346000,
"text": "foo",
"type": "bar",
"field1": 0
}
],
"userList": [
{
"userTimezone": "America/Los_Angeles"
}
]
}
}
json.loads() returns an object of type dict with nested elements.
pd.read_json() expects an object of type str.
pd.DataFrame.from_dict() works with dict objects and allows you to address the nested components like this: j['data']['messagesList']
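Since the question mentions json_normalize(), here is a hedged sketch showing that it can reach the same nested lists through its record_path argument. The json_str below is a shortened copy of the example payload that I typed in for illustration; it is not the real query result.

```python
import json
import pandas as pd

# Shortened copy of the example payload (valid JSON, lowercase false).
json_str = (
    '{"data": {'
    '"readingsList": [{"value": 137, "millis": 1651449224000},'
    ' {"value": 141, "millis": 1651448924000}],'
    '"messagesList": [{"value": 138, "dateMillis": 1651445346000,'
    ' "text": "foo", "type": "bar", "field1": false}]'
    '}}'
)

j = json.loads(json_str)          # str -> nested dict
print(type(j))

# Option A: index into the dict and build the frame directly.
messages = pd.DataFrame.from_dict(j["data"]["messagesList"])

# Option B: let json_normalize walk the path to the list of records.
readings = pd.json_normalize(j, record_path=["data", "readingsList"])

print(messages)
print(readings)
```

Either option yields one flat frame per list, which can then be passed to pd.concat().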