I have a graphql query that returns a string of JSON-formatted data, with 3 separate JSON objects inside. It looks like this:
{
"data": {
"readingsList": [
{
"value": 137,
"millis": 1651449224000
},
{
"value": 141,
"millis": 1651448924000
}
],
"messagesList": [
{
"value": 138,
"dateMillis": 1651445346000,
"text": "foo",
"type": "bar",
"field1": False
}
]
"userList": [
{
"userTimezone": "America/Los_Angeles"
}
]
}
}
What I'm trying to do is
readingsList
and messagesList
) based on the time ( millis
and dateMillis
) into a dataframeuserList
Desired output:
df.head(3)
datetime value text type field1 ...
2022-05-01 18:53:44 137 NA NA NA
2022-05-01 18:48:44 141 NA NA NA
2022-05-01 17:49:06 138 foo bar False
I can do steps 2 and 3 but I don't know how to do step 1.
If I convert the string using json.loads()
and pd.read_json()
I get the following output:
import json
import pandas as pd
json_str = load_data_gql(...)
j = json.loads(json_str)
df = pd.read_json(j)
df.head()
data
groupsList [{'userTimezone': 'America/Los_Angeles'}]
messagesList [{'value': 138, 'dateMillis': 1651445346000, ...
readingsList [{'value': 137, 'millis': 1651449224000}, {'value'...
I now suspect that the answer has to somehow do with json_normalize() but I'm having difficulty applying what I read in that documentation to navigate my JSON objects properly.
Any advice or help would be greatly appreciated, thank you so much in advance.
Merging the dataframes in this case can be done with pandas.concat([df_1,df_2])
Here's the code I used:
import json
import pandas as pd
json_obj = json.load(open('json_str_file.json', 'r')) # if reading from file
# json_obj = json.loads(json_str) # if reading from a string
# create two separate frames from each nested dictionary object
df_1 = pd.DataFrame.from_dict(json_obj['data']['messagesList'])
df_2 = pd.DataFrame.from_dict(json_obj['data']['readingsList'])
# set the index to the column you want to merge them on
df_1.set_index('dateMillis', inplace=True)
df_2.set_index('millis', inplace=True)
# use pd.concat to stack the dataframes together
df_merged = pd.concat([df_1,df_2])
# fix field1 to be a boolean field
df_merged['field1'] = df_merged['field1'].astype(bool)
# confirm the result matches the target
print(df_merged)
value text type field1
1651445346000 138 foo bar False
1651449224000 137 NaN NaN True
1651448924000 141 NaN NaN True
From here you should be able to do steps 2 and 3 from your post.
The example you gave had some formatting issues that might cause some confusion. messagesList
and readingsList
needed to be separated by a ',' for me. Also json.load()
didn't like the value of False
in my example.
Here is the reformatted JSON
{
"data": {
"readingsList": [
{
"value": 137,
"millis": 1651449224000
},
{
"value": 141,
"millis": 1651448924000
}
],
"messagesList": [
{
"value": 138,
"dateMillis": 1651445346000,
"text": "foo",
"type": "bar",
"field1": 0
}
],
"userList": [
{
"userTimezone": "America/Los_Angeles"
}
]
}
}
json.loads()
returns an object of type dict
with nested elements. pd.read_json()
expects a object of type str
pd.DataFrame.from_dict()
works with dict
objects and allows you to address the nested components like this: j['data']['messagesList']
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.