All kinds of nested dictionaries and Data Structures:)
I have a sample dictionary:
stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3]
        }
    }
}
and I want to create a DataFrame out of it like this:
  Main_ID SetID  Version  Time  Value
0       1  1041      2.0  5000      1
1       1  1041      2.0  6000      2
2       1  1041      2.0  7000      3
I wrote the code below to try to produce this, and I know it is not a good approach, so any suggestions would be great. I am also sure it will perform horribly on streaming data: these three DataFrames are created inside a single loop, and the time and value lists could each hold 30,000 to 100,000 items.
Code:
import pandas as pd

stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3]
        }
    }
}

df_outer = pd.DataFrame(stream["Outerclass"], index=[0])
print(df_outer)
df_time = pd.DataFrame(stream["Outerclass"]["nestedData"]["time"], columns=["Time"])
print(df_time)
df_value = pd.DataFrame(stream["Outerclass"]["nestedData"]["value"], columns=["Value"])
print(df_value)
full_df = pd.concat([df_outer, df_time, df_value], sort=True, axis=1)
print(full_df)
del full_df["nestedData"]
print(full_df)
Output:
  Main_ID SetID  Version  Time  Value
0       1  1041      2.0  5000      1
1     NaN   NaN      NaN  6000      2
2     NaN   NaN      NaN  7000      3
Use json_normalize to flatten the dict to a DataFrame, then use explode to convert the lists to rows:
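import pandas as pd

stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3]
        }
    }
}

# Flatten the nested dict into one row with dotted column names.
df = pd.json_normalize(stream)
# Explode every column; scalar cells are repeated, list cells become one row per item.
df = df.apply(pd.Series.explode).reset_index(drop=True)
print(df)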
  Outerclass.Main_ID Outerclass.SetID Outerclass.Version Outerclass.nestedData.time Outerclass.nestedData.value
0                  1             1041                  2                       5000                           1
1                  1             1041                  2                       6000                           2
2                  1             1041                  2                       7000                           3
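The column names above carry the "Outerclass." prefix, while the question asks for Main_ID, SetID, Version, Time and Value. A minimal sketch of one way to get there, assuming pandas 1.3 or newer (needed for exploding several columns at once): normalize the inner dict directly, explode both list columns together, then rename them. The target names Time and Value are taken from the question.

```python
import pandas as pd

stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3],
        },
    }
}

# Start from the inner dict so the columns have no "Outerclass." prefix.
df = pd.json_normalize(stream["Outerclass"])

# Explode both list columns in one call (assumes pandas >= 1.3).
# The lists must have equal lengths, otherwise explode raises ValueError.
df = df.explode(["nestedData.time", "nestedData.value"], ignore_index=True)

# Rename to the column names requested in the question.
df = df.rename(columns={"nestedData.time": "Time", "nestedData.value": "Value"})
print(df)
```

Exploded columns come back with object dtype, so a follow-up such as df["Value"].astype(int) may be needed if numeric columns are required.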
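We can try:
import pandas as pd

# pd.json_normalize is the top-level function since pandas 1.0;
# importing json_normalize from pandas.io.json is deprecated.
s = pd.json_normalize(stream['Outerclass'])
# Pop each list column, explode it into rows, then join the result back on the index.
s = s.join(pd.concat([s.pop(x).explode() for x in ['nestedData.time', 'nestedData.value']], axis=1))
s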
Out[222]:
  Main_ID SetID  Version nestedData.time nestedData.value
0       1  1041        2            5000                1
0       1  1041        2            6000                2
0       1  1041        2            7000                3
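Since the question worries about performance with 30,000 to 100,000 items per list, it may also be worth skipping json_normalize and explode altogether. The sketch below builds the frame in one constructor call and relies on pandas broadcasting scalar values against the list columns. The helper name outer_to_frame is hypothetical, and it assumes every payload has the same shape as the sample (scalar fields plus a "nestedData" dict with equal-length "time" and "value" lists). No timings are claimed here; measure on real data.

```python
import pandas as pd


def outer_to_frame(outer: dict) -> pd.DataFrame:
    """Build one DataFrame from a single "Outerclass" payload (hypothetical helper)."""
    # Everything except the nested block is treated as a scalar field.
    scalars = {key: val for key, val in outer.items() if key != "nestedData"}
    nested = outer["nestedData"]
    # The DataFrame constructor repeats scalars to match the list length.
    # Lists of unequal length raise ValueError.
    return pd.DataFrame({**scalars, "Time": nested["time"], "Value": nested["value"]})


stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3],
        },
    }
}

df = outer_to_frame(stream["Outerclass"])
print(df)
```

When many payloads arrive in a loop, collecting the per-payload frames in a list and calling pd.concat(frames, ignore_index=True) once at the end usually avoids repeated copying.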