Python Pandas - JSON to DataFrame
I have a complex JSON file that looks like this:
{
    "User A" : {
        "Obj1" : {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3"
        },
        "Obj2" : {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3"
        }
    },
    "User B" : {
        "Obj1" : {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3",
            "key4": "val4"
        }
    }
}
I want to turn it into a DataFrame that looks like this:
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4
Is this possible with pandas? If so, how can I do it?
You can first read the file into a dict:
import json

with open('file.json') as data_file:
    dd = json.load(data_file)
print(dd)
{'User B': {'Obj1': {'key2': 'val2', 'key4': 'val4', 'key1': 'val1', 'key3': 'val3'}},
'User A': {'Obj1': {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'},
'Obj2': {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}}}
Then use concat with a dict comprehension:
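import pandas as pd
# One transposed DataFrame per user; concat uses the dict keys as the outer index level
df = pd.concat({key: pd.DataFrame(value).T for key, value in dd.items()})
print(df)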
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4
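For readers who want to try the concat approach without creating file.json, here is a minimal self-contained sketch. It assumes the same data is defined inline as a Python dict; the index level names "user" and "obj" are hypothetical labels added only for readability and are not part of the original answer.

```python
import pandas as pd

# Same structure as the sample JSON, defined inline so no file is needed.
dd = {
    "User A": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3"},
        "Obj2": {"key1": "val1", "key2": "val2", "key3": "val3"},
    },
    "User B": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3", "key4": "val4"},
    },
}

# One DataFrame per user. pd.DataFrame(objs) puts objects in columns,
# so the transpose turns each object into a row.
frames = {user: pd.DataFrame(objs).T for user, objs in dd.items()}

# concat uses the dict keys as the outer index level; columns that a
# user lacks (key4 for "User A") are filled with NaN.
df = pd.concat(frames)
df.index.names = ["user", "obj"]  # hypothetical level names, optional

print(df)
```

A row can then be selected with a tuple, for example df.loc[("User B", "Obj1")].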
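Another solution uses read_json, but the result first has to be reshaped with unstack, and the NaN rows (user/object pairs that do not exist, such as Obj2 for User B) removed with dropna. Finally, DataFrame.from_records is needed: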
df = pd.read_json('file.json').unstack().dropna()
print (df)
User A  Obj1    {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}
        Obj2    {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}
User B  Obj1    {'key2': 'val2', 'key4': 'val4', 'key1': 'val1...
dtype: object
df1 = pd.DataFrame.from_records(df.values.tolist())
print (df1)
   key1  key2  key3  key4
0  val1  val2  val3   NaN
1  val1  val2  val3   NaN
2  val1  val2  val3  val4
df1 = pd.DataFrame.from_records(df.values.tolist(), index = df.index)
print (df1)
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4
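A further option, not part of the answer above, is to flatten the two outer levels into tuple keys and pass the result to DataFrame.from_dict with orient="index". The sketch below assumes the data is already loaded as a dict (defined inline here); the variable name flat is only illustrative.

```python
import pandas as pd

# Same structure as the sample JSON, defined inline so no file is needed.
dd = {
    "User A": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3"},
        "Obj2": {"key1": "val1", "key2": "val2", "key3": "val3"},
    },
    "User B": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3", "key4": "val4"},
    },
}

# Flatten the two outer levels into (user, obj) tuple keys.
flat = {
    (user, obj): fields
    for user, objs in dd.items()
    for obj, fields in objs.items()
}

# orient="index" turns each dict key into a row label; tuple keys
# become a two-level MultiIndex, and missing fields become NaN.
df = pd.DataFrame.from_dict(flat, orient="index")

print(df)
```

This avoids building one intermediate DataFrame per user, which may be convenient when there are many users, though either approach should give the same table for this data.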