Convert a pandas DataFrame to nested JSON
I have a DataFrame like the one below, in which one column contains a list of already-nested dictionaries:
import pandas as pd
data = {'First': ['First value', 'Second value'],
'Second': ['First value', 'Second value'],
'third': ['First value', 'Second value'],
'forth': ['[{"values": "","entity": "datetime","","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
}
df = pd.DataFrame (data, columns = ['First','second','third','forth'])
I want to convert it to the following JSON format and save it:
[
{
"first": "",
"second": "",
"third": "",
"forth": [
{
"values": "",
"entity": "",
"TIMEX3": [
{
"expression": "",
"tid": "",
"type": "",
"value": "",
"mod": "",
"anchorTimeID": "",
"beginPoint": "",
"endPoint": ""
}
]
}
]
},...
I tried the following, but the output is too messy and does not look like the output I want to save:
my_json = (df.groupby(['text','intent','domain'], as_index=False)
.apply(lambda x: x[['entities']].to_dict('r'))
.reset_index()
.to_json(orient='records',indent= 2))
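I believe you are not far from the format you want. The only problem is that the forth column contains the dictionaries as strings. One possible approach is to convert the DataFrame to a list of records, use eval to turn each string back into a list of dictionaries, and then pretty-print the result with the json module: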
import pandas as pd
import json
data = {'First': ['First value', 'Second value'],
'Second': ['First value', 'Second value'],
'third': ['First value', 'Second value'],
'forth': ['[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
}
df = pd.DataFrame (data, columns = ['First','Second','third','forth'])
my_dict = df.to_dict(orient='records')
for row in my_dict:
    # turn the string back into a list of dictionaries
    row['forth'] = eval(row['forth'])
my_json = json.dumps(my_dict, indent=2)
print(my_json)
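I made two small corrections to your code: the capitalization of the Second key in the columns list, and the removal of an invalid entry, , "", from the first string in your forth column.
Here is a copy of my output: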
[
{
"First": "First value",
"Second": "First value",
"third": "First value",
"forth": [
{
"values": "",
"entity": "datetime",
"Turn": [
{
"expression": "",
"tid": "",
"type": "",
"value": "",
"mod": "",
"anchor": "",
"beginPoint": "",
"endPoint": ""
}
]
}
]
}, ...
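As a hedged alternative to eval: since the strings in forth are valid JSON once the stray entry is removed, json.loads can parse them without executing arbitrary code, and it also accepts JSON literals such as true, false and null, which eval would reject. The sketch below uses a shortened version of the sample data (the variable names nested and records are illustrative, not from the original post):

```python
import json
import pandas as pd

# Shortened sample data; each 'forth' cell is a JSON string, as in the question.
nested = '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]'
df = pd.DataFrame({
    "First": ["First value", "Second value"],
    "Second": ["First value", "Second value"],
    "third": ["First value", "Second value"],
    "forth": [nested, nested],
})

records = df.to_dict(orient="records")
for row in records:
    # json.loads only parses JSON text; unlike eval, it never runs code.
    row["forth"] = json.loads(row["forth"])

my_json = json.dumps(records, indent=2)
print(my_json)
```

If the strings in your real data are Python literals rather than JSON (for example, single-quoted keys), json.loads will raise an error and ast.literal_eval from the standard library may be a better fit.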
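If the forth column already held dictionaries (rather than strings) in the DataFrame, you could call to_json directly and the format would be what you want. For example, you can try converting the corrected my_dict back to a DataFrame: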
test_df = pd.DataFrame(my_dict)
print(test_df.to_json(orient='records', indent=2))
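Since the question also asks about saving the result, here is a minimal sketch that parses the column inside the DataFrame and writes the JSON to a file. It assumes the forth strings are valid JSON; the output path and the shortened sample data are placeholders chosen for illustration.

```python
import json
import os
import tempfile
import pandas as pd

# Shortened sample data; the 'forth' cell is a JSON string.
nested = '[{"values": "", "entity": "datetime", "Turn": [{"tid": ""}]}]'
df = pd.DataFrame({"First": ["First value"], "forth": [nested]})

# Parse the JSON strings so the column holds real lists of dictionaries.
df["forth"] = df["forth"].apply(json.loads)

# Hypothetical output location; replace it with your own path.
out_path = os.path.join(tempfile.gettempdir(), "nested_output.json")
df.to_json(out_path, orient="records", indent=2)

# Read the file back to check that the nesting survived the round trip.
with open(out_path, encoding="utf-8") as fh:
    saved = json.load(fh)
```

Passing a path as the first argument makes to_json write the file instead of returning a string.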