Convert pandas DataFrame to a nested JSON
I have a DataFrame like the one below, in which one column contains an already nested list of dictionaries, stored as a string:
import pandas as pd
data = {'First': ['First value', 'Second value'],
'Second': ['First value', 'Second value'],
'third': ['First value', 'Second value'],
'forth': ['[{"values": "","entity": "datetime","","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
}
df = pd.DataFrame (data, columns = ['First','second','third','forth'])
I want to convert it to the following JSON format and save it:
[
  {
    "first": "",
    "second": "",
    "third": "",
    "forth": [
      {
        "values": "",
        "entity": "",
        "TIMEX3": [
          {
            "expression": "",
            "tid": "",
            "type": "",
            "value": "",
            "mod": "",
            "anchorTimeID": "",
            "beginPoint": "",
            "endPoint": ""
          }
        ]
      }
    ]
  },...
I tried the following, but the output is too messy and does not look like the output I want to save:
my_json = (df.groupby(['text','intent','domain'], as_index=False)
.apply(lambda x: x[['entities']].to_dict('r'))
.reset_index()
.to_json(orient='records',indent= 2))
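I believe you are not far from the format you want. The only problem is that the forth column contains the dictionaries as strings. One possible approach is to convert everything back to dictionaries with to_dict, use eval to turn each string back into a list of dictionaries, and then pretty-print the result with the json module: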
import json

import pandas as pd

data = {
    'First': ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
}
df = pd.DataFrame(data, columns=['First', 'Second', 'third', 'forth'])

# One dict per row; 'forth' is still a string at this point.
my_dict = df.to_dict(orient='records')
for row in my_dict:
    # eval turns the string into a real list of dicts (only use it on trusted input).
    row['forth'] = eval(row['forth'])

# Pretty-print the nested structure as JSON.
my_json = json.dumps(my_dict, indent=2)
print(my_json)
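Note that I made two small corrections to your data: the capitalization of the Second key in the columns list, and an invalid entry (a stray "",) in the first string of your forth column.
Here is a copy of my output: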
[
  {
    "First": "First value",
    "Second": "First value",
    "third": "First value",
    "forth": [
      {
        "values": "",
        "entity": "datetime",
        "Turn": [
          {
            "expression": "",
            "tid": "",
            "type": "",
            "value": "",
            "mod": "",
            "anchor": "",
            "beginPoint": "",
            "endPoint": ""
          }
        ]
      }
    ]
  }, ...
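Since the strings in the forth column happen to be valid JSON, json.loads can probably stand in for eval here. The following is only a sketch under that assumption, with a shortened version of the nested string (the variable names nested and records are made up for illustration); unlike eval, json.loads only parses data and never executes code:

```python
import json

import pandas as pd

# Shortened stand-in for the nested string in the 'forth' column (illustrative only).
nested = '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]'
df = pd.DataFrame({
    'First': ['First value', 'Second value'],
    'forth': [nested, nested],
})

records = df.to_dict(orient='records')
for row in records:
    # Parse the JSON string into a list of dicts without evaluating any code.
    row['forth'] = json.loads(row['forth'])

my_json = json.dumps(records, indent=2)
print(my_json)
```

If a string in the column is not valid JSON (for example, it uses single quotes), json.loads raises an error instead of silently producing something unexpected.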
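If the forth column already held dictionaries in the DataFrame, you could call to_json directly and the format would be what you want. For example, you can try converting the corrected my_dict back to a DataFrame: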
test_df = pd.DataFrame(my_dict)
print(test_df.to_json(orient='records', indent=2))
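The question also asks to save the result. A minimal sketch of one way to do that, assuming the forth strings are valid JSON: parse the column in place and pass a file path to to_json. The file name nested_output.json and the temporary directory are placeholders, and the nested string is shortened for illustration:

```python
import json
import os
import tempfile

import pandas as pd

# Shortened stand-in for the nested string in the 'forth' column (illustrative only).
nested = '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]'
df = pd.DataFrame({
    'First': ['First value', 'Second value'],
    'forth': [nested, nested],
})

# Parse the strings so the column holds real lists of dicts.
df['forth'] = df['forth'].apply(json.loads)

# Placeholder output location; replace with your own path.
out_path = os.path.join(tempfile.gettempdir(), 'nested_output.json')
df.to_json(out_path, orient='records', indent=2)

# Read the file back to check that the nesting survived the round trip.
with open(out_path) as fh:
    saved = json.load(fh)
print(saved[0]['forth'][0]['entity'])
```

The indent argument of to_json requires pandas 1.0 or later.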