
Convert pandas dataframe to a nested JSON

I have a dataframe like the one below, in which one column contains an already nested list of dictionaries:

import pandas as pd

data = {'First':  ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","","Turn":  [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
    }

df = pd.DataFrame (data, columns = ['First','second','third','forth'])

I want to convert it to the following JSON format and save it:

[
  {
    "first": "",
    "second": "",
    "third": "",
    "forth": [
        {
          "values": "",
          "entity": "",
          "TIMEX3": [
            {
              "expression": "",
              "tid": "",
              "type": "",
              "value": "",
              "mod": "",
              "anchorTimeID": "",
              "beginPoint": "",
              "endPoint": ""
                    }
                  ]
                }
              ]
            },...

I tried the following, but the output is too messy and does not look like the output I want to save:

  my_json = (df.groupby(['text','intent','domain'], as_index=False)
               .apply(lambda x: x[['entities']].to_dict('r'))
               .reset_index()
               .to_json(orient='records',indent= 2))

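I believe you are not far from the format you want. The only problem is that the forth column contains the dictionaries as strings. One possible approach is to convert everything back to dictionaries, using eval to turn the strings back into lists of dictionaries, and then use the json module to pretty-print the result: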

import pandas as pd
import json

data = {'First':  ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","Turn":  [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
    }
df = pd.DataFrame (data, columns = ['First','Second','third','forth'])

my_dict = df.to_dict(orient='records')
for row in my_dict:
    row['forth'] = eval(row['forth'])
my_json = json.dumps(my_dict, indent=2)
print(my_json)

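I made two small corrections: 'second' in the columns list is capitalized to 'Second' so that it matches the key, and the invalid entry "", is removed from the first string under your forth key.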

Here is a copy of my output:

[
  {
    "First": "First value",
    "Second": "First value",
    "third": "First value",
    "forth": [
      {
        "values": "",
        "entity": "datetime",
        "Turn": [
          {
            "expression": "",
            "tid": "",
            "type": "",
            "value": "",
            "mod": "",
            "anchor": "",
            "beginPoint": "",
            "endPoint": ""
          }
        ]
      }
    ]
  },  ...
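As a variation on the eval step, the strings in forth are valid JSON once the stray "", entry is removed, so json.loads could parse them instead. The following is only a sketch under that assumption; the forth strings are shortened stand-ins for the ones in the question, and the name records is just illustrative.

```python
import json
import pandas as pd

data = {
    'First': ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    # Shortened stand-ins for the JSON strings in the question's 'forth' column
    'forth': ['[{"values": "", "entity": "datetime", "Turn": [{"tid": "", "value": ""}]}]',
              '[{"values": "", "entity": "datetime", "Turn": [{"tid": "", "value": ""}]}]'],
}
df = pd.DataFrame(data, columns=['First', 'Second', 'third', 'forth'])

records = df.to_dict(orient='records')
for row in records:
    # json.loads only parses JSON text; unlike eval, it does not execute code
    row['forth'] = json.loads(row['forth'])

my_json = json.dumps(records, indent=2)
print(my_json)
```

If a string is not valid JSON (as with the original first entry), json.loads raises json.JSONDecodeError, which makes such entries easier to spot.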

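If the forth column already held dictionaries in the dataframe, you could call to_json directly and the format would be the one you want. For example, you can try converting the corrected my_dict back into a dataframe: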

test_df = pd.DataFrame(my_dict)
print(test_df.to_json(orient='records', indent=2))
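Since the question also asks to save the result, here is a rough sketch of parsing the column inside the dataframe and writing the file with to_json. It assumes the strings in forth are valid JSON; the shortened sample data, the temporary directory, and the file name nested.json are all hypothetical.

```python
import json
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    'First': ['First value', 'Second value'],
    # Shortened stand-ins for the JSON strings in the 'forth' column
    'forth': ['[{"entity": "datetime", "Turn": [{"tid": ""}]}]',
              '[{"entity": "datetime", "Turn": [{"tid": ""}]}]'],
})

# Parse each JSON string into a list of dicts, replacing the string column
df['forth'] = df['forth'].apply(json.loads)

# Hypothetical output location; any writable path would do
out_path = os.path.join(tempfile.mkdtemp(), 'nested.json')
df.to_json(out_path, orient='records', indent=2)

# Read the file back to check that the nesting survived the round trip
with open(out_path) as f:
    saved = json.load(f)
print(saved[0]['forth'][0]['entity'])
```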

