
Convert a pandas dataframe to nested JSON

I have a dataframe like the one below, in which one column contains an already nested list of dictionaries:

import pandas as pd

data = {'First':  ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","","Turn":  [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
    }

df = pd.DataFrame (data, columns = ['First','second','third','forth'])

I want to convert it to the following JSON format and save it:

[
  {
    "first": "",
    "second": "",
    "third": "",
    "forth": [
        {
          "values": "",
          "entity": "",
          "TIMEX3": [
            {
              "expression": "",
              "tid": "",
              "type": "",
              "value": "",
              "mod": "",
              "anchorTimeID": "",
              "beginPoint": "",
              "endPoint": ""
                    }
                  ]
                }
              ]
            },...

I tried the following, but the output is too messy and does not look like the output I want to save:

  my_json = (df.groupby(['text','intent','domain'], as_index=False)
               .apply(lambda x: x[['entities']].to_dict('r'))
               .reset_index()
               .to_json(orient='records',indent= 2))

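I believe you are not far from the format you want. The only problem is that the forth column contains the dictionaries as strings. One possible approach is to convert everything back to dictionaries, using eval to turn the strings back into dictionaries, and then use the json module to pretty-print the result: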

import pandas as pd
import json

data = {'First':  ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","Turn":  [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
    }
df = pd.DataFrame (data, columns = ['First','Second','third','forth'])

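# One dict per row, e.g. {'First': ..., 'forth': '<string>'}
my_dict = df.to_dict(orient='records')
for row in my_dict:
    # 'forth' holds a string; eval turns it back into a list of dicts.
    # Only use eval on data you trust.
    row['forth'] = eval(row['forth'])
# Pretty-print the list of row dicts as JSON
my_json = json.dumps(my_dict, indent=2)
print(my_json)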

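I made two small corrections to your code: the capitalization of the Second key (the columns list used 'second'), and an invalid entry, , "", inside the strings of your forth column.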

Here is a copy of my output:

[
  {
    "First": "First value",
    "Second": "First value",
    "third": "First value",
    "forth": [
      {
        "values": "",
        "entity": "datetime",
        "Turn": [
          {
            "expression": "",
            "tid": "",
            "type": "",
            "value": "",
            "mod": "",
            "anchor": "",
            "beginPoint": "",
            "endPoint": ""
          }
        ]
      }
    ]
  },  ...
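As a side note, the strings in the forth column are valid JSON once the stray entry is removed, so json.loads could stand in for eval. The following is only a sketch under that assumption: the nested string is shortened for brevity, and the names nested and records are illustrative rather than taken from the question. Unlike eval, json.loads only parses data and never executes code, and it raises json.JSONDecodeError on a malformed string.

```python
import json

import pandas as pd

# Shortened stand-in for the strings stored in the 'forth' column (assumption)
nested = '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]'

df = pd.DataFrame({
    'First': ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': [nested, nested],
})

records = df.to_dict(orient='records')
for row in records:
    # Parse the JSON string into a list of dicts without evaluating code
    row['forth'] = json.loads(row['forth'])

my_json = json.dumps(records, indent=2)
print(my_json)
```

The printed structure should have the same shape as the output above, with only the fields present in the shortened string.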

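If the forth column already held dictionaries (rather than strings) in the dataframe, you could call to_json directly and the format would be what you want. For example, you can try converting the corrected my_dict back into a dataframe: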

test_df = pd.DataFrame(my_dict)
print(test_df.to_json(orient='records', indent=2))
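Since the question also asks to save the result, here is a minimal sketch of parsing the column inside the dataframe and writing the JSON to a file. It assumes the forth strings are valid JSON and a pandas version whose to_json accepts indent (1.0 or later); the shortened nested string and the out_path location are hypothetical placeholders.

```python
import json
import os
import tempfile

import pandas as pd

# Shortened stand-in for the strings stored in the 'forth' column (assumption)
nested = '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]'

df = pd.DataFrame({
    'First': ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': [nested, nested],
})

# Parse the string column so each cell holds a list of dicts
df['forth'] = df['forth'].apply(json.loads)

# out_path is a hypothetical location; substitute any writable path
out_path = os.path.join(tempfile.mkdtemp(), 'nested.json')
df.to_json(out_path, orient='records', indent=2)

# Read the file back to confirm the nesting survived the round trip
with open(out_path, encoding='utf-8') as fh:
    saved = json.load(fh)
print(saved[0]['forth'][0]['entity'])
```

Passing a path as the first argument makes to_json write the file instead of returning a string.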

