How do I split a column's contents into separate columns of a CSV file using Python?

Question

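I have a CSV file containing the output of my machine learning model. Ideally it should have three columns (source, relation type, target). When I extracted the output, each result was stored as the entire content of a single cell, one cell per row, for n rows. I don't want the entities; I want the contents of the relations in separate columns.
I have attached my output and my expected output.
Can anyone guide me on using Python to extract the cell contents into separate columns?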

{'entities': [{'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'}, {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}], 'relations': [{'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'}, {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, {'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}]}
{'entities': [{'title': 'Europe', 'wikild': 'Q46', 'label': 'Location'}, {'title': 'London', 'wikild': 'Q84', 'label': 'Organization'}, {'title': 'Federal Reserve', 'wikild': 'Q53536', 'label': 'Organization'}, {'title': 'United States', 'wikild': 'Q30', 'label': 'Organization'}, {'title': 'Federal government of the United States', 'wikild': 'Q48525', 'label': 'Organization'}, {'title': 'Bank of America', 'wikild': 'Q487907', 'label': 'Organization'}, {'title': 'Group of Seven', 'wikild': 'Q1764511', 'label': 'Organization'}, {'title': 'United States dollar', 'wikild': 'Q4917', 'label': 'Organization'}, {'title': 'New York (state)', 'wikild': 'Q1384', 'label': 'Organization'}, {'title': 'Alan Greenspan', 'wikild': 'Q193635', 'label': 'Person'}, {'title': 'Euro', 'wikild': 'Q4916', 'label': 'Organization'}, {'title': 'Germany', 'wikild': 'Q183', 'label': 'Organization'}], 'relations': [{'source': 'Federal Reserve', 'target': 'London', 'type': 'headquarters location'}, {'source': 'Bank of America', 'target': 'New York (state)', 'type': 'headquarters location'}, {'source': 'London', 'target': 'Federal Reserve', 'type': 'headquarters location'}, {'source': 'New York (state)', 'target': 'Bank of America', 'type': 'headquarters location'}]}

The expected output should look like this:

Answer 1

Is this what you need? You didn't say what the second dictionary is for, since the sample output only refers to the first one.

inp = {'entities': [{'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, 
                    {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'}, 
                    {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, 
                    {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, 
                    {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, 
                    {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}
                   ], 
       'relations': [{'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, 
                     {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'}, 
                     {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, 
                     {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, 
                     {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, 
                     {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, 
                     {'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}
                    ]
      }

import pandas as pd

df = pd.DataFrame(inp['relations'])       # Convert the list of relation dicts to a DataFrame
output = df[['source','type','target']]   # Reorder the columns
output
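If the second dictionary is simply another row of model output, the same idea might extend to several dictionaries at once. The following is only a sketch under that assumption: `records` is a hypothetical, shortened stand-in for the real data, and `pd.json_normalize` with `record_path` is used to pull every relation out of every record.

```python
import pandas as pd

# Hypothetical, shortened records shaped like the model output in the question.
records = [
    {"entities": [{"title": "AOL", "wikild": "Q27585", "label": "Organization"}],
     "relations": [
         {"source": "Time (magazine)", "target": "WarnerMedia", "type": "owned by"},
         {"source": "WarnerMedia", "target": "AOL", "type": "subsidiary"},
     ]},
    {"entities": [{"title": "London", "wikild": "Q84", "label": "Organization"}],
     "relations": [
         {"source": "Federal Reserve", "target": "London", "type": "headquarters location"},
     ]},
]

# record_path="relations" turns each relation of each record into one row;
# the entities are ignored entirely.
relations = pd.json_normalize(records, record_path="relations")
output = relations[["source", "type", "target"]]  # same column order as above
print(output)
```

The entities never enter the result, so nothing has to be dropped afterwards.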

Answer 2

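I assume the data is a string, but I'm not sure whether it is a single object or several objects.

In my answer I assume there is only one object at a time; if not, the only difference is a for-loop that appends the data.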

import json
import pandas as pd

JSON="""
{
    'entities': 
    [
        {'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, 
        {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'},
        {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, 
        {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, 
        {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, 
        {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}
    ], 
    'relations': [
        {'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, 
        {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'},
        {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, 
        {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, 
        {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, 
        {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, 
        {'source': 'AOL', 'target': 'WarnerMedia', 'type':'subsidiary'}
    ]
}
""".replace("'", '"')
json_object = json.loads(JSON)
df=pd.DataFrame(json_object["relations"])
df.head()
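For the multi-object case, the for-loop mentioned above could look roughly like this. It is a sketch, not a tested solution for the real file: `cells` and `relations_from_cells` are hypothetical names, and the sample strings are made up. Because the cells look like Python dict literals, `ast.literal_eval` is swapped in for the quote replacement plus `json.loads`, which would break on a value containing an apostrophe.

```python
import ast
import pandas as pd

def relations_from_cells(cells):
    """Collect the 'relations' entries from every cell string into one DataFrame."""
    rows = []
    for cell in cells:
        record = ast.literal_eval(cell)          # safely parse the dict literal
        rows.extend(record.get("relations", []))  # append this cell's relations
    return pd.DataFrame(rows, columns=["source", "type", "target"])

# Hypothetical cell contents, one string per CSV row (entities left empty).
cells = [
    "{'entities': [], 'relations': [{'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}]}",
    "{'entities': [], 'relations': [{'source': \"Moody's\", 'target': 'New York (state)', 'type': 'headquarters location'}]}",
]

result = relations_from_cells(cells)
csv_text = result.to_csv(index=False)  # pass a file path instead to write a CSV file
print(csv_text)
```

If the strings sit in the first column of a headerless CSV, something like `pd.read_csv("model_output.csv", header=None)[0]` might supply `cells`; both the file name and that layout are assumptions.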
