
How to split the contents of a column into different columns in CSV files using Python?

I have a CSV file that contains the output of my machine learning model. Ideally it should have three columns (Source, Relation type, Target). When I extract the output, each result is stored as the single content of one cell, for n rows. I do not want the entities; I want the contents of the relations in separate columns.
I have attached my output and my expected output.
Can anyone please guide me on extracting the contents of the cell into different columns using Python?

{'entities': [{'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'}, {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}], 'relations': [{'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'}, {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, {'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}]}
{'entities': [{'title': 'Europe', 'wikild': 'Q46', 'label': 'Location'}, {'title': 'London', 'wikild': 'Q84', 'label': 'Organization'}, {'title': 'Federal Reserve', 'wikild': 'Q53536', 'label': 'Organization'}, {'title': 'United States', 'wikild': 'Q30', 'label': 'Organization'}, {'title': 'Federal government of the United States', 'wikild': 'Q48525', 'label': 'Organization'}, {'title': 'Bank of America', 'wikild': 'Q487907', 'label': 'Organization'}, {'title': 'Group of Seven', 'wikild': 'Q1764511', 'label': 'Organization'}, {'title': 'United States dollar', 'wikild': 'Q4917', 'label': 'Organization'}, {'title': 'New York (state)', 'wikild': 'Q1384', 'label': 'Organization'}, {'title': 'Alan Greenspan', 'wikild': 'Q193635', 'label': 'Person'}, {'title': 'Euro', 'wikild': 'Q4916', 'label': 'Organization'}, {'title': 'Germany', 'wikild': 'Q183', 'label': 'Organization'}], 'relations': [{'source': 'Federal Reserve', 'target': 'London', 'type': 'headquarters location'}, {'source': 'Bank of America', 'target': 'New York (state)', 'type': 'headquarters location'}, {'source': 'London', 'target': 'Federal Reserve', 'type': 'headquarters location'}, {'source': 'New York (state)', 'target': 'Bank of America', 'type': 'headquarters location'}]}

The expected output should look like this (three columns: Source, Relation type, Target):

Is this what you need? You have not mentioned what the second dictionary is for, since the sample output only refers to the first dictionary.

import pandas as pd

inp = {'entities': [{'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, 
                    {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'}, 
                    {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, 
                    {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, 
                    {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, 
                    {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}
                   ], 
       'relations': [{'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, 
                     {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'}, 
                     {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, 
                     {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, 
                     {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, 
                     {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, 
                     {'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}
                    ]
      }

df = pd.DataFrame(inp['relations'])       # Convert the list of relation dicts to a DataFrame
output = df[['source','type','target']]   # Reorder the columns
output
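
If both dictionaries from the question should end up in one table, the same idea can probably be extended as follows. This is only a sketch: it assumes the outputs are already Python dicts collected in a list, the sample data are shortened stand-ins for the ones in the question, and the names `records` and `relations_table` are made up for illustration.

```python
import pandas as pd

# Two shortened stand-ins for the model outputs shown in the question.
records = [
    {"entities": [], "relations": [
        {"source": "Time (magazine)", "target": "WarnerMedia", "type": "owned by"},
        {"source": "WarnerMedia", "target": "AOL", "type": "subsidiary"},
    ]},
    {"entities": [], "relations": [
        {"source": "Federal Reserve", "target": "London", "type": "headquarters location"},
    ]},
]

def relations_table(records):
    """Flatten the 'relations' lists of several outputs into one DataFrame."""
    # Collect every relation dict, skipping the entities entirely.
    rows = [rel for rec in records for rel in rec.get("relations", [])]
    # Passing columns= fixes the column order even if there are no rows.
    df = pd.DataFrame(rows, columns=["source", "type", "target"])
    # Rename to the headers the question asks for.
    return df.rename(columns={"source": "Source",
                              "type": "Relation type",
                              "target": "Target"})

table = relations_table(records)
print(table)
```

Each relation becomes one row, so the result has as many rows as there are relations across all dictionaries.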


I suppose the data come as a string, but I'm not sure whether they come as one object or as multiple objects.

In my answer, I assume there is only one object each time; if not, the only difference is a for-loop that appends the data.

import json
import pandas as pd

JSON="""
{
    'entities': 
    [
        {'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, 
        {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'},
        {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, 
        {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, 
        {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, 
        {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}
    ], 
    'relations': [
        {'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, 
        {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'},
        {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, 
        {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, 
        {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, 
        {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, 
        {'source': 'AOL', 'target': 'WarnerMedia', 'type':'subsidiary'}
    ]
}
""".replace("'", '"')  # JSON needs double quotes; safe here because no value contains an apostrophe
json_object = json.loads(JSON)
df = pd.DataFrame(json_object["relations"])
df.head()
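
If the data sit in a CSV file with one such string per row, as the question describes, the for-loop mentioned above might look roughly like this. It is a sketch under assumptions: the column name `output` and the in-memory CSV text are invented for illustration, and `ast.literal_eval` is swapped in for the quote replacement because the cells are Python-style dict literals, which should keep titles containing apostrophes intact.

```python
import ast
import io

import pandas as pd

# Stand-in for the real file; the column name "output" is an assumption.
csv_text = '''output
"{'entities': [], 'relations': [{'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}]}"
"{'entities': [], 'relations': [{'source': 'Bank of America', 'target': 'New York (state)', 'type': 'headquarters location'}]}"
'''

raw = pd.read_csv(io.StringIO(csv_text))

rows = []
for cell in raw["output"]:
    parsed = ast.literal_eval(cell)    # turn the cell string into a Python dict
    rows.extend(parsed["relations"])   # keep the relations, ignore the entities

# One row per relation, columns in the order the question asks for.
result = pd.DataFrame(rows)[["source", "type", "target"]]

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_out = result.to_csv(index=False)
print(csv_out)
```

For a real file, `pd.read_csv` would take the file path instead of the `io.StringIO` object, and `to_csv` would take an output path.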
