
How to split the contents of a column into different columns in CSV files using Python?

I have a CSV file that contains the output of my machine learning model. Ideally it should have three columns (Source, Relation type, Target). When I extract the output, however, each result is stored as the content of a single cell, for n rows. I do not want the entities; I want the contents of relations split into separate columns.
I have attached my output and my expected output.
Can anyone please guide me on extracting the contents of each cell into different columns using Python?

{'entities': [{'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'}, {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}], 'relations': [{'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'}, {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, {'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}]}
{'entities': [{'title': 'Europe', 'wikild': 'Q46', 'label': 'Location'}, {'title': 'London', 'wikild': 'Q84', 'label': 'Organization'}, {'title': 'Federal Reserve', 'wikild': 'Q53536', 'label': 'Organization'}, {'title': 'United States', 'wikild': 'Q30', 'label': 'Organization'}, {'title': 'Federal government of the United States', 'wikild': 'Q48525', 'label': 'Organization'}, {'title': 'Bank of America', 'wikild': 'Q487907', 'label': 'Organization'}, {'title': 'Group of Seven', 'wikild': 'Q1764511', 'label': 'Organization'}, {'title': 'United States dollar', 'wikild': 'Q4917', 'label': 'Organization'}, {'title': 'New York (state)', 'wikild': 'Q1384', 'label': 'Organization'}, {'title': 'Alan Greenspan', 'wikild': 'Q193635', 'label': 'Person'}, {'title': 'Euro', 'wikild': 'Q4916', 'label': 'Organization'}, {'title': 'Germany', 'wikild': 'Q183', 'label': 'Organization'}], 'relations': [{'source': 'Federal Reserve', 'target': 'London', 'type': 'headquarters location'}, {'source': 'Bank of America', 'target': 'New York (state)', 'type': 'headquarters location'}, {'source': 'London', 'target': 'Federal Reserve', 'type': 'headquarters location'}, {'source': 'New York (state)', 'target': 'Bank of America', 'type': 'headquarters location'}]}

The expected output should look like this:

Is this what you need? You have not said what the second dictionary is for, since the sample output only refers to the first dictionary.

import pandas as pd

inp = {'entities': [{'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, 
                    {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'}, 
                    {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, 
                    {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, 
                    {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, 
                    {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}
                   ], 
       'relations': [{'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, 
                     {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'}, 
                     {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, 
                     {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, 
                     {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, 
                     {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, 
                     {'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}
                    ]
      }

df = pd.DataFrame(inp['relations'])        # Convert the list of relation dicts to a DataFrame
output = df[['source', 'type', 'target']]  # Reorder the columns
output
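
The question shows two dictionaries, one per row, and asks for the columns Source, Relation type, Target. A minimal sketch of how several already-parsed dictionaries could be flattened into one table with those headers, using only the standard library, is shown below. The helper name `relation_rows`, the shortened sample records, and the file name `relations.csv` mentioned in the comment are assumptions for illustration, not part of the original post.

```python
import csv
import io


def relation_rows(records):
    """Collect the 'relations' entries of several parsed dictionaries into flat rows."""
    rows = []
    for record in records:
        # .get() with a default skips records that have no 'relations' key
        for rel in record.get("relations", []):
            rows.append({
                "Source": rel["source"],
                "Relation type": rel["type"],
                "Target": rel["target"],
            })
    return rows


# Two shortened records in the same shape as the data in the question
records = [
    {"entities": [], "relations": [
        {"source": "Time (magazine)", "target": "WarnerMedia", "type": "owned by"},
        {"source": "WarnerMedia", "target": "AOL", "type": "subsidiary"},
    ]},
    {"entities": [], "relations": [
        {"source": "Federal Reserve", "target": "London", "type": "headquarters location"},
    ]},
]

rows = relation_rows(records)

# Written to an in-memory buffer here; for a real file, a handle such as
# open("relations.csv", "w", newline="") (hypothetical name) could be used instead
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Source", "Relation type", "Target"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The same `rows` list can also be passed to `pd.DataFrame(rows)` if a DataFrame is preferred over a CSV file.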


I assume the data comes as a string, but I'm not sure whether it comes as one object or as multiple objects.

In my answer I assume there is only one object at a time. If there are several, the only difference is a for-loop that appends the data.

import json
import pandas as pd

JSON="""
{
    'entities': 
    [
        {'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, 
        {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'},
        {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, 
        {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, 
        {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, 
        {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}
    ], 
    'relations': [
        {'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, 
        {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'},
        {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, 
        {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, 
        {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, 
        {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, 
        {'source': 'AOL', 'target': 'WarnerMedia', 'type':'subsidiary'}
    ]
}
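""".replace("'", '"')  # Swap single quotes for double quotes so the text is valid JSON
json_object = json.loads(JSON)                # Parse the string into a dict
df = pd.DataFrame(json_object["relations"])   # Keep only the relations
df.head()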
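
If several objects arrive as strings, the for-loop mentioned above might look like the sketch below. It uses `ast.literal_eval` instead of the quote replacement, on the assumption that the cells hold Python-style dict literals as in the question; this also copes with values that contain an apostrophe, which the replacement approach would turn into invalid JSON. The helper name `parse_cell` and the sample cells are hypothetical.

```python
import ast


def parse_cell(cell_text):
    """Parse one cell holding a Python-style dict literal and return its relations."""
    # ast.literal_eval only evaluates literals (dicts, lists, strings, numbers),
    # so it does not execute arbitrary code from the cell
    record = ast.literal_eval(cell_text)
    return record["relations"]


# Hypothetical cells; the second contains an apostrophe inside a value
cells = [
    "{'entities': [], 'relations': [{'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}]}",
    """{'entities': [], 'relations': [{'source': "McDonald's", 'target': 'Chicago', 'type': 'headquarters location'}]}""",
]

# Loop over the cells and append every relation to one list
all_relations = []
for cell in cells:
    all_relations.extend(parse_cell(cell))

for rel in all_relations:
    print(rel["source"], "|", rel["type"], "|", rel["target"])
```

The resulting `all_relations` list has the same shape as `json_object["relations"]`, so it can be handed to `pd.DataFrame` in the same way.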

