Pandas: how to create a new dataframe depending on a column value containing json for each row?

I have a dataframe like this:

artid  link     ner_label
1      url1     "{('blanqui', 'Person'): 6, ('walter benjamin', 'Person'): 2}"
2      url2     "{('john', 'Person'): 8, ('steven', 'Person'): 3}"

Each value in ner_label is a string, not a dict. I would like to get this:

artid   link     ner                label      score
    1   url1     'blanqui'         'Person'     6
    1   url1     'walter benjamin' 'Person'     2
    2   url2     'john'            'Person'     8
    2   url2     'steven'          'Person'     3   

How can I do this?

This is not the most efficient way, but it does the job:

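from ast import literal_eval

import pandas as pd

# Parse each string into a dict, then collect its keys and values as lists
df['ner'] = df['ner_label'].apply(lambda x: list(literal_eval(x).keys()))
df['score'] = df['ner_label'].apply(lambda x: list(literal_eval(x).values()))

# Explode both list columns in parallel so each (key, value) pair gets its own row
df = df.set_index(['artid', 'link', 'ner_label']).apply(pd.Series.explode).reset_index()

# Each key is a (name, label) tuple; split it into two columns
df['label'] = [i[1] for i in df['ner']]
df['ner'] = [i[0] for i in df['ner']]
df.drop(['ner_label'], axis=1, inplace=True)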

Output:

  artid     link    ner             score   label
0   1      url1     blanqui            6    Person
1   1      url1     walter benjamin    2    Person
2   2      url2     john               8    Person
3   2      url2     steven             3    Person
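
As a variation on the same idea, the sketch below parses each string only once and builds the rows directly, which avoids the double `literal_eval` call and the explode step. The sample frame is rebuilt from the question, and the names `rows` and `result` are only illustrative.

```python
from ast import literal_eval

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "artid": [1, 2],
    "link": ["url1", "url2"],
    "ner_label": [
        "{('blanqui', 'Person'): 6, ('walter benjamin', 'Person'): 2}",
        "{('john', 'Person'): 8, ('steven', 'Person'): 3}",
    ],
})

# One output row per (name, label) -> score entry in each parsed dict
rows = [
    (artid, link, ner, label, score)
    for artid, link, raw in df[["artid", "link", "ner_label"]].itertuples(index=False, name=None)
    for (ner, label), score in literal_eval(raw).items()
]

result = pd.DataFrame(rows, columns=["artid", "link", "ner", "label", "score"])
print(result)
```

Because the values come straight out of the parsed dict, `score` ends up as an integer column and `ner`/`label` carry no quote characters.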

Here is a solution that uses only pandas string methods:

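# Split on the ", (" that separates entries, then give each entry its own row
df = df.assign(ner_label=df['ner_label'].str.split(r', \(')).explode('ner_label')
# Strip parentheses, braces and double quotes in a single regex pass
df['ner_label'] = df['ner_label'].str.replace(r'[(){}"]', '', regex=True)

# "'name', 'label': score" -> key part and score
df[['ner', 'score']] = df['ner_label'].str.split(':', expand=True)

# "'name', 'label'" -> name and label (assumes names contain no commas)
df[['ner', 'label']] = df['ner'].str.split(',', expand=True)

df.drop(columns='ner_label', inplace=True)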

Output:

   artid  link  ner                score  label
0      1  url1  'blanqui'          6      'Person'
0      1  url1  'walter benjamin'  2      'Person'
1      2  url2  'john'             8      'Person'
1      2  url2  'steven'           3      'Person'
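
Note that in this output the quote characters remain in ner and label, score is still a string, and the index repeats. If that matters, one possible pandas-only alternative is `Series.str.extractall` with named groups, sketched below. The regular expression assumes every entry looks exactly like `('name', 'label'): integer` and that names contain no single quotes; the names `pattern`, `parts` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "artid": [1, 2],
    "link": ["url1", "url2"],
    "ner_label": [
        "{('blanqui', 'Person'): 6, ('walter benjamin', 'Person'): 2}",
        "{('john', 'Person'): 8, ('steven', 'Person'): 3}",
    ],
})

# Named groups become the column names of the extracted frame
pattern = r"\('(?P<ner>[^']*)', '(?P<label>[^']*)'\): (?P<score>\d+)"

# extractall returns one row per match, indexed by (original row, match number)
parts = df["ner_label"].str.extractall(pattern)
parts = parts.droplevel("match")           # keep only the original row index
parts["score"] = parts["score"].astype(int)

# Joining on the index repeats artid/link once per extracted entry
result = df.drop(columns="ner_label").join(parts).reset_index(drop=True)
print(result)
```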
