I have a dataframe as below:
id | text |
---|---|
1 | aaaa |
2 | bbbb |
I read the above into a dataframe, and I need to convert the text column to a list in order to perform NER extraction:
tags = []
for i in df['text'].tolist():
    tdoc = nlp(i)
    for tags in tdoc.ents:
        tags.append((df.id,tags.text,tags.label_))
The above works and I get the NER tags, which I would like to export to a dataframe along with the 'id' column from the original dataframe:
df_tag = pd.DataFrame.from_records(tags, columns=['id', 'name', 'type'])
The problem is that every row of the id column contains all the ids, as below:
id | name | type |
---|---|---|
1 2 | NER A | Type A |
1 2 | NER B | Type B |
Desired output
id | name | type |
---|---|---|
1 | NER A | Type A |
2 | NER B | Type B |
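The problem comes from the fact that df.id returns a Series, so on every iteration you are appending the whole column (index included) rather than a single id value. Also, on lines 4 and 5 of your loop, the loop variable should be tag, not tags, because tags is already the name of the list.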
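To make the diagnosis concrete, here is a minimal sketch that rebuilds the two-row dataframe from the question and checks what df.id actually is. The sample values are the ones from the question; nothing else is assumed.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({"id": [1, 2], "text": ["aaaa", "bbbb"]})

col = df.id
print(type(col).__name__)  # Series
print(col.tolist())        # [1, 2]

# A tuple built with df.id therefore carries the entire column,
# not the id of the row currently being processed.
row = (df.id, "NER A", "Type A")
print(len(row[0]))         # 2
```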
Try like this:
tags = []
for i in df['text'].tolist():
    tdoc = nlp(i)
    for tag in tdoc.ents:
        tags.append((df.id.values,tag.text,tag.label_))
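Note that df.id.values is still the full array of ids, so each tuple may still carry every id. One possible way to get exactly one id per entity is to iterate over the id and text columns together. The following is only a sketch under stated assumptions: the nlp function here is a hypothetical stand-in that fakes one entity per text so the example runs without a spaCy model, and the names row_id, doc and ent are illustrative.

```python
from types import SimpleNamespace

import pandas as pd


def nlp(text):
    # Hypothetical stand-in for a real spaCy pipeline: returns an object
    # with an .ents list whose items expose .text and .label_.
    letter = text[0].upper()
    ent = SimpleNamespace(text=f"NER {letter}", label_=f"Type {letter}")
    return SimpleNamespace(ents=[ent])


df = pd.DataFrame({"id": [1, 2], "text": ["aaaa", "bbbb"]})

tags = []
# Walk id and text in lockstep so each entity gets its own row's id.
for row_id, text in zip(df["id"], df["text"]):
    doc = nlp(text)
    for ent in doc.ents:
        tags.append((row_id, ent.text, ent.label_))

df_tag = pd.DataFrame.from_records(tags, columns=["id", "name", "type"])
print(df_tag)
```

With a real pipeline, only the nlp stand-in would need to be swapped out; the zip loop stays the same, and a text with several entities simply produces several rows that share its id.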