How to select only the first entity extracted from spaCy entities?
I am trying to use the following code to extract entities from text stored in a DataFrame.
for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':
            pass  # the text of the first GPE should be stored here
I need to store the text of the first GPE alongside its corresponding row of text. For instance, if the following is the text at index 0 in column df['Text']:
Match between USA and Canada was postponed
then I need only the first location (USA) in another column, such as df['Place'], at the same index as the text, which is 0. df['Place'] does not exist in the DataFrame yet, so it will be created when the value is assigned. I have tried the following code, but it fills the whole column with the very first value it can find.
for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':
            df['Place'] = (entity.text)
I have also tried appending the text to a list with e_list.append((entity.text)), but that appends every entity it can find in the text. Can someone help me store only the first entity, at the corresponding index? Thank you
You can get all the entities for each entry by using Series.apply on the Text column, like this:
df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])
If you are only interested in the first entity from each entry, use the following. The or [''] fallback yields an empty string for rows where no GPE is found:
df['Place'] = df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
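The same "first match or a default" idea can also be written with next() over a generator, which stops at the first GPE instead of building the full list. The sketch below is only an illustration: first_entity_text is a hypothetical helper name, and the SimpleNamespace objects are stand-ins for spaCy spans (only .text and .label_ are used), so it runs without a model installed.

```python
from types import SimpleNamespace


def first_entity_text(ents, label='GPE', default=''):
    """Return the text of the first entity with the given label, else default."""
    # next() consumes the generator only until the first match is found.
    return next((ent.text for ent in ents if ent.label_ == label), default)


# Hypothetical stand-ins for the spans in doc.ents.
ents = [
    SimpleNamespace(text='USA', label_='GPE'),
    SimpleNamespace(text='Canada', label_='GPE'),
]

first = first_entity_text(ents)   # first GPE in the sequence
missing = first_entity_text([])   # no entities at all, so the default is used
print(repr(first), repr(missing))
```

With spaCy loaded, this would presumably be used as df['Place'] = df['Text'].apply(lambda x: first_entity_text(nlp(x).ents)).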
Here is a test snippet:
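import spacy
import pandas as pd
# Assumption: the original does not say which model was loaded; any English pipeline with an NER component should do.
nlp = spacy.load('en_core_web_sm')
df = pd.DataFrame({'Text':['Match between USA and Canada was postponed', 'No ents']})
# All GPE entities per row
df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])
# => 0    [USA, Canada]
#    1               []
#    Name: Text, dtype: object
# Only the first GPE per row, or an empty string when there is none
df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
# => 0    USA
#    1
#    Name: Text, dtype: object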
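As for why the loop in the question fills the whole column: assigning a scalar to df['Place'] broadcasts that value to every row, so each GPE the loop meets overwrites the entire column. A minimal pandas-only sketch of the difference follows; the places list is hypothetical and stands in for per-row results that would come from nlp(...).

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Match between USA and Canada was postponed', 'No ents']})

# Scalar assignment: the single value is broadcast to every row.
df['Place'] = 'USA'
broadcast = df['Place'].tolist()

# One value per row, assigned once: each value stays at its own index.
places = ['USA', '']  # hypothetical per-row results, in the same order as df['Text']
df['Place'] = places
aligned = df['Place'].tolist()

print(broadcast)
print(aligned)
```

Series.apply does this alignment for you, which is why it avoids the problem.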