How do I select only the first entity from spaCy entities?

Question

I am trying to use the following code to extract entities from text stored in a DataFrame.

for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':
            ...  # store entity.text for the first GPE only

I need to store the text of the first GPE alongside its corresponding row of text. For instance, if the following is the text at index 0 in the column df['Text']:

Match between USA and Canada was postponed

then I need only the first location (USA) in another column, such as df['Place'], at the index corresponding to that text, which is 0. df['Place'] does not exist in the DataFrame yet, so it will be created when the value is assigned. I have tried the following code, but it fills the whole column with the very first value it can find.

for i in df['Text'].to_list():

    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':
            df['Place'] = (entity.text)

I have also tried appending the text to a list with e_list.append((entity.text)), but that appends every entity it finds in the text. Can someone help me store only the first entity, and only at the corresponding index? Thank you.

Answer 1

You can get all the entities for each entry by using Series.apply on the Text column, like this:

df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])

If you are only interested in the first entity from each entry, use:

df['Place'] = df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
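As an alternative to building the full list and indexing it, the first match can be taken lazily with next() and a default. The following is only a sketch: first_entity_text and the Ent stand-in are hypothetical names introduced here, not part of spaCy or of the original answer. The stand-in mimics the .text and .label_ attributes of spaCy entities so the block runs without a model.

```python
from collections import namedtuple


def first_entity_text(ents, label='GPE', default=''):
    """Return the text of the first entity whose label_ equals `label`.

    `ents` is any iterable of objects with .text and .label_ attributes,
    such as doc.ents from spaCy. `default` is returned when nothing matches.
    """
    # next() stops at the first match, so later entities are never inspected.
    return next((ent.text for ent in ents if ent.label_ == label), default)


# Hypothetical stand-in for spaCy entity objects (assumption for illustration).
Ent = namedtuple('Ent', ['text', 'label_'])
ents = (Ent('USA', 'GPE'), Ent('Canada', 'GPE'))

print(first_entity_text(ents))      # USA
print(repr(first_entity_text(())))  # ''
```

With a real pipeline loaded as nlp, this could be applied as df['Place'] = [first_entity_text(doc.ents) for doc in nlp.pipe(df['Text'])], which also lets spaCy process the texts in batches.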

Here is a test snippet:

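import spacy
import pandas as pd
# Assumption: the original does not show how nlp was created; any English pipeline with an NER component should work
nlp = spacy.load('en_core_web_sm')
df = pd.DataFrame({'Text':['Match between USA and Canada was postponed', 'No ents']})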
df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])
# => 0    [USA, Canada]
#    1               []
#    Name: Text, dtype: object
df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
# => 0    USA
#    1       
#    Name: Text, dtype: object
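The explicit loop from the question can also be kept. The line df['Place'] = (entity.text) assigns a scalar, which pandas broadcasts to every row, so each match overwrites the whole column. A minimal sketch of one way around this is to collect exactly one value per row and stop at the first GPE with break. Here gpes_per_row is a hypothetical stand-in for the GPE texts that nlp would yield for each row, so the block runs without a spaCy model.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Match between USA and Canada was postponed',
                            'No ents']})

# Hypothetical stand-in (assumption): the GPE texts found in each row.
gpes_per_row = [['USA', 'Canada'], []]

places = []
for gpes in gpes_per_row:
    place = ''           # default when the row has no GPE
    for text in gpes:
        place = text
        break            # stop after the first GPE in this row
    places.append(place)

# A list with one item per row is assigned row by row, not broadcast.
df['Place'] = places
print(df['Place'].tolist())  # ['USA', '']
```

In the real loop, the inner iteration would run over doc.ents with the entity.label_ == 'GPE' check in place of the stand-in list.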

1 Accepted 2020-12-22 10:43:03
