Extracting named entities using spaCy and a Python lambda

Question

I am using the following code to extract named entities with a lambda:

df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])

and

df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])

For a few hundred records this returns results, but with thousands of records it takes practically forever. Can someone help me optimize this line of code?

Answer 1

You can speed this up by:

Calling nlp.pipe on the whole list of documents instead of calling nlp on each row.
Disabling pipeline components you do not need.

Try:

import pandas as pd
import spacy

# Load the model without the components that NER does not need
nlp = spacy.load("en_core_web_md", disable=["tagger", "parser"])

df = pd.DataFrame({"Text": ["this is a text about Germany", "this is another about Trump"]})

texts = df["Text"].to_list()
ents = []
# nlp.pipe processes the texts as a stream of batches
for doc in nlp.pipe(texts):
    for ent in doc.ents:
        if ent.label_ == "GPE":
            ents.append(ent)

print(ents)

[Germany]
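
The loop above collects all matches into one flat list, whereas the question stores one result per row in df['Place']. A minimal sketch of keeping that per-row shape while still using nlp.pipe follows. The helper names gpe_per_text and first_or_empty and the batch_size value are illustrative assumptions, not part of the original answer:

```python
def gpe_per_text(nlp, texts, label="GPE", batch_size=256):
    """Return one list of entity strings per input text, in input order.

    `nlp` is a loaded spaCy pipeline; the function name and the
    batch_size default are hypothetical choices for illustration.
    """
    results = []
    # nlp.pipe yields one Doc per text, in the same order as the input
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([ent.text for ent in doc.ents if ent.label_ == label])
    return results


def first_or_empty(entities):
    """Mirror the question's second lambda: first match, or '' if none."""
    return entities[0] if entities else ""
```

With the nlp and df objects from the answer, this could be used as df["Place"] = gpe_per_text(nlp, df["Text"].to_list()), and df["Place"].apply(first_or_empty) would give the single-value variant. Because nlp.pipe preserves input order, the resulting list lines up with the DataFrame rows.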

Solution 1
1 Accepted 2021-01-01 12:03:10
