Extract Named Entities using SpaCy and python lambda

I am using the following code to extract named entities with a lambda.

df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])

and

df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])

It returns results for a few hundred records, but with thousands of records it takes pretty much forever. Can someone help me optimize this line of code?

You can speed this up by:

  1. Calling nlp.pipe on the whole list of documents, so the texts are processed in batches rather than one nlp() call per row
  2. Disabling unnecessary pipes.

Try:

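import pandas as pd
import spacy

# Load the model without the components this NER task does not use
nlp = spacy.load("en_core_web_md", disable=["tagger", "parser"])

df = pd.DataFrame({"Text": ["this is a text about Germany", "this is another about Trump"]})

texts = df["Text"].to_list()
ents = []
# nlp.pipe processes the texts in batches instead of one call per row
for doc in nlp.pipe(texts):
    for ent in doc.ents:
        if ent.label_ == "GPE":
            ents.append(ent)

print(ents)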

[Germany]
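
The loop above collects every match into one flat list, while the question wants one value per row. A minimal sketch of how the same nlp.pipe call could produce per-row results follows. The helper names gpe_per_text and first_or_empty and the batch_size default are assumptions made for illustration, not part of the original answer; nlp is expected to be a loaded spaCy pipeline.

```python
from typing import Any, Iterable, List


def gpe_per_text(nlp: Any, texts: Iterable[str], batch_size: int = 256) -> List[List[str]]:
    """Return one list of GPE entity strings per input text, in input order.

    `nlp` is assumed to be a loaded spaCy pipeline (hypothetical helper).
    """
    results = []
    # nlp.pipe yields one Doc per text, in the same order as the input
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([ent.text for ent in doc.ents if ent.label_ == "GPE"])
    return results


def first_or_empty(items: List[str]) -> str:
    """Mirror the `(... or [''])[0]` idiom from the question."""
    return items[0] if items else ""
```

With nlp and df defined as in the answer, df['Place'] = gpe_per_text(nlp, df['Text'].to_list()) should fill the column from the first snippet in the question, and mapping first_or_empty over that result gives the single-value variant. Rows that are not strings (for example NaN) would need to be cleaned up first, since nlp.pipe expects text.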
