How to extract named entities from a Pandas DataFrame using spaCy

Question

I am trying to extract named entities using the first answer to this question, and my code is as follows:

for i in df['Article'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        print((entity.text))

But it does not print any entities. I have tried print(i) and print(doc); both variables have values, and df['Article'] contains news text. Can someone explain why the second loop is not extracting entities? Thank you.

EDIT:
This is the dataset file. Please run the following code to reproduce the preprocessing I have done.

df.iloc[:,0].dropna(inplace=True)
df = df[df.iloc[:,0].notna()]

To remove special characters from df['Article']:

df['Article'] = df['Article'].map(lambda x: re.sub(r'\W+', '', x))

Answer 1

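With df['Article'].map(lambda x: re.sub(r'\W+', '', x)), you remove every non-word character from your articles, including all whitespace. Each article collapses into one unbroken string, so spaCy has no word boundaries to work with, which is most likely why doc.ents comes back empty.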
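To see the effect on a single string, here is a small sketch that uses only the standard library. The sample sentence is made up for illustration and is not taken from the dataset in the question.

```python
import re

# Hypothetical sample text, not from the question's dataset.
sample = "Tim Cook visited Paris, France."

# \W+ matches runs of non-word characters. Whitespace is a non-word
# character, so the spaces between words are deleted with the punctuation.
collapsed = re.sub(r'\W+', '', sample)
print(collapsed)  # TimCookvisitedParisFrance
```

Once the spaces are gone, the text no longer looks like a sentence, so a statistical NER model has little to go on.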

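Use this instead:

df['Article'] = df['Article'].str.replace(r'(?:_|[^\w\s])+', '', regex=True)

With that regex, you only remove underscores and special characters other than whitespace, so the spaces between words are kept. The regex=True argument is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.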
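Putting the pieces together, the cleaning and extraction steps could look roughly like the sketch below. The helper names (clean_text, extract_entities, add_entity_column), the "Entities" column, and the en_core_web_sm model in the usage comment are assumptions made for illustration; they do not come from the question.

```python
import re

# Pattern from the answer: underscores plus any run of characters that are
# neither word characters nor whitespace.
SPECIAL_CHARS = r'(?:_|[^\w\s])+'


def clean_text(text):
    """Strip special characters from one string while keeping whitespace."""
    return re.sub(SPECIAL_CHARS, '', text)


def extract_entities(text, nlp):
    """Return (entity text, entity label) pairs found by a spaCy pipeline."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def add_entity_column(df, nlp, column="Article"):
    """Return a copy of df with cleaned text and a hypothetical 'Entities' column."""
    out = df.copy()
    # Same cleaning as clean_text, applied to the whole column by pandas.
    out[column] = out[column].str.replace(SPECIAL_CHARS, '', regex=True)
    out["Entities"] = out[column].map(lambda text: extract_entities(text, nlp))
    return out


# Example usage (assumes pandas, spaCy and the en_core_web_sm model are installed):
# import pandas as pd
# import spacy
# nlp = spacy.load("en_core_web_sm")
# df = pd.DataFrame({"Article": ["Tim Cook visited Paris, France."]})
# print(add_entity_column(df, nlp)["Entities"].iloc[0])
```

Storing the entities in a column rather than printing them inside the loop makes it easier to check which rows return an empty list.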

Answer 1 (accepted), 2020-12-18 18:18:20
