Removing named entities from a spaCy object

Question

I am trying to remove named entities from a document using spaCy. I have no trouble recognizing the named entities, using this code:

ne = [(ent.text, ent.label_) for ent in doc.ents]
print(ne)
persons = [ent.text for ent in doc.ents if ent.label_ == 'PERSON']
print(persons)

Output:

'Timothy D. Cook',
 'Peter',
 'Peter',
 'Benjamin A. Reitzes',
 'Timothy D. Cook',
 'Steve Milunovich',
 'Steven Mark Milunovich',
 'Peter',
 'Luca Maestri'

But then I try to use this chunk to actually remove them from the document:

text_no_namedentities = []

ents = [e.text for e in doc.ents]
for item in doc:
    if item.text in ents:
        pass
    else:
        text_no_namedentities.append(item.text)
print(" ".join(text_no_namedentities))

It does not work, because the named entities are n-grams. If I inspect a small chunk of the spaCy object, its contents are as follows:

for item in doc:
    print(item.text)

iPad
has
a
78
%
Steve
Milunovich
share
of
the
U.S.
commercial
tablet
market

So the spaCy object is tokenized, and I can't remove the named entities with my code above. Any ideas on how I can remove all the named entities from the object?

Answer 1

The condition you want to check is

if item.ent_type:

This will evaluate to True if the item (the "token") is part of a named entity. token.ent_type is a hash ID of the entity's actual type; the string label is available as token.ent_type_ (note the trailing _).
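To make the difference between the two attributes concrete, here is a minimal sketch that pairs each token with its string label. The helper name `token_labels` is made up for this example; it only assumes that `doc` yields spaCy tokens (or anything else with `text` and `ent_type_` attributes). For tokens outside any entity, `ent_type_` is an empty string.

```python
def token_labels(doc):
    """Return (token text, entity label) pairs; the label is '' outside entities."""
    return [(token.text, token.ent_type_) for token in doc]

# Usage with a loaded pipeline (the model name below is an assumption):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(token_labels(nlp("Steve Milunovich covers Apple.")))
```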

This is the code I'd use:

    text_no_namedentities = ""
    for token in doc:
        if not token.ent_type:
            text_no_namedentities += token.text
            if token.whitespace_:
                text_no_namedentities += " "

Note that token.whitespace_ tells you whether the token was followed by a space in the original text.
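The same idea can be wrapped in a small function. This is only a sketch: the name `remove_entities` and its `labels` parameter are hypothetical, added here to show how one might drop only certain entity types (for example just PERSON). It relies on `token.text_with_ws`, which is the token text plus its trailing whitespace, so the separate `whitespace_` check is not needed.

```python
def remove_entities(doc, labels=None):
    """Rebuild the text without entity tokens.

    labels: optional set of entity labels to drop, e.g. {"PERSON"}.
    None drops every token that belongs to any entity.
    """
    kept = []
    for token in doc:
        label = token.ent_type_  # '' when the token is not part of an entity
        if label and (labels is None or label in labels):
            continue  # skip tokens that belong to a removed entity
        # text_with_ws is the token text plus its trailing whitespace, if any
        kept.append(token.text_with_ws)
    return "".join(kept)

# Usage (assumes `doc` was produced by a spaCy pipeline with an NER component):
#   print(remove_entities(doc))              # drop all entities
#   print(remove_entities(doc, {"PERSON"}))  # drop only person names
```

One caveat with this approach: when a removed entity is directly followed by punctuation, the space that preceded the entity is kept, so the result can contain a space before the punctuation mark.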

For more information, see the spaCy documentation on Token.

FYI: in the future, it would be more convenient to include a minimal working snippet of your code instead of just parts of it.

Answer 1: accepted 2020-02-23 18:41:18
