Named entity recognition on multi-line text using spaCy NLP

In spaCy, I am not able to get the exact named-entity output I expect. My string value spans multiple lines. Please check the code below:

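# Load the small English pipeline (installed as the en_core_web_sm package)
import en_core_web_sm

nlp = en_core_web_sm.load()

# Multi-line input: a date on the first line, then three reference codes
m = """Release the container 6th August

USG11223
USG12224
USG21113"""

doc = nlp(m)
# Print every entity the pretrained NER model found, with its label
print([(X.text, X.label_) for X in doc.ents])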

Output: [('6th August', 'DATE')]

But I want output like:

['USG11223', 'USG12224', 'USG21113', '6th August']

One thing most people do not realize about named entity recognition in libraries like spaCy, AllenNLP, etc. is that it is usually a machine learning model trained on a general corpus to recognize general entity types (people, organizations, dates, and so on).

Your data comes from a specific context, where strings like "USG11223" have a special meaning. In a general context, however, such a string is no more than a random combination of letters and numbers, and it might even be discarded by the model's preprocessing.
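To see what a general-purpose pipeline actually "sees", here is a minimal sketch, assuming spaCy is installed. It uses `spacy.blank("en")`, which loads only the English tokenizer and no trained NER model. Each code comes out as a single token whose generic features are its shape (`XXXdddd`, i.e. uppercase letters followed by digits) and flags such as `is_alpha` and `like_num`; nothing in those features says "this is a container ID".

```python
import spacy

# Tokenizer-only English pipeline: no trained NER component is loaded
nlp = spacy.blank("en")

text = """Release the container 6th August

USG11223
USG12224
USG21113"""

doc = nlp(text)
for token in doc:
    if token.is_space:
        continue  # skip the newline tokens created by the line breaks
    # Surface features available for any token, whatever it "means"
    print(token.text, token.shape_, token.is_alpha, token.like_num)
```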

If you want the NER component to recognize your tags as entities, you can train your own model to recognize these tokens as entities, but you would have to provide several annotated examples. You can learn more about how to do it here: https://spacy.io/usage/training/
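A minimal training sketch, assuming spaCy v3 (the `spacy.training.Example` API). The label name `CONTAINER_ID`, the helper `make_example`, and the extra training sentences are all hypothetical; a usable model would need many more, and more varied, annotated examples. Each example pairs a text with character offsets `(start, end, label)` for the codes it contains.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical label for the reference codes
LABEL = "CONTAINER_ID"

# Hypothetical training data: (text, codes that should be tagged in it)
TRAIN_DATA = [
    ("Release the container 6th August\n\nUSG11223\nUSG12224\nUSG21113",
     ["USG11223", "USG12224", "USG21113"]),
    ("Hold USG30017 until Friday", ["USG30017"]),
    ("Containers USG40021 and USG40022 were shipped", ["USG40021", "USG40022"]),
]


def make_example(nlp, text, codes):
    """Build a training Example with (start, end, label) character offsets."""
    entities = []
    for code in codes:
        start = text.index(code)  # first occurrence of the code in the text
        entities.append((start, start + len(code), LABEL))
    return Example.from_dict(nlp.make_doc(text), {"entities": entities})


# Start from a blank English pipeline and add an untrained NER component
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label(LABEL)

examples = [make_example(nlp, text, codes) for text, codes in TRAIN_DATA]
optimizer = nlp.initialize(lambda: examples)

# A few passes over the tiny data set, shuffling the examples each time
for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, drop=0.2, losses=losses)
    print(epoch, losses["ner"])

doc = nlp("Release the container 6th August\n\nUSG11223\nUSG12224\nUSG21113")
print([ent.text for ent in doc.ents])
```

Because this sketch starts from `spacy.blank("en")`, the trained pipeline only predicts `CONTAINER_ID` and will not return `6th August` as `DATE`. To get both, one option is to update the pretrained `en_core_web_sm` pipeline instead, mixing in examples of its existing labels so it does not forget them.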
