
spaCy and displaCy outputs differ

My sentence is: She had another chemotherapy protocol history with 5-FU alone before this protocol without any significant side effects.

When I put this sentence into displaCy (https://demos.explosion.ai/displacy/), the output shows 5-FU as a noun phrase.


However, when I parse the text myself and iterate over its noun chunks, 5-FU does not appear as a noun chunk:

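import spacy

mySentence = "She had another chemotherapy protocol history with 5-FU alone before this protocol without any significant side effects."
nlp = spacy.load('en')  # 'en' was a shortcut link in spaCy 1.x/2.x; v3 needs a full name such as 'en_core_web_sm'
ax = nlp(mySentence)
for w in ax.noun_chunks:
    print(w)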

Edit: Additionally, when I print the part-of-speech tags, 5-FU is tagged NN. If spaCy's annotation understands this single word as a noun surrounded by prepositions, why isn't it picked up as a noun phrase? End of edit.
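The tag-inspection code mentioned in the edit isn't reproduced here. As a hedged illustration of why a token tagged NN can still be left out of `noun_chunks`: as far as I can tell from spaCy's English syntax iterator, a noun (NOUN, PROPN or PRON) is only yielded when its dependency label is in a fixed set such as `nsubj`, `dobj` or `pobj` (or when it is a conjunct of one), so the parser's label matters as much as the tag. The sketch below assumes spaCy v3 and builds two hand-annotated `Doc`s with `spacy.blank("en")`, so no trained model is needed; the annotations are invented for illustration and are not any model's actual parse of the sentence.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: provides the English noun-chunk rules, but no trained tagger/parser.
nlp = spacy.blank("en")

def make_doc(dep_of_5fu):
    """Hand-annotate 'treated with 5-FU', varying only the dependency label of '5-FU'."""
    return Doc(
        nlp.vocab,
        words=["treated", "with", "5-FU"],
        pos=["VERB", "ADP", "NOUN"],
        tags=["VBN", "IN", "NN"],  # '5-FU' is tagged NN in both variants
        heads=[0, 0, 1],           # absolute head indices; 'treated' is its own head (the root)
        deps=["ROOT", "prep", dep_of_5fu],
    )

def chunks(doc):
    """Noun chunks as plain strings."""
    return [chunk.text for chunk in doc.noun_chunks]

as_object = make_doc("pobj")  # prepositional object: yielded as a noun chunk
as_other = make_doc("dep")    # generic 'dep' label: skipped, although the tag is still NN
print(chunks(as_object))  # ['5-FU']
print(chunks(as_other))   # []
```

To see which label a real model assigns to 5-FU, print `token.dep_` and `token.head.text` next to `token.tag_` for each token of the parsed sentence.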

My spaCy version: (shown as a screenshot in the original post)
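When comparing local output against the demo, it helps to report both the library version and the exact model. A small helper, assuming spaCy v3 (where `Language.meta` carries the pipeline's `lang`, `name` and `version`), might look like this; on the command line, `python -m spacy info` prints similar details. The function name `describe_setup` is hypothetical.

```python
import spacy

def describe_setup(nlp):
    """Collect version details worth including when reporting differing spaCy output."""
    meta = nlp.meta  # contents of the pipeline's meta.json (lang, name, version, ...)
    return {
        "spacy_version": spacy.__version__,
        "pipeline": f"{meta.get('lang', '?')}_{meta.get('name', '?')}",
        "pipeline_version": meta.get("version", "?"),
        "components": list(nlp.pipe_names),
    }

print(describe_setup(spacy.blank("en")))
```

With an installed package, `describe_setup(spacy.load("en_core_web_sm"))` should report the pipeline as `en_core_web_sm` together with its version and components such as the tagger and parser.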

What am I doing wrong? Is there a version difference between displaCy and the version I am using? Is there a spaCy help team to address this issue?

Thanks much!

displaCy does some pre-processing before showing the parse tree. Here is a link to the parsing service (built on spaCy) that displaCy uses: https://github.com/explosion/spacy-services/blob/master/displacy/displacy_service/parse.py#L25

if collapse_phrases:
    for np in list(self.doc.noun_chunks):
        np.merge(np.root.tag_, np.root.lemma_, np.root.ent_type_)

The displaCy service merges each noun chunk into a single token instead of treating its words as separate tokens; this is why your output is different.
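`Span.merge`, used in the excerpt above, was later deprecated in favor of `Doc.retokenize()` and is gone in spaCy v3. A rough v3 equivalent of the service's "collapse phrases" step is sketched below, assuming spaCy v3 and a hand-annotated `Doc` (invented annotations, so no trained model is needed). Unlike the service, it copies only the root's tag and lemma, not its entity type; `collapse_noun_chunks` is a hypothetical name.

```python
import spacy
from spacy.tokens import Doc

def collapse_noun_chunks(doc):
    """Merge every noun chunk into a single token, roughly like displaCy's 'Collapse Phrases'."""
    spans = list(doc.noun_chunks)  # collect the spans before any merges are applied
    with doc.retokenize() as retokenizer:
        for np in spans:
            # As in the service, the merged token takes its tag and lemma from the chunk's root.
            retokenizer.merge(np, attrs={"TAG": np.root.tag_, "LEMMA": np.root.lemma_})
    return doc

nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["She", "had", "another", "protocol"],
    pos=["PRON", "VERB", "DET", "NOUN"],
    tags=["PRP", "VBD", "DT", "NN"],
    lemmas=["she", "have", "another", "protocol"],
    heads=[1, 1, 3, 1],
    deps=["nsubj", "ROOT", "det", "dobj"],
)
collapse_noun_chunks(doc)
print([(t.text, t.tag_) for t in doc])  # [('She', 'PRP'), ('had', 'VBD'), ('another protocol', 'NN')]
```

spaCy v3 also ships a built-in component for this, added with `nlp.add_pipe("merge_noun_chunks")`, which is handy for reproducing the demo's collapsed view locally.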


The other difference could be the model you use. You might be using the smallest model, en_core_web_sm, whereas the displaCy demo might be using the larger en_core_web_md (though this is not officially stated anywhere).
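One way to check whether the model explains the difference is to parse the same sentence with each installed model and compare the noun chunks. A minimal sketch, assuming both packages are already installed; the helper names are hypothetical, and comparing as sets means a chunk repeated in one parse is counted once.

```python
import spacy

def chunk_texts(doc):
    """Noun chunks of a parsed Doc as plain strings."""
    return [chunk.text for chunk in doc.noun_chunks]

def compare_models(text, model_names=("en_core_web_sm", "en_core_web_md")):
    """Parse `text` with each pipeline and return {model name: noun chunks}.

    Every model must already be installed, e.g. `python -m spacy download en_core_web_md`.
    """
    return {name: chunk_texts(spacy.load(name)(text)) for name in model_names}

def chunk_differences(chunks_a, chunks_b):
    """Chunk strings found by only one of two pipelines, as (only_in_a, only_in_b)."""
    only_a = sorted(set(chunks_a) - set(chunks_b))
    only_b = sorted(set(chunks_b) - set(chunks_a))
    return only_a, only_b
```

Usage: `results = compare_models(mySentence)` followed by `chunk_differences(results["en_core_web_sm"], results["en_core_web_md"])` lists the chunks, such as 5-FU, that only one model finds.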

I am trying to solve the same problem. displaCy and spaCy outputs are different (both the POS tags and the relationships between words).

It doesn't look like the pre-processing (merging) is to blame, since you can disable it in displaCy under Settings > Collapse Phrases, and for me the output still doesn't match.

It's possible that you need to use the en_core_web_md model (not en_core_web_sm):

python -m spacy download en_core_web_md

However, I haven't tested that yet.
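To test the model hypothesis at the level of POS tags and relations (not just noun chunks), a token-by-token diff of two parses can help. The sketch below assumes spaCy v3; the two `Doc`s are hand-annotated stand-ins with invented labels, not real model output, and `parse_table`/`diff_parses` are hypothetical names.

```python
import spacy
from spacy.tokens import Doc

def parse_table(doc):
    """One row per token: text, fine-grained tag, coarse POS, dependency label, head text."""
    return [(t.text, t.tag_, t.pos_, t.dep_, t.head.text) for t in doc]

def diff_parses(doc_a, doc_b):
    """Return (row_a, row_b) pairs where two parses of the same tokens disagree."""
    if [t.text for t in doc_a] != [t.text for t in doc_b]:
        raise ValueError("the two pipelines tokenized the text differently")
    return [(a, b) for a, b in zip(parse_table(doc_a), parse_table(doc_b)) if a != b]

# Hand-annotated stand-ins for two pipelines' output (no trained model needed).
vocab = spacy.blank("en").vocab
words = ["treated", "with", "5-FU"]
doc_a = Doc(vocab, words=words, pos=["VERB", "ADP", "NOUN"], tags=["VBN", "IN", "NN"],
            heads=[0, 0, 1], deps=["ROOT", "prep", "pobj"])
doc_b = Doc(vocab, words=words, pos=["VERB", "ADP", "PROPN"], tags=["VBN", "IN", "NNP"],
            heads=[0, 0, 1], deps=["ROOT", "prep", "pobj"])
for row_a, row_b in diff_parses(doc_a, doc_b):
    print(row_a, "vs", row_b)
```

With both models installed, `diff_parses(spacy.load("en_core_web_sm")(text), spacy.load("en_core_web_md")(text))` shows exactly which tags and arcs change between them.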

I faced a similar problem because they upgraded to v2.0, so I moved to v2.0 as well. To install a model, you have to download it by its full name, using the --direct flag:

python -m spacy download en_core_web_sm-2.0.0-alpha --direct   # English
python -m spacy download xx_ent_wiki_sm-2.0.0-alpha --direct   # Multi-language NER

You can load a model by calling spaCy's loader, e.g. nlp = spacy.load('en_core_web_sm'), or import it as a module (import en_core_web_sm) and call its load() method, e.g. nlp = en_core_web_sm.load().
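Either way, the package has to be installed first. A minimal sketch of a loader that checks this up front, assuming `spacy.util.is_package` (present in spaCy v2 and v3); `load_pipeline` is a hypothetical helper name.

```python
import spacy
from spacy.util import is_package

def load_pipeline(name="en_core_web_sm"):
    """Load an installed pipeline package by its full name, failing with a download hint."""
    if not is_package(name):
        raise OSError(f"{name!r} is not installed; run: python -m spacy download {name}")
    # For an installed package this is roughly `import <name>; <name>.load()`.
    return spacy.load(name)
```

Failing early with the exact download command makes a missing model easier to spot than a generic loader error.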

Follow the documentation at https://github.com/explosion/spaCy/releases/tag/v2.0.0-alpha

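Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.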

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM