
spaCy DependencyMatcher pattern not returning matches

I am trying to create, add, and get results from a pattern using spaCy's DependencyMatcher.

I created a pattern for the sentence: "From Monday to Friday"

The full pattern:

pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {'DEP': 'ROOT', 'POS': 'ADP', 'TAG': 'IN'}
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {'DEP': 'pobj', 'POS': 'PROPN', 'TAG': 'NNP'},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$--",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {'DEP': 'prep', 'POS': 'ADP', 'TAG': 'IN'},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS":{'DEP': 'pobj', 'POS': 'PROPN', 'TAG': 'NNP'},
    },
    
]

The simpler pattern is:

pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {"POS": "ADP"}
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {"POS": "PROPN"},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$--",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {"POS": "ADP"},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS":{'POS': 'PROPN'},
    },
    
]

[Image: displaCy visualization of the dependency parse]

My question is: why does this pattern return no matches, with either the full or the simpler version?

import spacy
from spacy.matcher import DependencyMatcher


nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)


text="From monday to friday"
doc = nlp(text)
matcher.add("pattern1", [pattern])

matches = matcher(doc)

# Each token_id corresponds to one pattern dict
match_id, token_ids = matches[0]

spaCy versions:

spaCy v3.0.6

NAME             SPACY            VERSION
en_core_web_sm   >=3.0.0,<3.1.0   3.0.0   ✔

Your REL_OP for node2 is backwards. It should be $++. With node1 as the LEFT_ID, $-- looks for a sibling that comes before node1, but "to" comes after "Monday".


To give a full explanation, this code works for me.

import spacy

from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)

text="From Monday to Friday"
doc = nlp(text)

pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {'POS': 'ADP', 'TAG': 'IN'}
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {'POS': 'PROPN'},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$++",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {'POS': 'ADP'},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS":{'POS': 'PROPN'},
    },
    
]

matcher.add("pattern1", [pattern])

matches = matcher(doc)
print(matches)

print("-----")
# this part is just for reference
for word in doc:
    print(word.pos_, word.tag_, word.dep_, word, sep="\t")

A couple of points about this:

  • Your second pattern is better: you shouldn't need to specify both tag and pos for English (the tag determines the pos).
  • In the v3 small model, "monday" and "friday" are not tagged as proper nouns unless capitalized (it looks like your displaCy output is from the public demo, which uses v2).
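
If you want to check the operator direction without depending on what a particular model predicts, here is a minimal sketch that builds the parse by hand. The heads, dependency labels and POS tags below are assumptions chosen to mirror the tree described in the question, not model output, and the helper names build_pattern and find_matches are made up for illustration.

```python
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()

# Hand-built parse (an assumption, not model output):
# "From" is the root, "Monday" and "to" both attach to "From",
# and "Friday" attaches to "to".
doc = Doc(
    vocab,
    words=["From", "Monday", "to", "Friday"],
    heads=[0, 0, 0, 2],  # absolute index of each token's head
    deps=["ROOT", "pobj", "prep", "pobj"],
    pos=["ADP", "PROPN", "ADP", "PROPN"],
)


def build_pattern(sibling_op):
    """Return the simpler pattern, using sibling_op between node1 and node2."""
    return [
        {"RIGHT_ID": "node0", "RIGHT_ATTRS": {"POS": "ADP"}},
        {"LEFT_ID": "node0", "REL_OP": ">",
         "RIGHT_ID": "node1", "RIGHT_ATTRS": {"POS": "PROPN"}},
        {"LEFT_ID": "node1", "REL_OP": sibling_op,
         "RIGHT_ID": "node2", "RIGHT_ATTRS": {"POS": "ADP"}},
        {"LEFT_ID": "node2", "REL_OP": ">",
         "RIGHT_ID": "node3", "RIGHT_ATTRS": {"POS": "PROPN"}},
    ]


def find_matches(sibling_op):
    """Run a fresh matcher over the hand-built doc and return the token ids."""
    matcher = DependencyMatcher(vocab)
    matcher.add("pattern1", [build_pattern(sibling_op)])
    return [list(token_ids) for _, token_ids in matcher(doc)]


matches_left = find_matches("$--")   # node2 must be a sibling before node1
matches_right = find_matches("$++")  # node2 must be a sibling after node1

print("$--:", matches_left)
print("$++:", matches_right)
```

With this parse, the $-- version should find nothing, because "Monday" has no sibling to its left, while the $++ version should match all four tokens. On real text the result still depends on the tags and heads the model assigns, which is why the lowercase "monday" case fails even with the corrected operator.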
