
Spacy matcher with regex across tokens

I have the following sentences:

phrases = ['children externalize their emotions through outward behavior',
         'children externalize hidden emotions.',
         'children externalize internalized emotions.',
         'a child might externalize a hidden emotion through misbehavior',
         'a kid might externalize some emotions through behavior',
         'traumatized children externalize their hidden trauma through bad behavior.',
         'The kid is externalizing internal traumas',
         'A child might externalize emotions though his outward behavior',
         'The kid externalized a lot of his emotions through misbehavior.']

I want to capture any noun that appears after the verb externalize (or externalized, externalizing, etc.).

In this case, we should get:

externalize their emotions
externalize hidden emotions
externalize internalized emotions
externalize a hidden emotion
externalize some emotions
externalize their hidden trauma
externalizing internal traumas
externalized a lot of his emotions

So far, I am only able to capture the noun if it appears immediately after the verb externalize.

I would like to capture the noun if it occurs within fewer than 15 characters after the verb. For example, "externalized a lot of his emotions" should be matched, because "a lot of his" is only 14 characters, counting spaces.

Here is what I have so far; it is far from perfect.

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(vocab=nlp.vocab)
# pattern: a verb immediately followed by a noun
verb_noun = [{'POS': 'VERB'}, {'POS': 'NOUN'}]
matcher.add('verb_noun', [verb_noun])

list_result = []
for phrase in phrases:
    doc = nlp(phrase)
    doc_match = matcher(doc)
    if doc_match:
        for match_id, start, end in doc_match:
            # lemmatize the matched verb + noun pair
            result = [token.lemma_ for token in doc[start:end]]
            # keep only matches whose verb is a form of "externalize"
            if 'externaliz' in result[0].lower():
                list_result.append(' '.join(result))

I would like to capture the noun if it occurs within fewer than 15 characters after the verb. For example, "externalized a lot of his emotions" should be matched, because "a lot of his" is only 14 characters, counting spaces.

You can do that, but I wouldn't recommend it. What you should do instead is write a regex to match against the string and use Doc.char_span to create a span from the match. Since the Matcher works on tokens, a heuristic like "within 14 characters, including spaces" can't reasonably be implemented with it. Besides, that kind of heuristic is a hack and will behave erratically.
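
For illustration, here is a minimal sketch of that regex-plus-Doc.char_span approach; the regex itself is only my rough approximation of the "noun within 15 characters" rule, an assumption rather than a vetted solution:

import re
import spacy

nlp = spacy.load("en_core_web_sm")

# rough, assumed regex: a form of "externalize", then at most 15 characters, then one more word
pattern = re.compile(r"externaliz\w*\s+.{0,15}\w+")

doc = nlp("The kid externalized a lot of his emotions through misbehavior.")
for m in pattern.finditer(doc.text):
    # char_span maps character offsets back to a token-aligned Span;
    # alignment_mode="expand" snaps to the nearest token boundaries
    span = doc.char_span(m.start(), m.end(), alignment_mode="expand")
    if span is not None:
        print(span.text)  # externalized a lot of his emotions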

I suspect what you really want to do is figure out what is being externalized, that is, find the object of the verb. In that case you should use the DependencyMatcher. Here is an example that uses a simple rule and merges noun chunks:

import spacy
from spacy.matcher import DependencyMatcher
nlp = spacy.load("en_core_web_sm")

texts = ['children externalize their emotions through outward behavior',
         'children externalize hidden emotions.',
         'children externalize internalized emotions.',
         'a child might externalize a hidden emotion through misbehavior',
         'a kid might externalize some emotions through behavior',
         'traumatized children externalize their hidden trauma through bad behavior.',
         'The kid is externalizing internal traumas',
         'A child might externalize emotions though his outward behavior',
         'The kid externalized a lot of his emotions through misbehavior.']

pattern = [
  # anchor token: any token whose lemma is "externalize"
  {
    "RIGHT_ID": "externalize",
    "RIGHT_ATTRS": {"LEMMA": "externalize"}
  },
  # its direct object (a "dobj" child of the anchor)
  {
    "LEFT_ID": "externalize",
    "REL_OP": ">",
    "RIGHT_ID": "object",
    "RIGHT_ATTRS": {"DEP": "dobj"}
  },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("EXTERNALIZE", [pattern])

# what was externalized?

# this is optional: merge noun phrases
nlp.add_pipe("merge_noun_chunks")

for doc in nlp.pipe(texts):
    for match_id, tokens in matcher(doc):
        # tokens[0] is like "externalize"
        print(doc[tokens[1]])

Output:

their emotions
hidden emotions
internalized emotions
a hidden emotion
some emotions
their hidden trauma
internal traumas
emotions
his outward behavior
a lot
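
If you also want the matched verb printed alongside its object (e.g. "externalize their emotions" rather than just "their emotions"), a small variation of the final loop (my own addition, not part of the original answer) could look like this; it assumes nlp, texts and matcher are set up exactly as above:

# assumes the nlp pipeline, texts and matcher from the snippet above
for doc in nlp.pipe(texts):
    for match_id, token_ids in matcher(doc):
        # token_ids follows the pattern order: [externalize, object]
        verb, obj = doc[token_ids[0]], doc[token_ids[1]]
        print(verb.text, obj.text)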


