spaCy PhraseMatcher whitespace-sensitivity problem

Question

import spacy
nlp = spacy.load("en_core_web_sm")
terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]
doc = nlp("German Chancellor Angela Merkel and US President Barack  Obama "
          "converse in the Oval Office inside the White House in Washington, D.C.")

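If I put an extra space between the two words of "Barack Obama", the PhraseMatcher no longer finds it, because matching is whitespace-sensitive. Is there a way to get around this whitespace-sensitivity problem?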

Operating system: Windows 8
Python version: 3.7
spaCy version: 2.2.3
Environment: Conda
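For reference, one way around the problem that bypasses tokenization entirely is to search the raw text with a regular expression that allows any run of whitespace between the words of each term. The sketch below uses only the standard library and is not part of spaCy's API; `term_pattern` and `find_terms` are hypothetical helpers. Matching is case-sensitive and ignores word boundaries, and the character offsets it returns could be passed to spaCy's `Doc.char_span` to obtain spans.

```python
import re
from typing import List, Tuple

def term_pattern(term: str):
    # Escape each word literally, then allow one or more whitespace chars between words.
    words = term.split()
    return re.compile(r"\s+".join(re.escape(w) for w in words))

def find_terms(text: str, terms: List[str]) -> List[Tuple[str, int, int]]:
    """Return (term, start_char, end_char) for every whitespace-tolerant match."""
    hits = []
    for term in terms:
        for m in term_pattern(term).finditer(text):
            hits.append((term, m.start(), m.end()))
    # Sort by position so results read left to right.
    return sorted(hits, key=lambda h: h[1])

terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]
text = ("German Chancellor Angela Merkel and US President Barack  Obama "
        "converse in the Oval Office inside the White House in Washington, D.C.")

for term, start, end in find_terms(text, terms):
    print(term, start, end, repr(text[start:end]))
```

Because the offsets refer to the original string, nothing has to be rewritten before the text reaches the model.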

Answer 1

import re
re.sub(' +',' ', "barack    obama")

# output:
'barack obama'
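The pattern `' +'` only collapses runs of the space character; tabs and newlines are left untouched. If the input may contain other whitespace, a slightly broader sketch (standard library only; the function names are illustrative) is to split on any whitespace and rejoin, or to use `\s+`:

```python
import re

def normalize_spaces(text: str) -> str:
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and drops leading/trailing whitespace.
    return " ".join(text.split())

def normalize_spaces_re(text: str) -> str:
    # Regex equivalent: collapse any whitespace run to one space, then trim the ends.
    return re.sub(r"\s+", " ", text).strip()

sample = "barack \t obama\nurges   Congress "
print(normalize_spaces(sample))     # barack obama urges Congress
print(normalize_spaces_re(sample))  # barack obama urges Congress
```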

Reference documentation: https://spacy.io/api/phrasematcher

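import en_core_web_sm
from spacy.matcher import PhraseMatcher

nlp = en_core_web_sm.load()

matcher = PhraseMatcher(nlp.vocab)
# spaCy v2 signature (the question uses 2.2.3); in spaCy v3 this becomes
# matcher.add("OBAMA", [nlp("Barack Obama")])
matcher.add("OBAMA", None, nlp("Barack Obama"))

doc = nlp("Barack Obama urges Congress to find courage to defend his healthcare reforms")
matches = matcher(doc)
print(matches)

# output:
[(7732777389095836264, 0, 2)]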

But when there are several spaces between the words, i.e. between "Barack" and "Obama", it returns an empty list:

doc = nlp("Barack   Obama urges Congress to find courage to defend his healthcare reforms")
print(matcher(doc))
# output:
[]
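Why the extra spaces break the match: spaCy's tokenizer attaches a single trailing space to the preceding token (its `whitespace_`), but any additional whitespace becomes a separate whitespace token. So `Barack   Obama` is tokenized as `Barack`, a whitespace token, and `Obama`, which is no longer the same token sequence as the two-token pattern. The following is a toy standard-library model of that behaviour and of exact token-sequence matching, not spaCy's implementation; `toy_tokenize` and `match_phrase` are hypothetical names.

```python
import re
from typing import List, Tuple

def toy_tokenize(text: str) -> List[str]:
    """Loosely mimic spaCy: the first space after a word belongs to that word,
    any extra spaces become their own token. (Leading whitespace is ignored here.)"""
    tokens = []
    for m in re.finditer(r"(\S+)( ?)( *)", text):
        word, _attached_space, extra = m.groups()
        tokens.append(word)
        if extra:
            tokens.append(extra)  # leftover whitespace token
    return tokens

def match_phrase(tokens: List[str], pattern: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans where `pattern` occurs exactly."""
    n = len(pattern)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == pattern]

pattern = toy_tokenize("Barack Obama")
print(match_phrase(toy_tokenize("Barack Obama urges Congress"), pattern))    # [(0, 2)]
print(match_phrase(toy_tokenize("Barack   Obama urges Congress"), pattern))  # []

# One workaround in this toy model: drop whitespace-only tokens before matching.
# (Token indices then refer to the filtered list, not the original one.)
filtered = [t for t in toy_tokenize("Barack   Obama urges Congress") if not t.isspace()]
print(match_phrase(filtered, pattern))  # [(0, 2)]
```

In real spaCy, the rule-based `Matcher` (rather than `PhraseMatcher`) can express optional whitespace directly with a token pattern such as `[{"LOWER": "barack"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "obama"}]`.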

To work around this, I remove the extra spaces from the string before passing it to the model:

string_ = 'Barack   Obama urges Congress to find courage to defend his healthcare reforms'

# collapse runs of spaces into a single space
space_removed_string = re.sub(' +', ' ', string_)

# now pass the cleaned string to the model
doc = nlp(space_removed_string)
print(matcher(doc))

# output:
[(7732777389095836264, 0, 2)]
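A side effect of cleaning the string with `re.sub` before calling `nlp` is that character offsets in the new `Doc` no longer line up with the original text. If positions in the original string are needed, one standard-library sketch (the helper `normalize_with_map` is hypothetical) is to record, for every character kept in the normalized string, the index it came from:

```python
from typing import List, Tuple

def normalize_with_map(text: str) -> Tuple[str, List[int]]:
    """Collapse runs of spaces to one space (like re.sub(' +', ' ', text)),
    returning the cleaned string and, for each of its characters, the index
    of that character in the original text."""
    out_chars: List[str] = []
    index_map: List[int] = []
    prev_was_space = False
    for i, ch in enumerate(text):
        if ch == " " and prev_was_space:
            continue  # skip the 2nd, 3rd, ... space of a run
        out_chars.append(ch)
        index_map.append(i)
        prev_was_space = ch == " "
    return "".join(out_chars), index_map

original = "Barack   Obama urges Congress to find courage to defend his healthcare reforms"
cleaned, index_map = normalize_with_map(original)

# Suppose a match was found in the cleaned text (e.g. from a span's start/end characters).
start = cleaned.index("Obama")
end = start + len("Obama")
orig_start, orig_end = index_map[start], index_map[end - 1] + 1
print(cleaned[start:end], original[orig_start:orig_end], orig_start, orig_end)  # Obama Obama 9 14
```

Mapping the last character (`end - 1`) and adding one avoids indexing past the end of `index_map` when a match ends at the end of the string.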

Answer 1, score 0, 2020-01-20 05:39:54
