All matches in a line: spaCy Matcher
I am looking for a way to print all the matches in a line using the spaCy Matcher.
The example goes like this. Here I am trying to extract experience.
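import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
doc = nlp("1+ years of experience in XX, 2 years of experience in YY")
pattern = [{'POS': 'NUM'}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "years?|months?"}}]
matcher = Matcher(nlp.vocab)
# spaCy v2 signature; in spaCy v3 this is matcher.add("Skills", [pattern])
matcher.add("Skills", None, pattern)
matches = matcher(doc)
# Prints only the first match
print(doc[matches[0][1]:matches[0][2]])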
Here I am getting the output 1+ years.
But I am looking for a solution that outputs ['1+ years', '2 years'].
You should specify the first item as 'LIKE_NUM': True:
pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
I also contracted years?|months? to (?:year|month)s?. You might even consider matching the full token string using ^(?:year|month)s?$, but that is not necessary at this point.
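To see what the anchors would change, here is a small sketch using only Python's re module. It assumes the REGEX operator applies the pattern with search semantics (a hit anywhere in the token text), which is my understanding of how the Matcher behaves; the compare helper and the sample tokens are made up for illustration.

```python
import re

# The two variants discussed above.
unanchored = re.compile(r"(?:year|month)s?")
anchored = re.compile(r"^(?:year|month)s?$")


def compare(token_text):
    """Return (unanchored_hit, anchored_hit) for the lowercased token text."""
    lowered = token_text.lower()  # mimics matching on the LOWER attribute
    return bool(unanchored.search(lowered)), bool(anchored.search(lowered))


# Hypothetical token strings: the last two only contain "year"/"month".
for text in ["years", "Month", "yearly", "bimonthly"]:
    print(text, compare(text))
```

Under that assumption, tokens such as "yearly" or "bimonthly" would pass the unanchored pattern but not the anchored one, so the anchors only matter if such tokens can follow a number in your data.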
Code (note that it loops over every match instead of printing only matches[0]):
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
# spaCy v2 signature; in spaCy v3 this is matcher.add("Skills", [pattern])
matcher.add("Skills", None, pattern)
doc = nlp("1+ years of experience in XX, 2 years of experience in YY")
matches = matcher(doc)
# Each match is a (match_id, start, end) tuple of token indices
for _, start, end in matches:
    print(doc[start:end].text)
Output:
1+ years
2 years
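If you want the matches as a list, as in the question, a list comprehension over the matches should do it. This is a minimal sketch with two assumptions that are not in the code above: it uses the spaCy v3 form matcher.add("Skills", [pattern]), and it uses spacy.blank("en") instead of en_core_web_sm, since LIKE_NUM, ORTH and LOWER are lexical attributes that should not need a trained model.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only) is sufficient,
# because the pattern relies only on lexical attributes.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
pattern = [
    {"LIKE_NUM": True},
    {"ORTH": "+", "OP": "?"},
    {"LOWER": {"REGEX": "(?:year|month)s?"}},
]
# Assumption: spaCy v3, where patterns are passed as a list.
matcher.add("Skills", [pattern])

doc = nlp("1+ years of experience in XX, 2 years of experience in YY")

# Collect the text of every matched span instead of printing one by one.
results = [doc[start:end].text for _, start, end in matcher(doc)]
print(results)
```

On spaCy v2 the same comprehension works with the matcher.add("Skills", None, pattern) call shown earlier.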