spaCy doc.merge using the retokenizer

Question

I want to convert the following code to use the new spaCy retokenizer, but I'm not sure exactly how to go about it.

>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> doc = nlp("sydney is a cool town")
>>> t = doc.merge(0,6)
>>> t
sydney
>>> z = doc.merge(0,11)
>>> z
sydney is a

I tried the following, but it raises an error:

>>> with doc.retokenize() as retokenizer:
...      retokenizer.merge(0, 6)
...

I would like to get the output in variables such as t or z above.

Answer 1

Before retokenizing:

print([(idx, tok) for idx, tok in enumerate(doc)])
# this prints
# [(0, sydney), (1, is), (2, a), (3, cool), (4, town)]

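You can try this. Note that retokenizer.merge takes a Span (a slice of the doc by token index), not a pair of offsets:

retokenizer.merge(doc[index_of_token_to_start_from:index_of_ending_token + 1])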
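
The slice above uses token indices, while the question's doc.merge(0, 6) and doc.merge(0, 11) used character offsets and returned the merged token. Below is a minimal sketch of one way to bridge the two, assuming doc.char_span is used to turn the character offsets into a Span. The name merge_chars is a hypothetical helper, not part of spaCy, and spacy.blank("en") stands in for en_core_web_sm on the assumption that only tokenization is needed here.

```python
import spacy

# Assumption: a blank English pipeline is enough, because merging only
# needs tokenization. The question loads "en_core_web_sm" instead.
nlp = spacy.blank("en")
doc = nlp("sydney is a cool town")


def merge_chars(doc, start_char, end_char):
    """Hypothetical helper: merge the tokens covering a character range
    and return the resulting token."""
    # char_span returns None when the offsets do not fall on token boundaries.
    span = doc.char_span(start_char, end_char)
    if span is None:
        raise ValueError("character offsets do not align with token boundaries")
    start_token = span.start
    # A one-token span is already a single token, so only merge longer spans.
    if len(span) > 1:
        with doc.retokenize() as retokenizer:
            retokenizer.merge(span)
    # Tokens before the span are untouched, so the start index is still valid.
    return doc[start_token]


t = merge_chars(doc, 0, 6)
t_text = t.text  # read the text right away, before the doc changes again
z = merge_chars(doc, 0, 11)
z_text = z.text

print(t_text)
print(z_text)
```

A Token refers back to its Doc, so reading .text immediately after each merge, as done here, is the safer way to keep the value around.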

Complete code for retokenizing:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"sydney is a cool town")
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:3])
print([(idx,tok) for idx,tok in enumerate(doc)]) #[(0, sydney is a), (1, cool), (2, town)]

Similarly, to merge "cool town", use doc[3:5] (token indices in the original, unmerged doc).
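
Both merges can be queued in a single retokenize block, in which case the slices refer to the original token indices and the changes are applied together when the block exits. A minimal sketch, again assuming a blank English pipeline in place of en_core_web_sm:

```python
import spacy

# Assumption: spacy.blank("en") is used only for its tokenizer.
nlp = spacy.blank("en")
doc = nlp("sydney is a cool town")

with doc.retokenize() as retokenizer:
    # Both slices use indices from the doc as it is before any merge.
    retokenizer.merge(doc[0:3])  # "sydney is a"
    retokenizer.merge(doc[3:5])  # "cool town"

merged = [tok.text for tok in doc]
print(merged)  # ['sydney is a', 'cool town']
```

If the merges are done in separate retokenize blocks instead, the indices shift after the first merge, so "cool town" would then be doc[1:3].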

Solution 1
0 2019-10-10 11:55:52
