How can I untokenize a spacy.tokens.token.Token?
How can I untokenize the output of this code?
class Core:
    def __init__(self, user_input):
        pos = pop(user_input)
        subject = ""
        for token in pos:
            if token.dep == nsubj:
                subject = untokenize.untokenize(token)
                subject = S(subject)
I tried https://pypi.org/project/untokenize/, MosesDetokenizer, and .join().
But I get this error for my last attempt (from this post):
TypeError: 'spacy.tokens.token.Token' object is not iterable
This error for .join():
AttributeError: 'spacy.tokens.token.Token' object has no attribute 'join'
And for MosesDetokenizer, with text = u" {} ".format(" ".join(tokens)):
TypeError: can only join an iterable
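For context (this explanation is not from the original post): str.join only accepts an iterable of strings, which is why passing a single Token object raises that TypeError. A minimal stand-in class, with no spaCy dependency, reproduces the failure:

```python
class FakeToken:
    """Stand-in for spacy.tokens.token.Token: has .text, is not iterable."""
    def __init__(self, text):
        self.text = text

try:
    " ".join(FakeToken("cake"))  # a lone token is not an iterable of strings
except TypeError as err:
    print(err)  # can only join an iterable

# join() works once it is given an iterable of plain strings:
print(" ".join(t.text for t in [FakeToken("I"), FakeToken("like")]))  # I like
```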
All tokens in spaCy keep their context around, so all text can be recreated without any loss of data. In your case, all you have to do is:
''.join([token.text_with_ws for token in doc])
This works because the text_with_ws attribute holds the token's text together with its trailing whitespace character, if one exists.
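To see why this round-trips losslessly, here is an illustrative sketch using a plain list of strings in place of a real spaCy doc (so no model download is needed); each entry plays the role of one token's text_with_ws value:

```python
# Stand-ins for token.text_with_ws values: text plus trailing whitespace.
# "cake" has no trailing space because "." follows it immediately.
pieces = ["I ", "like ", "cake", "."]

restored = "".join(pieces)
print(restored)  # I like cake.
```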
spaCy tokens have their doc object associated with them, so this will give you the original sentence as a string:
import spacy

nlp = spacy.load('en_core_web_sm')  # the old 'en' shortcut was removed in spaCy v3
doc = nlp("I like cake.")
token = doc[0]
print(token.doc)  # prints "I like cake."
@RazvanP I believe what you need here is to get the following output:
['I', 'like', 'cake', '.']
If yes, here is the code.
new_list = []
doc = nlp("I like cake.")  # nlp as loaded in the answer above
for i in doc:
    new_list.append(i.text)  # .text is already a plain string
print(new_list)
See if this works for you.
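Tying this back to the original question: a single Token usually doesn't need to be "untokenized" at all, since its .text (or .text_with_ws) is already a string. Below is a sketch of the subject-extraction loop from the question, using hypothetical stand-in tokens so it runs without a spaCy model; a real pipeline would compare token.dep_ == "nsubj" on tokens produced by nlp(...):

```python
from collections import namedtuple

# Hypothetical stand-in for spacy.tokens.token.Token; real tokens
# expose the same .text, .dep_ and .text_with_ws attributes.
Token = namedtuple("Token", ["text", "dep_", "text_with_ws"])

doc = [
    Token("The", "det", "The "),
    Token("cat", "nsubj", "cat "),
    Token("sleeps", "ROOT", "sleeps"),
]

subject = ""
for token in doc:
    if token.dep_ == "nsubj":
        subject = token.text  # already a plain string; nothing to untokenize
print(subject)  # cat
```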