How can I untokenize a spacy.tokens.token.Token?
How can I untokenize the output of this code?
class Core:
    def __init__(self, user_input):
        pos = pop(user_input)
        subject = ""
        for token in pos:
            if token.dep == nsubj:
                subject = untokenize.untokenize(token)
                subject = S(subject)
I tried https://pypi.org/project/untokenize/, MosesDetokenizer, and .join().
But I get this error for my last attempt (from this post):
TypeError: 'spacy.tokens.token.Token' object is not iterable
This error for .join():
AttributeError: 'spacy.tokens.token.Token' object has no attribute 'join'
And for MosesDetokenizer, with text = u" {} ".format(" ".join(tokens)):
TypeError: can only join an iterable
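For context (this explanation is not from the original post): str.join only accepts an iterable of strings, which is why passing a single Token object raises that TypeError. A minimal stand-in class, with no spaCy dependency, reproduces the failure:

```python
class FakeToken:
    """Stand-in for spacy.tokens.token.Token: has .text, is not iterable."""
    def __init__(self, text):
        self.text = text

try:
    " ".join(FakeToken("cake"))  # a lone token is not an iterable of strings
except TypeError as err:
    print(err)  # can only join an iterable

# join() works once it is given an iterable of plain strings:
print(" ".join(t.text for t in [FakeToken("I"), FakeToken("like")]))  # I like
```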
All tokens in spaCy keep their context around, so all text can be recreated without any loss of data. In your case, all you have to do is:
''.join([token.text_with_ws for token in doc])
This works because the text_with_ws attribute holds the token's text together with its trailing whitespace character, if one exists.
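To see why this round-trips losslessly, here is an illustrative sketch using a plain list of strings in place of a real spaCy doc (so no model download is needed); each entry plays the role of one token's text_with_ws value:

```python
# Stand-ins for token.text_with_ws values: text plus trailing whitespace.
# "cake" has no trailing space because "." follows it immediately.
pieces = ["I ", "like ", "cake", "."]

restored = "".join(pieces)
print(restored)  # I like cake.
```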
spaCy tokens have their doc object associated with them, so this will give you the original sentence as a string:
import spacy

nlp = spacy.load('en_core_web_sm')  # the old 'en' shortcut was removed in spaCy v3
doc = nlp("I like cake.")
token = doc[0]
print(token.doc)  # prints "I like cake."
@RazvanP I believe what you need here is to get the following output:
['I', 'like', 'cake', '.']
If yes, here is the code.
new_list = []
doc = nlp("I like cake.")  # nlp as loaded in the answer above
for i in doc:
    new_list.append(i.text)  # .text is already a plain string
print(new_list)
See if this works for you.
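Tying this back to the original question: a single Token usually doesn't need to be "untokenized" at all, since its .text (or .text_with_ws) is already a string. Below is a sketch of the subject-extraction loop from the question, using hypothetical stand-in tokens so it runs without a spaCy model; a real pipeline would compare token.dep_ == "nsubj" on tokens produced by nlp(...):

```python
from collections import namedtuple

# Hypothetical stand-in for spacy.tokens.token.Token; real tokens
# expose the same .text, .dep_ and .text_with_ws attributes.
Token = namedtuple("Token", ["text", "dep_", "text_with_ws"])

doc = [
    Token("The", "det", "The "),
    Token("cat", "nsubj", "cat "),
    Token("sleeps", "ROOT", "sleeps"),
]

subject = ""
for token in doc:
    if token.dep_ == "nsubj":
        subject = token.text  # already a plain string; nothing to untokenize
print(subject)  # cat
```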