How to untokenize after replacing a spacy.tokens.token.Token?

Question

I am trying to find the location name in a string, replace it with a random city from the list below, and then take the newly formed string and append it to a file. I tried using spaCy for this. I can easily detect the cities and replace the token, but I am stuck on joining the tokens back together to get the new line.

from pprint import pprint
import spacy
import random

list = ['Delhi','Mumbai','Bangalore','Agra','Jaipur','Noida','Lucknow','Bombay','Jaipur','Indore','Chandigarh','Guwahati','Ghaziabad','Faridabad',
        'Pune','Chennai','kolkata','Hyderabad','Goa']

nlp = spacy.load('en_core_web_sm')

sentence = '''Can You deliver pizza to London.'''

entities = nlp(sentence)

pprint([(X, X.ent_iob_, X.ent_type_) for X in entities])
newstr=""
for X in entities:
    newstr += X
    if  X.ent_type_=='GPE' and X.ent_iob_=='B':
        X = random.choice(list)
        print(X)
        #print(type(X))
    elif X.ent_type_=='GPE' and X.ent_iob_=='I':
        X= ' '



pprint(newstr)

I am getting the following error:

 Traceback (most recent call last):
  File "C:\Users\shahi\PycharmProjects\pythonscrappingproject\main.py", line 17, in <module>
    newstr += X
TypeError: can only concatenate str (not "spacy.tokens.token.Token") to str

When I comment out the line newstr += X, the script runs without errors.

Answer 1

First, do not use the built-in list as a variable name; use a different name such as l instead:

l = ['Delhi','Mumbai','Bangalore','Agra','Jaipur','Noida','Lucknow','Bombay','Jaipur','Indore','Chandigarh','Guwahati','Ghaziabad','Faridabad',
        'Pune','Chennai','kolkata','Hyderabad','Goa']
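To illustrate why the rename matters, here is a minimal sketch (the three-city list is a shortened stand-in for the one above): once a module-level name list is bound to data, any later call to the built-in list(...) in that module fails.

```python
# Binding data to the name `list` shadows the built-in list type.
list = ['Delhi', 'Mumbai', 'Goa']

error_message = ""
try:
    list("abc")  # intended as the built-in list() constructor
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'list' object is not callable

del list  # drop the shadowing global; the name resolves to the built-in again
l = ['Delhi', 'Mumbai', 'Goa']  # a non-conflicting name, as the answer suggests
print(list("abc"))  # ['a', 'b', 'c']
```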

Then, you can use:

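newstr = ""
for X in entities:
    # first token of a GPE (place) entity: substitute a random city, keep its trailing whitespace
    if X.ent_type_ == 'GPE' and X.ent_iob_ == 'B':
        newstr += random.choice(l) + X.whitespace_
    # every other token: keep the original text and its trailing whitespace
    else:
        newstr += X.text + X.whitespace_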

where X.text is the actual token text and X.whitespace_ is the whitespace that followed that token in the original text.
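A self-contained sketch of the loop above, wrapped in a function. Assumptions: it uses spacy.blank("en") and marks the GPE spans by hand with Span so it runs without downloading en_core_web_sm; the function names, the rng parameter and the sample sentences are hypothetical. The second function is a different technique, not part of the answer: it replaces each whole GPE entity using its character offsets, which avoids the leftover "York" that the token loop produces for a multi-word place name such as "New York".

```python
import random

import spacy
from spacy.tokens import Span


def replace_per_token(doc, cities, rng):
    """Answer 1's loop: rebuild the sentence one token at a time."""
    newstr = ""
    for token in doc:
        if token.ent_type_ == "GPE" and token.ent_iob_ == "B":
            # first token of a place name: substitute a random city
            newstr += rng.choice(cities) + token.whitespace_
        else:
            # any other token, including later tokens of a multi-word place name
            newstr += token.text + token.whitespace_
    return newstr


def replace_per_entity(doc, cities, rng):
    """Variant, not from the answer: swap each whole GPE span by character offsets."""
    pieces = []
    last = 0
    for ent in doc.ents:
        if ent.label_ != "GPE":
            continue
        pieces.append(doc.text[last:ent.start_char])  # text before the entity
        pieces.append(rng.choice(cities))             # the replacement city
        last = ent.end_char                           # resume after the entity
    pieces.append(doc.text[last:])                    # text after the last entity
    return "".join(pieces)


# spacy.blank("en") only tokenizes, so the GPE spans are marked by hand here;
# with en_core_web_sm loaded, the model would assign them instead.
nlp = spacy.blank("en")
rng = random.Random(0)  # seeded so runs are repeatable
cities = ["Delhi"]      # one city, so the printed output is predictable

doc1 = nlp("Can You deliver pizza to London.")
doc1.ents = [Span(doc1, 5, 6, label="GPE")]  # "London"

doc2 = nlp("Can you ship it to New York today?")
doc2.ents = [Span(doc2, 5, 7, label="GPE")]  # "New York"

print(replace_per_token(doc1, cities, rng))   # Can You deliver pizza to Delhi.
print(replace_per_token(doc2, cities, rng))   # Can you ship it to Delhi York today?
print(replace_per_entity(doc2, cities, rng))  # Can you ship it to Delhi today?
```

With the real model, load the pipeline with spacy.load('en_core_web_sm') and drop the manual doc.ents assignments; which spans receive the GPE label then depends on the model's predictions.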

Answer 2

Try converting the spacy.tokens.token.Token to a str by writing newstr += str(X).
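str(X) does remove the TypeError, because it returns the token's text. A minimal sketch (assuming spacy.blank("en") as a tokenizer-only stand-in for the question's en_core_web_sm pipeline) shows the side effect: that text carries no trailing whitespace, so concatenating str(X) for every token runs the words together, while token.text_with_ws (equivalent to token.text + token.whitespace_, as in Answer 1) preserves the original spacing. Note also that in the question's loop, newstr += X runs before X is reassigned, so the substituted city would never reach newstr either way.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only; no model download needed
doc = nlp("Can You deliver pizza to London.")

# str(token) returns the same string as token.text, without trailing whitespace
joined_with_str = "".join(str(token) for token in doc)

# token.text_with_ws keeps the whitespace that followed each token
joined_with_ws = "".join(token.text_with_ws for token in doc)

print(joined_with_str)  # CanYoudeliverpizzatoLondon.
print(joined_with_ws)   # Can You deliver pizza to London.
```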

Answer 1: score 2, 2021-06-29 13:33:41
Answer 2: score -1, 2021-06-29 13:29:35