How can I untokenize a replaced spacy.tokens.token.Token?
I am trying to replace a location name in a string with a randomly chosen city from the list below, then get the newly formed string and append it to a file. I tried using spaCy for this. I can easily detect the cities and replace the token, but I am stuck on joining the tokens back together to get the new line.
from pprint import pprint
import spacy
import random

list = ['Delhi','Mumbai','Bangalore','Agra','Jaipur','Noida','Lucknow','Bombay','Jaipur','Indore','Chandigarh','Guwahati','Ghaziabad','Faridabad',
        'Pune','Chennai','kolkata','Hyderabad','Goa']
nlp = spacy.load('en_core_web_sm')
sentence = '''Can You deliver pizza to London.'''
entities = nlp(sentence)
pprint([(X, X.ent_iob_, X.ent_type_) for X in entities])
newstr = ""
for X in entities:
    newstr += X
    if X.ent_type_=='GPE' and X.ent_iob_=='B':
        X = random.choice(list)
        print(X)
        #print(type(X))
    elif X.ent_type_=='GPE' and X.ent_iob_=='I':
        X = ' '
pprint(newstr)
I am getting the following error:
Traceback (most recent call last):
  File "C:\Users\shahi\PycharmProjects\pythonscrappingproject\main.py", line 17, in <module>
    newstr += X
TypeError: can only concatenate str (not "spacy.tokens.token.Token") to str
When I comment out the line newstr += X, the script runs without errors.
First, do not use list as a variable name, since that shadows the built-in list type; use l, for example:
l = ['Delhi','Mumbai','Bangalore','Agra','Jaipur','Noida','Lucknow','Bombay','Jaipur','Indore','Chandigarh','Guwahati','Ghaziabad','Faridabad',
'Pune','Chennai','kolkata','Hyderabad','Goa']
Then, you can use
for X in entities:
    if X.ent_type_=='GPE' and X.ent_iob_=='B':
        newstr += random.choice(l) + X.whitespace_
    else:
        newstr += X.text + X.whitespace_
where X.text is the actual token text and X.whitespace_ is the whitespace that follows the token in the original character sequence (an empty string if there is none).
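The loop above replaces only the first token of a place name, so a multi-token name such as "New York" would keep its second token. Below is a minimal sketch of one way to handle that case, wrapped in a function. The names replace_gpe and FakeToken are hypothetical and are not part of spaCy: FakeToken is only a stand-in with the same four attributes used above, so that the example runs without a loaded model. With real spaCy tokens the result depends on which entities the model detects.

```python
import random
from collections import namedtuple

# Hypothetical stand-in for a spaCy token, exposing only the attributes used here.
FakeToken = namedtuple("FakeToken", "text whitespace_ ent_type_ ent_iob_")

def replace_gpe(tokens, cities, rng=random):
    """Rebuild the sentence, swapping each GPE entity for a random city."""
    parts = []
    for tok in tokens:
        if tok.ent_type_ == "GPE" and tok.ent_iob_ == "B":
            # First token of a place name: emit the replacement city.
            parts.append(rng.choice(cities))
            parts.append(tok.whitespace_)
        elif tok.ent_type_ == "GPE" and tok.ent_iob_ == "I":
            # Continuation of the same place name: drop its text and keep
            # only its trailing whitespace in place of the previous one.
            if parts:
                parts[-1] = tok.whitespace_
        else:
            parts.append(tok.text)
            parts.append(tok.whitespace_)
    return "".join(parts)

# Hand-built tokens for "Can You deliver pizza to New York."
tokens = [
    FakeToken("Can", " ", "", "O"),
    FakeToken("You", " ", "", "O"),
    FakeToken("deliver", " ", "", "O"),
    FakeToken("pizza", " ", "", "O"),
    FakeToken("to", " ", "", "O"),
    FakeToken("New", " ", "GPE", "B"),
    FakeToken("York", "", "GPE", "I"),
    FakeToken(".", "", "", "O"),
]

print(replace_gpe(tokens, ["Delhi"]))  # Can You deliver pizza to Delhi.
```

Since the function only reads those four attributes, it should also accept a real Doc, for example replace_gpe(nlp(sentence), l).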
Try converting the spacy.tokens.token.Token to str by writing newstr += str(X).