How to replace strings in sentence with appropriate dictionary values?
I have a dictionary as follows:
dict_ = {
    'the school in USA' : 'some_text_1',
    'school' : 'some_text_1',
    'the holy church in brisbane' : 'some_text_2',
    'holy church' : 'some_text_2'
}
and a list of sentences as follows:
text_sent = ["Ram is going to the holy church in brisbane",
             "John is going to holy church",
             "shena is going to the school in USA",
             "Jennifer is going to the school"]
I want to replace each occurrence of a dict_ key in text_sent with the corresponding value. I am doing this as follows:
for ind, text in enumerate(text_sent):
    for iterator in dict_.keys():
        if iterator in text:
            text_sent[ind] = re.sub(iterator, dict_[iterator], text)
for i in text_sent:
    print(i)
The output I get is as follows:
Ram is going to the some_text_2 in brisbane
John is going to some_text_2
shena is going to the some_text_1 in USA
Jennifer is going to the some_text_1
The expected output is:
Ram is going to some_text_2
John is going to some_text_2
shena is going to some_text_1
Jennifer is going to some_text_1
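What I need is for the longer string (for example, 'the holy church in brisbane') to be replaced when it is present. If the full string is not in the sentence, only the shorter version (for example, 'holy church') should be used instead of the longer one when substituting the corresponding values into the sentences in text_sent.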
You can use re.sub for the replacement, with str.join to build the regular expression from the dictionary keys:
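import re
d = {'the school in USA': 'some_text_1', 'school': 'some_text_1', 'the holy church in brisbane': 'some_text_2', 'holy church': 'some_text_2'}
text_sent = ["Ram is going to the holy church in brisbane",
             "John is going to holy church",
             "shena is going to the School in USA",
             "Jennifer is going to the school"]
# Lowercased keys, so a case-insensitive match such as 'the School in USA' can be looked up.
lookup = {k.lower(): v for k, v in d.items()}
# flags must be passed by keyword: the fourth positional argument of re.sub is count.
r = [re.sub('|'.join(d), lambda x: lookup[x.group().lower()], i, flags=re.I) for i in text_sent]
print(r)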
Output:
['Ram is going to some_text_2', 'John is going to some_text_2', 'shena is going to some_text_1', 'Jennifer is going to the some_text_1']
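One thing to watch with re.sub is that its fourth positional parameter is count, not flags, so a flag such as re.I only takes effect when passed as flags=. The sketch below wraps the alternation approach in a helper called replace_keys (a hypothetical name, not from the answer). It escapes the keys, tries longer keys first, and looks matches up case-insensitively, assuming no two keys differ only by case.

```python
import re

def replace_keys(sentences, mapping):
    """Replace every occurrence of a mapping key, ignoring case.

    `replace_keys` is a hypothetical helper name used for illustration.
    Assumes no two keys differ only by case.
    """
    # Case-insensitive lookup table: lowercased key -> replacement.
    lookup = {key.lower(): value for key, value in mapping.items()}
    # Longest keys first, so a long phrase is preferred over a shorter
    # key that could match at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps any regex metacharacters in the keys literal.
    pattern = re.compile("|".join(re.escape(k) for k in keys), flags=re.IGNORECASE)
    return [pattern.sub(lambda m: lookup[m.group().lower()], s) for s in sentences]

d = {
    'the school in USA': 'some_text_1',
    'school': 'some_text_1',
    'the holy church in brisbane': 'some_text_2',
    'holy church': 'some_text_2',
}
text_sent = [
    "Ram is going to the holy church in brisbane",
    "John is going to holy church",
    "shena is going to the School in USA",
    "Jennifer is going to the school",
]

result = replace_keys(text_sent, d)
print(result)
```

Compiling the pattern once also avoids rebuilding it for every sentence.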
You can create a helper list of the dict's keys and sort it by the length of its elements.
dict_ = {'the school in USA' : 'some_text_1',
         'school' : 'some_text_1',
         'the holy church in brisbane' : 'some_text_2',
         'holy church' : 'some_text_2'}
text_sent = ["Ram is going to the holy church in brisbane",
             "John is going to holy church",
             "shena is going to the school in USA",
             "Jennifer is going to the school"]
# Sort the keys longest first so longer phrases are replaced before shorter ones.
dict_keys = list(dict_.keys())
dict_keys.sort(key=len, reverse=True)
text_sent_replaced = []
for text in text_sent:
    modified_text = text
    for key in dict_keys:  # iterate the sorted keys, not the dict itself
        modified_text = modified_text.replace(key, dict_[key])
    text_sent_replaced.append(modified_text)
print(text_sent_replaced)
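To see why the order of the keys matters with chained str.replace calls, the following sketch applies the same keys shortest first and then longest first to a single sentence. apply_in_order is a hypothetical helper introduced only for this comparison.

```python
def apply_in_order(text, mapping, keys):
    """Apply str.replace once per key, in the order given by `keys`."""
    for key in keys:
        text = text.replace(key, mapping[key])
    return text

dict_ = {
    'the school in USA': 'some_text_1',
    'school': 'some_text_1',
    'the holy church in brisbane': 'some_text_2',
    'holy church': 'some_text_2',
}
sentence = "Ram is going to the holy church in brisbane"

shortest_first = sorted(dict_, key=len)
longest_first = sorted(dict_, key=len, reverse=True)

# 'holy church' is replaced first, so the long phrase can no longer match.
short_result = apply_in_order(sentence, dict_, shortest_first)
# The long phrase is replaced first and consumes 'holy church' with it.
long_result = apply_in_order(sentence, dict_, longest_first)

print(short_result)
print(long_result)
```

With the shorter key applied first, the fragments "the" and "in brisbane" are left behind, which is the symptom described in the question.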
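The main problem is that you did not add a break statement. If more than one key in the dict_ dictionary matches later on, you overwrite the value. Try this: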
for ind, text in enumerate(text_sent):
    for iterator in dict_.keys():
        if iterator in text:
            text_sent[ind] = re.sub(iterator, dict_[iterator], text)
            break
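Note that stopping at the first match gives the longer phrase priority only because the longer keys happen to be inserted before the shorter ones in dict_ (dicts preserve insertion order in Python 3.7+). A minimal sketch that makes the order explicit is shown here, with the shorter keys deliberately listed first to show the sort doing the work. It uses str.replace instead of re.sub, on the assumption that the keys are plain literal strings.

```python
dict_ = {
    'school': 'some_text_1',  # shorter keys deliberately listed first
    'the school in USA': 'some_text_1',
    'holy church': 'some_text_2',
    'the holy church in brisbane': 'some_text_2',
}
text_sent = [
    "Ram is going to the holy church in brisbane",
    "John is going to holy church",
    "shena is going to the school in USA",
    "Jennifer is going to the school",
]

# Try the longest keys first, regardless of insertion order.
keys_longest_first = sorted(dict_, key=len, reverse=True)

for ind, text in enumerate(text_sent):
    for key in keys_longest_first:
        if key in text:
            text_sent[ind] = text.replace(key, dict_[key])
            break  # stop at the longest key that matches

for line in text_sent:
    print(line)
```

Because of the break, only one key is replaced per sentence, which is enough for the sentences in the question.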
As long as the replaced elements are at the end of each line, as they are in your example, this will do the job without using re:
for ind, text in enumerate(text_sent):
    for iterator in dict_.keys():
        if iterator in text:
            text_sent[ind] = text.split(iterator)[0] + dict_[iterator]
for i in text_sent:
    print(i)
#Prints:
#Ram is going to the some_text_2
#John is going to some_text_2
#shena is going to the some_text_1
#Jennifer is going to the some_text_1
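The end-of-line condition matters because split(...)[0] throws away everything after the key. A small sketch of the difference follows, using a made-up sentence (not from the question) in which the key sits in the middle, and str.partition as one possible way to keep the tail.

```python
key, value = 'holy church', 'some_text_2'
# Hypothetical sentence where the key is not at the end of the line.
sentence = "John is going to holy church today"

# split(...)[0] keeps only the text before the key, so " today" is lost.
truncated = sentence.split(key)[0] + value
print(truncated)

# partition returns (before, separator, after), so the tail can be re-attached.
before, _, after = sentence.partition(key)
kept = before + value + after
print(kept)
```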