
How to replace strings in sentence with appropriate dictionary values?

I have a dictionary as follows:

dict_ = { 
        'the school in USA' : 'some_text_1',
        'school' : 'some_text_1',
        'the holy church in brisbane' : 'some_text_2',
        'holy church' : 'some_text_2'
}

and a list of sentences as follows:

text_sent = ["Ram is going to the holy church in brisbane",\
             "John is going to holy church", \
             "shena is going to the school in USA", \
             "Jennifer is going to the school"]

I want to replace each occurrence of a key of dict_ in text_sent with the corresponding value. I did this as follows:

import re

for ind, text in enumerate(text_sent) :
    for iterator in dict_.keys() :
        if iterator in text : 
            text_sent[ind] = re.sub(iterator, dict_[iterator], text)

for i in text_sent:
    print(i)

The output I got is as follows:

Ram is going to the some_text_2 in brisbane
John is going to some_text_2
shena is going to the some_text_1 in USA
Jennifer is going to the some_text_1

The expected output is:

Ram is going to some_text_2
John is going to some_text_2
shena is going to some_text_1
Jennifer is going to some_text_1

What I need is for the longer strings (for example, 'the holy church in brisbane') to be replaced first. Only if the complete string is not present in the sentence should the shorter version (for example, 'holy church') be used for the replacement in the sentences of text_sent.

You can use re.sub to make the replacements, using str.join to build the regular expression from the keys of the substring dictionary:

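import re

d = {'the school in USA': 'some_text_1', 'school': 'some_text_1', 'the holy church in brisbane': 'some_text_2', 'holy church': 'some_text_2'}
text_sent = ["Ram is going to the holy church in brisbane",
             "John is going to holy church",
             "shena is going to the School in USA",
             "Jennifer is going to the school"]

# Lowercased keys, so a case-insensitive match such as "the School in USA" can be looked up.
lookup = {k.lower(): v for k, v in d.items()}

# re.I must be passed as flags=; the fourth positional argument of re.sub is count.
r = [re.sub('|'.join(d), lambda x: lookup[x.group().lower()], i, flags=re.I) for i in text_sent]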

Output:

['Ram is going to some_text_2', 'John is going to some_text_2', 'shena is going to some_text_1', 'Jennifer is going to the some_text_1']
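The pattern above joins the keys as they are, in dictionary order. A minimal sketch of a more defensive variant, assuming keys may contain regex metacharacters and that a longer key should win over a shorter key that is its prefix; `replace_phrases` is a hypothetical helper name, not something provided by `re`:

```python
import re

def replace_phrases(sentences, mapping):
    """Replace every key of `mapping` found in `sentences` with its value.

    Hypothetical helper: keys are escaped, tried longest first, and
    matched case-insensitively.
    """
    # Case-insensitive lookup table for the replacement callback.
    lookup = {key.lower(): value for key, value in mapping.items()}
    # Longest keys first, so a full phrase beats a shorter key that is its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps characters such as "." or "+" in a key literal.
    pattern = re.compile("|".join(re.escape(k) for k in keys), flags=re.IGNORECASE)
    return [pattern.sub(lambda m: lookup[m.group().lower()], s) for s in sentences]

d = {
    'the school in USA': 'some_text_1',
    'school': 'some_text_1',
    'the holy church in brisbane': 'some_text_2',
    'holy church': 'some_text_2',
}
sentences = [
    "Ram is going to the holy church in brisbane",
    "John is going to holy church",
    "shena is going to the School in USA",
    "Jennifer is going to the school",
]
print(replace_phrases(sentences, d))
```

Compiling the pattern once also avoids rebuilding it for every sentence.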

You can create an auxiliary list of the dict's keys and sort it by the length of its elements, so that longer keys are replaced first.

dict_ = {'the school in USA' : 'some_text_1',
         'school' : 'some_text_1',
         'the holy church in brisbane' : 'some_text_2',
         'holy church' : 'some_text_2'}

text_sent = ["Ram is going to the holy church in brisbane",
             "John is going to holy church",
             "shena is going to the school in USA",
             "Jennifer is going to the school"]

# Longest keys first, so a full phrase is replaced before any shorter key it contains.
dict_keys = sorted(dict_, key=len, reverse=True)

text_sent_replaced = []
for text in text_sent:
    modified_text = text
    for key in dict_keys:  # iterate over the sorted keys, not the dict itself
        modified_text = modified_text.replace(key, dict_[key])
    text_sent_replaced.append(modified_text)

print(text_sent_replaced)

The main issue is that you didn't add a break statement. Without it, a later match in dict_ overwrites the result of an earlier one. Try this:

for ind, text in enumerate(text_sent) :
    for iterator in dict_.keys() :
        if iterator in text :
            text_sent[ind] = re.sub(iterator, dict_[iterator], text)
            break
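The `break` stops at whichever matching key comes first in iteration order, so the result depends on how `dict_` happens to be ordered. A minimal sketch that sorts the keys by length up front, assuming plain substring replacement with `str.replace` is acceptable; the dictionary is deliberately written with the short keys first to show that insertion order no longer matters:

```python
dict_ = {
    'school': 'some_text_1',
    'the school in USA': 'some_text_1',
    'holy church': 'some_text_2',
    'the holy church in brisbane': 'some_text_2',
}
text_sent = [
    "Ram is going to the holy church in brisbane",
    "John is going to holy church",
    "shena is going to the school in USA",
    "Jennifer is going to the school",
]

# Try the longest keys first, regardless of insertion order in dict_.
keys_by_length = sorted(dict_, key=len, reverse=True)

for ind, text in enumerate(text_sent):
    for key in keys_by_length:
        if key in text:
            text_sent[ind] = text.replace(key, dict_[key])
            break  # stop at the first (longest) matching key

for line in text_sent:
    print(line)
```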

This will accomplish the task without using re, as long as the substituted elements are at the end of each line, as was the case in your example:

for ind, text in enumerate(text_sent) :
    for iterator in dict_.keys() :
        if iterator in text :
            text_sent[ind] = text.split(iterator)[0] + dict_[iterator]

for i in text_sent:
    print(i)

#Prints:
#Ram is going to the some_text_2
#John is going to some_text_2
#shena is going to the some_text_1
#Jennifer is going to the some_text_1
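If a key can also appear in the middle of a sentence, `str.partition` keeps the text that follows it. A minimal sketch under that assumption, still without `re`; `replace_first_longest` is a hypothetical helper name, and it only replaces the first occurrence of the longest matching key:

```python
def replace_first_longest(text, mapping):
    """Replace the first occurrence of the longest matching key, keeping the rest of the text.

    Hypothetical helper for illustration; returns `text` unchanged if no key matches.
    """
    for key in sorted(mapping, key=len, reverse=True):
        head, sep, tail = text.partition(key)
        if sep:  # sep is non-empty only when key was found
            return head + mapping[key] + tail
    return text

dict_ = {
    'the school in USA': 'some_text_1',
    'school': 'some_text_1',
    'the holy church in brisbane': 'some_text_2',
    'holy church': 'some_text_2',
}
print(replace_first_longest("Ram visited the holy church in brisbane yesterday", dict_))
print(replace_first_longest("John left holy church early", dict_))
```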
