I have a dictionary as follows:
dict_ = {
'the school in USA' : 'some_text_1',
'school' : 'some_text_1',
'the holy church in brisbane' : 'some_text_2',
'holy church' : 'some_text_2'
}
and a list of sentences as follows:
text_sent = ["Ram is going to the holy church in brisbane",\
"John is going to holy church", \
"shena is going to the school in USA", \
"Jennifer is going to the school"]
I want to replace the occurrences of keys of dict_ dictionary with corresponding values in text_sent. I did this as follows:
for ind, text in enumerate(text_sent):
    for iterator in dict_.keys():
        if iterator in text:
            text_sent[ind] = re.sub(iterator, dict_[iterator], text)
for i in text_sent:
    print(i)
Output I got is as follows:
Ram is going to the some_text_2 in brisbane
John is going to some_text_2
shena is going to the some_text_1 in USA
Jennifer is going to the some_text_1
Expected output is:
Ram is going to some_text_2
John is going to some_text_2
shena is going to some_text_1
Jennifer is going to some_text_1
What I need is for the longer strings (for example, 'the holy church in brisbane') to be replaced first. Only if the complete longer string is not present in the sentence should the shorter version (for example, 'holy church') be used to substitute the corresponding value in the sentences of text_sent.
You can use re.sub to make the replacements, using str.join to build the regular expression from the keys of the substring dictionary:
import re
d = {'the school in USA': 'some_text_1', 'school': 'some_text_1', 'the holy church in brisbane': 'some_text_2', 'holy church': 'some_text_2'}
text_sent = ["Ram is going to the holy church in brisbane",\
"John is going to holy church", \
"shena is going to the School in USA", \
"Jennifer is going to the school"]
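# Try longer keys first, and escape them so they are matched literally.
pattern = '|'.join(re.escape(k) for k in sorted(d, key=len, reverse=True))
# The match is case-insensitive, so look up the lowercased match in a lowercased copy of d.
lookup = {k.lower(): v for k, v in d.items()}
# flags must be passed by keyword: the fourth positional argument of re.sub is count.
r = [re.sub(pattern, lambda m: lookup[m.group().lower()], i, flags=re.I) for i in text_sent]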
Output:
['Ram is going to some_text_2', 'John is going to some_text_2', 'shena is going to some_text_1', 'Jennifer is going to the some_text_1']
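One detail worth spelling out is that the order of the alternatives in the joined pattern matters: Python's re module takes the first alternative that matches at a given position, not the longest one. The following is a minimal sketch of that behaviour, not a definitive implementation. The helper name replace_longest_first and the demo keys 'school' and 'school bus' are hypothetical and do not come from the question.

```python
import re

def replace_longest_first(sentences, mapping):
    """Replace every key of `mapping` in each sentence, preferring longer keys."""
    # Longest keys first: alternation picks the first alternative that
    # matches at a position, not the longest one.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape makes each key match literally, even if it contains regex syntax.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return [pattern.sub(lambda m: mapping[m.group()], s) for s in sentences]

# Hypothetical keys that share a prefix, to show why the order matters.
demo = {"school": "S", "school bus": "B"}
print(re.sub("|".join(demo), lambda m: demo[m.group()], "the school bus"))  # the S bus
print(replace_longest_first(["the school bus"], demo))  # ['the B']
```

With the keys from the question the unsorted pattern happens to work, because each longer key starts earlier in the sentence than its shorter counterpart. Sorting by length removes the dependence on that coincidence.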
You can create an auxiliary list of the dictionary's keys and sort it by key length, longest first, so that the longer keys are replaced before the shorter ones.
dict_ = {'the school in USA' : 'some_text_1',
'school' : 'some_text_1',
'the holy church in brisbane' : 'some_text_2',
'holy church' : 'some_text_2'}
text_sent = ["Ram is going to the holy church in brisbane",
"John is going to holy church",
"shena is going to the school in USA",
"Jennifer is going to the school"]
# Longest keys first, so a long phrase is replaced before any shorter key inside it.
dict_keys = sorted(dict_, key=len, reverse=True)
text_sent_replaced = []
for text in text_sent:
    modified_text = text
    # Iterate over the sorted list, not over dict_ itself.
    for key in dict_keys:
        modified_text = modified_text.replace(key, dict_[key])
    text_sent_replaced.append(modified_text)
print(text_sent_replaced)
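To see why the sort order matters for chained str.replace calls, here is a small sketch that runs the same replacements shortest-first and longest-first on one sentence from the question. The helper name replace_in_order is an assumption made for illustration only.

```python
def replace_in_order(sentence, mapping, keys):
    """Apply str.replace once per key, in the order given by `keys`."""
    for key in keys:
        sentence = sentence.replace(key, mapping[key])
    return sentence

mapping = {
    "the school in USA": "some_text_1",
    "school": "some_text_1",
    "the holy church in brisbane": "some_text_2",
    "holy church": "some_text_2",
}
sentence = "Ram is going to the holy church in brisbane"

shortest_first = sorted(mapping, key=len)
longest_first = sorted(mapping, key=len, reverse=True)

# The short key fires first and destroys the longer phrase.
print(replace_in_order(sentence, mapping, shortest_first))
# Ram is going to the some_text_2 in brisbane

# The long key fires first, so the whole phrase is replaced.
print(replace_in_order(sentence, mapping, longest_first))
# Ram is going to some_text_2
```

Once 'holy church' has been replaced, 'the holy church in brisbane' no longer occurs in the sentence, which is exactly the unwanted output shown in the question.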
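The main issue is that you didn't add a break statement. Each re.sub call works on the original text, so a later match in the dict_ dictionary overrides the result of an earlier one. Stopping at the first match fixes this, provided the longer keys come before the shorter ones in dict_ (as they do here, since dictionaries preserve insertion order in Python 3.7+). Try this: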
for ind, text in enumerate(text_sent):
    for iterator in dict_.keys():
        if iterator in text:
            text_sent[ind] = re.sub(iterator, dict_[iterator], text)
            break
This will accomplish the task without using re, as long as the substituted elements are at the end of each line, as was the case in your example. The break stops at the first matching key, so a later, shorter key cannot overwrite the result:
for ind, text in enumerate(text_sent):
    for iterator in dict_.keys():
        if iterator in text:
            # Keep everything before the key and append the replacement.
            text_sent[ind] = text.split(iterator)[0] + dict_[iterator]
            break
for i in text_sent:
    print(i)
# Prints:
# Ram is going to some_text_2
# John is going to some_text_2
# shena is going to some_text_1
# Jennifer is going to the some_text_1