I want to print the contents of a file to the terminal and in the process highlight any words that are found in a list without modifying the original file. Here's an example of the not-yet-working code:
def highlight_story(self):
    """Print a line from a file and highlight words in a list."""
    the_file = open(self.filename, 'r')
    file_contents = the_file.read()
    for word in highlight_terms:
        regex = re.compile(
            r'\b'  # Word boundary.
            + word  # Each item in the list.
            + r's{0,1}',  # One optional 's' at the end.
            flags=re.IGNORECASE | re.VERBOSE)
        subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
        result = re.sub(regex, subst, file_contents)
    print result
    the_file.close()

highlight_terms = [
    'dog',
    'hedgehog',
    'grue'
]
As it is, only the last item in the list, regardless of what it is or how long the list is, will be highlighted. I assume that each substitution is performed and then "forgotten" when the next iteration begins. It looks something like this:
Grues have been known to eat both human and non-human animals. In poorly-lit areas dogs and hedgehogs are considered by any affluent grue to a be delicacies. Dogs can frighten away a grue , however, by barking in a musical scale. A hedgehog, on the other hand, must simply resign itself to its fate of becoming a hotdog fit for a grue king.
But it should look like this:
Grues have been known to eat both human and non-human animals. In poorly-lit areas dogs and hedgehogs are considered by any affluent grue to a be delicacies. Dogs can frighten away a grue , however, by barking in a musical scale. A hedgehog , on the other hand, must simply resign itself to its fate of becoming a hotdog fit for a grue king.
How can I stop the other substitutions from being lost?
You can modify your regex to the following:
regex = re.compile(r'\b(' + '|'.join(highlight_terms) + r')s?', flags=re.IGNORECASE | re.VERBOSE)  # note the ? instead of {0,1}; it has the same effect
Then you won't need the for loop.
This code takes the list of words and joins them together with a | (regex alternation). So if your list was something like:
a = ['cat', 'dog', 'mouse'];
The regex would be:
\b(cat|dog|mouse)s?
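As a rough, self-contained sketch of the single-pattern approach (the `highlight` helper and the sample sentence are made up for illustration, and the `re.escape` call is an extra precaution that is not part of the answer above), the whole text can be processed in one `sub` call. It uses `print()` with a single argument so it runs on both Python 2 and Python 3:

```python
import re

# ANSI escape codes: bold text on a red background, then reset.
START = '\033[1;41m'
END = '\033[0m'


def highlight(text, terms):
    """Return text with each term (plus an optional trailing 's') wrapped in ANSI codes."""
    # re.escape protects against terms that contain regex metacharacters
    # (an assumption added here; the original terms are plain words).
    alternation = '|'.join(re.escape(term) for term in terms)
    regex = re.compile(r'\b(?:' + alternation + r')s?', flags=re.IGNORECASE)
    # \g<0> re-inserts whatever was matched, so the original casing is kept.
    return regex.sub(START + r'\g<0>' + END, text)


sample = 'Dogs can frighten away a grue. A hedgehog becomes a hotdog.'
highlighted = highlight(sample, ['dog', 'hedgehog', 'grue'])
print(highlighted)
```

Because every term is matched in the same pass, nothing is overwritten between iterations, and "hotdog" is left alone since there is no word boundary before its "dog".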
The regex provided is correct; the for loop is where it goes wrong.
result = re.sub(regex, subst, file_contents)
This line replaces every match of regex with subst in file_contents. In the second iteration, it again performs the substitution on the original file_contents, whereas you intended to perform it on result, so each pass throws away the highlighting from the previous one.
How to correct it:
result = file_contents
for word in highlight_terms:
    regex = re.compile(
        r'\b'  # Word boundary.
        + word  # Each item in the list.
        + r's?\b',  # One optional 's' at the end.
        flags=re.IGNORECASE | re.VERBOSE)
    print regex.pattern
    subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
    result = re.sub(regex, subst, result)  # change made here
print result
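To see the difference in isolation, here is a stripped-down sketch that uses square brackets as a stand-in for the escape codes; the sample text and the variable names `lost` and `kept` are invented for this illustration:

```python
import re

text = 'a dog and a grue'
terms = ['dog', 'grue']

# Buggy version: every pass starts again from the untouched text,
# so only the substitution from the final pass survives.
for term in terms:
    lost = re.sub(r'\b' + term + r's?\b', r'[\g<0>]', text)

# Fixed version: each pass works on the output of the previous pass.
kept = text
for term in terms:
    kept = re.sub(r'\b' + term + r's?\b', r'[\g<0>]', kept)

print(lost)  # a dog and a [grue]
print(kept)  # a [dog] and a [grue]
```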
You need to reassign file_contents to the replaced string each time through the loop. Reassigning file_contents does not change the contents of the file itself:
def highlight_story(self):
    """Print a line from a file and highlight words in a list."""
    the_file = open(self.filename, 'r')
    file_contents = the_file.read()
    for word in highlight_terms:
        regex = re.compile(
            r'\b'  # Word boundary.
            + word  # Each item in the list.
            + r's{0,1}',  # One optional 's' at the end.
            flags=re.IGNORECASE | re.VERBOSE)
        subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
        file_contents = re.sub(regex, subst, file_contents)  # reassign to updated value
    print file_contents
    the_file.close()
Also, using with to open files is a better way to go, and you can make a copy of the string outside the loop and update the copy inside it:
def highlight_story(self):
    """Print a line from a file and highlight words in a list."""
    with open(self.filename) as the_file:
        file_contents = the_file.read()
    output = file_contents  # copy
    for word in highlight_terms:
        regex = re.compile(
            r'\b'  # Word boundary.
            + word  # Each item in the list.
            + r's{0,1}',  # One optional 's' at the end.
            flags=re.IGNORECASE | re.VERBOSE)
        subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
        output = re.sub(regex, subst, output)  # update copy
    print output
    # No explicit close() needed: the with block already closed the file.