I am trying to print the text after a specific string.
file.txt
I am: "eating", mango
I am: eating a pine apple; and mango
I am trying to write code that searches for the keyword am: and prints the text inside the double quotes that follow it. If there are no double quotes in a line after am:, I want to print everything up to the ; (or, more simply, the next 3 words).
output.txt
I am: eating
I am: eating a pine apple
My work:
with open('input.txt', 'r') as f, open ("output.txt", 'w') as out_fh:
    for line in f:
        str = re.search(r'\bam: "([^"]+)"', line).group()[0]
        if str:
            out_fh.write(str)
        else:
            a = re.compile(r'am:((\w+){3}')
            out_fh.write(a)
Not sure where I am going wrong. Any help would be appreciated. Thank you
You may use a single regex to fetch the expected result:
rx = re.compile(r'^(I am:\s*)("[^"]*"|[^;]*)')
See the regex demo. The regex matches:
^ - start of a string
(I am:\s*) - Capturing group 1: the literal string I am: followed by 0+ whitespaces
("[^"]*"|[^;]*) - Capturing group 2: a " followed with any 0 or more chars other than " and then a ", or any 0+ chars other than ;
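To make the two groups concrete, here is a minimal sketch that prints what each group captures for the two sample lines from the question. The helper name show_groups is hypothetical and only used for illustration.

```python
import re

rx = re.compile(r'^(I am:\s*)("[^"]*"|[^;]*)')

def show_groups(line):
    # Hypothetical helper: return (group 1, group 2), or None if the line does not match.
    m = rx.search(line)
    return (m.group(1), m.group(2)) if m else None

print(show_groups('I am: "eating", mango'))
# ('I am: ', '"eating"')   <- the quotes are still part of group 2
print(show_groups('I am: eating a pine apple; and mango'))
# ('I am: ', 'eating a pine apple')
```

The first line takes the quoted alternative, so the quotes still have to be stripped afterwards. The second line falls through to the [^;]* alternative and stops before the semicolon.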
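In your code, use it like this (the pattern needs both capturing groups, since the code reads m.group(1) and m.group(2)):
import re

rx = re.compile(r'^(I am:\s*)("[^"]*"|[^;]*)')
with open('input.txt', 'r') as f, open("output.txt", 'w') as out_fh:
    for line in f:
        m = rx.search(line)
        if m:
            # Group 1 is the "I am: " prefix; Group 2 is the quoted text or the text up to ";"
            # .strip() drops a trailing newline that [^;]* may capture; "\n" puts each result on its own line
            out_fh.write("{}{}\n".format(m.group(1), m.group(2).strip().strip('"')))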
Note that .strip('"') will remove the leading and trailing " chars captured with the first alternative in Group 2.
See a Python demo:
import re
text = """I am: "eating", mango
I am: eating a pine apple; and mango"""
rx = re.compile(r'^(I am:\s*)("[^"]*"|[^;]*)')
for line in text.splitlines():
    m = rx.search(line)
    if m:
        print("{}{}".format(m.group(1), m.group(2).strip('"')))
Output:
I am: eating
I am: eating a pine apple
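The question also mentions "3 words" as an alternative to stopping at the ;. The following is only a sketch of that variant, not part of the answer above: the pattern rx3 and the function extract_three_words are hypothetical names, and "word" is assumed to mean a run of characters that are neither whitespace nor ;.

```python
import re

# Hypothetical variant: take the quoted text if present, otherwise
# at most three words (runs of characters other than whitespace and ";").
rx3 = re.compile(r'^(I am:\s*)(?:"([^"]*)"|([^\s;]+(?:\s+[^\s;]+){0,2}))')

def extract_three_words(line):
    m = rx3.search(line)
    if not m:
        return None  # line does not start with "I am:" followed by text
    # Group 2 is set for the quoted alternative, group 3 for the word alternative.
    body = m.group(2) if m.group(2) is not None else m.group(3)
    return m.group(1) + body

print(extract_three_words('I am: "eating", mango'))
print(extract_three_words('I am: eating a pine apple; and mango'))
```

With this variant the second sample line is cut after three words rather than at the semicolon, so it would not reproduce the output.txt shown in the question. It is only useful if the three-word rule is what is actually wanted.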