I need to find specific strings in a file, but only up to the line AUTO HEADER. I am not sure how to restrict the regex so that it finds matches only up to a specific line. Can someone help me figure that out?
This is my script:
import re
a = open("mod.txt", "r").read()
op = re.findall(r"type=(\w+)", a, re.MULTILINE)
print(op)
This is my input file mod.txt:
bla bla bla
header
module a
(
type=bye
type=junk
name=xyz type=getme
type=new
AUTO HEADER
type=dont_take_it
type=junk
type=new
Output:
['bye', 'junk', 'getme', 'new', 'dont_take_it', 'junk', 'new']
Expected output:
['bye', 'junk', 'getme', 'new']
I know the regex needs to take AUTO HEADER into account, but I am not sure exactly how.
You can iterate over the lines of the text file and stop as soon as you reach the marker line.
Ex:
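import re

filename = "mod.txt"
res = []
with open(filename) as infile:
    for line in infile:
        # Stop reading as soon as the marker line is reached.
        if "AUTO HEADER" in line:
            break
        # re.search returns only the first match on each line,
        # which is enough for this input.
        op = re.search(r"type=(\w+)", line)
        if op:
            res.append(op.group(1))
print(res)  # --> ['bye', 'junk', 'getme', 'new']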
You can use a positive lookahead in the regex together with re.DOTALL:
op = re.findall(r"type=(\w+)(?=.*AUTO HEADER)", a, re.DOTALL)
print(op)
['bye', 'junk', 'getme', 'new']
(?=.*AUTO HEADER) is a positive lookahead: a match counts only if the text AUTO HEADER appears somewhere after it. This effectively excludes the unwanted matches after AUTO HEADER.
re.DOTALL lets . match newlines as well, so the lookahead can see AUTO HEADER on a later line.
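One caveat worth checking before relying on this pattern: the lookahead only asks whether AUTO HEADER appears anywhere later in the text, so if the marker occurs more than once, matches between the first and the last occurrence are kept too. A small sketch, using a made-up input string (not the mod.txt from the question):

```python
import re

# Hypothetical input with the marker appearing twice.
text = "type=a\nAUTO HEADER\ntype=b\nAUTO HEADER\ntype=c\n"

# 'b' sits after the first marker but before the second one,
# so the lookahead still succeeds for it.
op = re.findall(r"type=(\w+)(?=.*AUTO HEADER)", text, re.DOTALL)
print(op)  # ['a', 'b']
```

If the file is guaranteed to contain the marker exactly once, as in the question, this does not matter.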
I don't think regex is the best option here, but here's how it could be done anyhow.
You could do something like this:
[\s\S]*(?=AUTO HEADER)
Here \s matches any whitespace character (space, tab, line break, ...) and \S, its opposite, matches any non-whitespace character, so [\s\S] matches any character at all, including newlines. The * repeats that character set zero or more times.
The (?=AUTO HEADER) is a positive lookahead: it requires AUTO HEADER to follow the main expression but does not include it in the result. The match is therefore all of the text before AUTO HEADER (before its last occurrence, since * is greedy), which you can then search with type=(\w+).
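The answer gives only the pattern, so here is a rough sketch of how it might be used in two steps. It uses the lazy form [\s\S]*? so the match stops at the first AUTO HEADER, and the fallback to the whole text when the marker is missing is an assumption about the desired behaviour, not something stated in the question.

```python
import re

# Sample text copied from the question's mod.txt.
text = """bla bla bla
header
module a
(
type=bye
type=junk
name=xyz type=getme
type=new
AUTO HEADER
type=dont_take_it
type=junk
type=new
"""

# Step 1: grab everything before the first "AUTO HEADER".
head = re.match(r"[\s\S]*?(?=AUTO HEADER)", text)

# Step 2: run the original pattern on that prefix only.
# Assumption: if the marker is absent, search the whole text.
prefix = head.group(0) if head else text
result = re.findall(r"type=(\w+)", prefix)
print(result)  # ['bye', 'junk', 'getme', 'new']
```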
This may sound stupid, but have you considered passing the regex only the text up to your keyword instead of the full text? There is no reason not to just split it off beforehand, is there?
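A minimal sketch of that idea, assuming str.partition is an acceptable way to cut the text at the keyword. The function name types_before_marker and the shortened sample string are made up for illustration.

```python
import re

def types_before_marker(text, marker="AUTO HEADER"):
    # str.partition returns (before, marker, after); if the marker
    # is absent, `before` is simply the whole text.
    before, _, _ = text.partition(marker)
    return re.findall(r"type=(\w+)", before)

# Shortened version of the question's input.
sample = (
    "type=bye\ntype=junk\nname=xyz type=getme\ntype=new\n"
    "AUTO HEADER\ntype=dont_take_it\ntype=junk\n"
)
print(types_before_marker(sample))  # ['bye', 'junk', 'getme', 'new']
```

Because partition splits at the first occurrence, this keeps working even if the marker shows up more than once.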