I am trying to condense a large file, and I need to eliminate the lines that do not contain a certain pattern. However, I also need to save to a new file a limited number of lines after each "non-pattern" line, and then keep reading the file until the next "non-pattern" line is found.
For example, to recover the first 2 records after each "non-pattern" line, the input file looks like this:
146587678080980
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
1789798709fdb80 ABC3
798789789767567 ABC4
798787576567577
178990809809809 ABC7
189890sf908908f ABC8
178979ggggf9080 ABC9
18098rrttty0980 ABC10
1mkklnklnlknlkn ABC17
The output file should be:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8
I have tried this code up to now without success:
limit = 2
with open('input.txt') as oldfile, open('output.txt') as newfile:
    for line in oldfile:
        if not ('ABC'):
            line_count = 0
            if line_count <= limit:
                newfile.write(line)
                line_count += 1
Here's a way that is similar to your example:
limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    ctr = 0
    for line in ifh:
        if not 'ABC' in line:
            ctr = 0
        else:
            if ctr < limit:
                ctr += 1
                ofh.write(line)
And here's an approach that is logically more explicit:
limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    it = iter(ifh)
    while True:
        try:
            if not 'ABC' in next(it):
                for _ in range(limit):
                    ofh.write(next(it))
        except StopIteration:
            break
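The iterator idea above can also be sketched with `itertools.islice`, which takes the next lines from the same iterator and simply stops at the end of the input, so no `StopIteration` handling is needed. The function name `take_after_non_pattern` and the in-memory `sample` list are illustrative assumptions, not part of the original answer. Like the loop above, this takes the following lines whether or not they contain the pattern.

```python
from itertools import islice


def take_after_non_pattern(lines, pattern="ABC", limit=2):
    """Return the `limit` lines that follow each line lacking `pattern`."""
    it = iter(lines)
    taken = []
    for line in it:
        if pattern not in line:
            # islice pulls from the same iterator, so the loop resumes
            # after the lines consumed here; it stops quietly at the end.
            taken.extend(islice(it, limit))
    return taken


# Hypothetical in-memory stand-in for the lines of input.txt.
sample = [
    "146587678080980",
    "1789dsdss809809 ABC1",
    "1898fdfdf908908 ABC2",
    "1789798709fdb80 ABC3",
    "798789789767567 ABC4",
    "798787576567577",
    "178990809809809 ABC7",
    "189890sf908908f ABC8",
    "178979ggggf9080 ABC9",
    "18098rrttty0980 ABC10",
    "1mkklnklnlknlkn ABC17",
]
result = take_after_non_pattern(sample)
print("\n".join(result))
```

One consequence of this design: if two non-pattern lines are adjacent, the second one is consumed as a "following" line of the first rather than starting a new group.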
You need to track 2 states:
limit = 2
with open('input.txt', "r") as oldfile, open('output.txt', "w") as newfile:
    is_capturing = False
    for line in oldfile:
        if not line.strip():
            # Ignore empty lines, do not consider them as a non-pattern
            continue
        elif not 'ABC' in line and not is_capturing:
            # State 1
            # Found the start of the non-pattern line ('ABC' not in line)
            # Enable state to capture next lines
            is_capturing = True
            line_count = 0
        elif is_capturing and line_count < limit:
            # State 2
            # Capture a certain limit of lines after the non-pattern line
            newfile.write(line)
            line_count += 1
        else:
            # Reset the state
            is_capturing = False
The output file should contain:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8
If you need to also save the "non-pattern" line, add it to State 1:
elif not 'ABC' in line and not is_capturing:
    # State 1
    # Found the start of the non-pattern line ('ABC' not in line)
    # Enable state to capture next lines
    newfile.write(line)
    is_capturing = True
    line_count = 0
If you want to preserve the empty lines between each written line:
newfile.write(line + '\n')
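As a rough sketch, the same state tracking can be wrapped in a function that works on any iterable of lines, which makes it easy to try without touching files. The name `capture_after_marker` and the `keep_marker` flag are assumptions introduced here for illustration. This variant also differs from the loop above in one respect: a non-pattern line always starts a new group, even if it arrives before the previous group has been reset.

```python
def capture_after_marker(lines, pattern="ABC", limit=2, keep_marker=False):
    """Return up to `limit` pattern lines following each line without `pattern`."""
    captured = []
    remaining = 0  # lines still to capture in the current group
    for line in lines:
        if not line.strip():
            continue  # blank lines are neither markers nor captured
        if pattern not in line:
            # A marker (non-pattern line) always starts a new group.
            remaining = limit
            if keep_marker:
                captured.append(line)
        elif remaining > 0:
            captured.append(line)
            remaining -= 1
    return captured


# Hypothetical sample with blank lines between records.
sample = [
    "146587678080980\n", "\n",
    "1789dsdss809809 ABC1\n", "\n",
    "1898fdfdf908908 ABC2\n", "\n",
    "1789798709fdb80 ABC3\n", "\n",
    "798787576567577\n", "\n",
    "178990809809809 ABC7\n",
]
without_markers = capture_after_marker(sample)
with_markers = capture_after_marker(sample, keep_marker=True)
print("".join(with_markers))
```

To use it on real files, the function can be given the open input file object directly, and the result passed to `writelines` on the output file.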
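limit = 2
with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    line_count = 0
    for line in oldfile:
        if 'ABC' in line:
            newfile.write(line)
            line_count += 1
            # Note: the loop ends after the first `limit` lines containing
            # 'ABC', so only the first group is written.
            if line_count == limit:
                break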
Given an input file like this:
146587678080980
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
1789798709fdb80 ABC3
798789789767567 ABC4
798787576567577
178990809809809 ABC7
189890sf908908f ABC8
178979ggggf9080 ABC9
18098rrttty0980 ABC10
1mkklnklnlknlkn ABC17
First open the file and strip the empty lines, saving the lines with content to a list of lines:
with open('input.txt', 'r') as f:
    in_lines = [line.strip('\n') for line in f.readlines() if len(line.strip('\n')) > 0]
Then run through all the lines to find the index of each "non-pattern" line, and extend an initially empty output list with up to LIMIT lines that follow that index.
out_lines = list()
LIMIT = 2
for idx, line in enumerate(in_lines):
    if 'ABC' not in line:
        out_lines.extend(in_lines[(idx + 1):(idx + 1 + LIMIT)])
To get the output file with the same format as the input:
with open('output.txt', 'w') as f:
    # The joined result is a single string, so write() is the natural call.
    f.write('\n\n'.join(out_lines))
The resulting output.txt should be this:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8
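For a quick end-to-end check of the read, slice, and write steps, here is a minimal sketch that runs them against a temporary directory instead of real files. The shortened `records` list, the use of `pathlib` and `tempfile`, and the assumption that records are separated by blank lines are all illustrative choices, not part of the original answer.

```python
import tempfile
from pathlib import Path

LIMIT = 2
# Hypothetical, shortened version of the sample input.
records = [
    "146587678080980",
    "1789dsdss809809 ABC1",
    "1898fdfdf908908 ABC2",
    "1789798709fdb80 ABC3",
    "798787576567577",
    "178990809809809 ABC7",
]

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    # Assumed layout: one record per line, separated by blank lines.
    src.write_text("\n\n".join(records) + "\n")

    # Step 1: read the file and drop the empty lines.
    in_lines = [ln for ln in src.read_text().splitlines() if ln]

    # Step 2: collect up to LIMIT lines after each non-pattern line.
    out_lines = []
    for idx, line in enumerate(in_lines):
        if "ABC" not in line:
            out_lines.extend(in_lines[idx + 1: idx + 1 + LIMIT])

    # Step 3: write the result in the same blank-line-separated layout.
    dst.write_text("\n\n".join(out_lines) + "\n")
    written = dst.read_text()

print(written)
```

Because the slice is taken by position, it would also pick up a non-pattern line that falls inside the window, and windows can overlap if two non-pattern lines are closer together than LIMIT.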