Print x lines after a line NOT containing a string
I am trying to condense a large file, and I need to eliminate the lines that do not contain a certain pattern. However, I also need to save to a new file a limited number of the lines that follow each "non-pattern" line, and then keep reading the file line by line until the next "non-pattern" line is found.
For example, to recover the first 2 records after each "non-pattern" line, the input file looks like this:
146587678080980
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
1789798709fdb80 ABC3
798789789767567 ABC4
798787576567577
178990809809809 ABC7
189890sf908908f ABC8
178979ggggf9080 ABC9
18098rrttty0980 ABC10
1mkklnklnlknlkn ABC17
The output file should be:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8
I have tried this code so far, without success:
limit = 2
with open('input.txt') as oldfile, open('output.txt') as newfile:
    for line in oldfile:
        if not ('ABC'):
            line_count = 0
            if line_count <= limit:
                newfile.write(line)
                line_count += 1
Here's a way that is similar to your example:
limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    ctr = 0
    for line in ifh:
        if 'ABC' not in line:
            # A "non-pattern" line resets the counter
            ctr = 0
        else:
            if ctr < limit:
                ctr += 1
                ofh.write(line)
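The same counter idea can be sketched as a generator that works on any iterable of lines, which makes it easy to try on the sample data without touching files. The function name `lines_after_non_pattern` and its parameters are hypothetical, chosen for illustration. Two assumptions differ from the snippet above: blank lines are skipped rather than treated as "non-pattern" lines, and nothing is emitted before the first "non-pattern" line has been seen.

```python
def lines_after_non_pattern(lines, pattern="ABC", limit=2):
    """Yield up to `limit` pattern lines that follow each non-pattern line."""
    remaining = 0  # assumption: emit nothing until a non-pattern line is seen
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are ignored entirely
        if pattern not in line:
            remaining = limit  # a non-pattern line (re)opens the window
        elif remaining > 0:
            remaining -= 1
            yield line


sample = [
    "146587678080980",
    "1789dsdss809809 ABC1",
    "1898fdfdf908908 ABC2",
    "1789798709fdb80 ABC3",
    "798789789767567 ABC4",
    "798787576567577",
    "178990809809809 ABC7",
    "189890sf908908f ABC8",
    "178979ggggf9080 ABC9",
    "18098rrttty0980 ABC10",
    "1mkklnklnlknlkn ABC17",
]
result = list(lines_after_non_pattern(sample))
print("\n".join(result))
```

Because the generator accepts any iterable, an open file object can be passed in directly and the results written out with `ofh.writelines(...)`.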
And here's an approach that is logically more explicit:
limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    it = iter(ifh)
    while True:
        try:
            if 'ABC' not in next(it):
                # Copy the next `limit` lines that follow the "non-pattern" line
                for _ in range(limit):
                    ofh.write(next(it))
        except StopIteration:
            break
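As a possible variation on the explicit-iterator approach, `itertools.islice` can pull the following lines from the shared iterator, which removes the need for the `try`/`except StopIteration` block. This is only a sketch: the helper name `copy_after_non_pattern` is made up for illustration, and `io.StringIO` objects stand in for the real files. Like the snippet above, it copies the next `limit` lines without checking whether they contain the pattern.

```python
import io
from itertools import islice


def copy_after_non_pattern(src, dst, pattern="ABC", limit=2):
    """After each line of `src` lacking `pattern`, copy the next `limit` lines to `dst`."""
    it = iter(src)
    for line in it:
        if pattern not in line:
            # islice consumes at most `limit` items from the same iterator
            # and stops quietly if the input ends first.
            dst.writelines(islice(it, limit))


# In-memory stand-ins for input.txt and output.txt (assumption for the demo)
src = io.StringIO("head\na ABC1\nb ABC2\nc ABC3\nhead2\nd ABC4\n")
dst = io.StringIO()
copy_after_non_pattern(src, dst)
print(dst.getvalue(), end="")
```

With real files, the two `StringIO` objects would be replaced by the handles opened in the `with` statement.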
You need to track 2 states:
limit = 2
with open('input.txt', "r") as oldfile, open('output.txt', "w") as newfile:
    is_capturing = False
    for line in oldfile:
        if not line.strip():
            # Ignore empty lines, do not consider them as a non-pattern
            continue
        elif 'ABC' not in line and not is_capturing:
            # State 1
            # Found the start of the non-pattern line ('ABC' not in line)
            # Enable state to capture next lines
            is_capturing = True
            line_count = 0
        elif is_capturing and line_count < limit:
            # State 2
            # Capture a certain limit of lines after the non-pattern line
            newfile.write(line)
            line_count += 1
        else:
            # Reset the state
            is_capturing = False
The output file should contain:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8
If you also need to save the "non-pattern" line, write it in State 1:
        elif 'ABC' not in line and not is_capturing:
            # State 1
            # Found the start of the non-pattern line ('ABC' not in line)
            # Enable state to capture next lines
            newfile.write(line)
            is_capturing = True
            line_count = 0
If you want to preserve the empty lines between each written line:
newfile.write(line + '\n')
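limit = 2
with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    line_count = 0
    for line in oldfile:
        if 'ABC' in line:
            newfile.write(line)
            line_count += 1
            # Note: this stops after the first `limit` matching lines
            # in the whole file; the counter is never reset
            if line_count == limit:
                break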
Given an input file like this:
146587678080980
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
1789798709fdb80 ABC3
798789789767567 ABC4
798787576567577
178990809809809 ABC7
189890sf908908f ABC8
178979ggggf9080 ABC9
18098rrttty0980 ABC10
1mkklnklnlknlkn ABC17
First open the file and strip the empty lines, saving the lines with content to a list:
with open('input.txt', 'r') as f:
    in_lines = [line.strip('\n') for line in f.readlines() if len(line.strip('\n')) > 0]
Then run through all the lines to find the index of each "non-pattern" line, and extend an initially empty output list with up to LIMIT lines that follow that index:
out_lines = list()
LIMIT = 2
for idx, line in enumerate(in_lines):
    if 'ABC' not in line:
        # Slicing past the end of the list is safe; it just returns fewer items
        out_lines.extend(in_lines[(idx + 1):(idx + 1 + LIMIT)])
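One edge case of plain slicing is worth a quick illustration: when two "non-pattern" lines are closer together than the limit, the slice can pick up a "non-pattern" line and repeat lines already collected. The sketch below shows that behaviour on made-up data and one possible way around it, cutting each window at the next "non-pattern" line with `itertools.takewhile`. The function name `slice_after_non_pattern` and the tiny `in_lines` list are assumptions for the demo, not part of the original answer.

```python
from itertools import islice, takewhile


def slice_after_non_pattern(in_lines, pattern="ABC", limit=2):
    """Collect up to `limit` pattern lines after each non-pattern line."""
    out_lines = []
    for idx, line in enumerate(in_lines):
        if pattern not in line:
            # Stop the window as soon as another non-pattern line appears.
            # (in_lines[idx + 1:] copies the tail; fine for a sketch.)
            following = takewhile(lambda s: pattern in s, in_lines[idx + 1:])
            out_lines.extend(islice(following, limit))
    return out_lines


# Made-up data with two adjacent non-pattern lines
in_lines = ["h1", "h2", "a ABC1", "b ABC2", "c ABC3"]

# Plain slicing, as in the loop above
naive = []
for idx, line in enumerate(in_lines):
    if "ABC" not in line:
        naive.extend(in_lines[idx + 1:idx + 1 + 2])

print(naive)                              # includes "h2" and "a ABC1" twice
print(slice_after_non_pattern(in_lines))  # only "a ABC1" and "b ABC2"
```

For the sample input in the question the two versions give the same result, since its "non-pattern" lines are far enough apart.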
To get the output file with the same format as the input (one empty line between records):
with open('output.txt', 'w') as f:
    f.write('\n\n'.join(out_lines))
The resulting output.txt should be this:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8