Print x lines after a line NOT containing a string
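I am trying to condense a large file and need to eliminate the lines that do not contain a specific pattern. However, I also need to save only a limited number of the lines that follow each "non-pattern" line to a new file, and then keep reading the file line by line to find the next "non-pattern" line.
For example, to recover the first 2 records after each "non-pattern" line, given an input file like this: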
146587678080980
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
1789798709fdb80 ABC3
798789789767567 ABC4
798787576567577
178990809809809 ABC7
189890sf908908f ABC8
178979ggggf9080 ABC9
18098rrttty0980 ABC10
1mkklnklnlknlkn ABC17
The output file should be:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8
So far, I have tried the following code:
limit = 2
with open('input.txt') as oldfile, open('output.txt') as newfile:
    for line in oldfile:
        if not ('ABC'):
            line_count = 0
            if line_count <= limit:
                newfile.write(line)
                line_count += 1
Here is an approach similar to your example:
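limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    # Start at `limit` so nothing is written before the first non-pattern line
    ctr = limit
    for line in ifh:
        if 'ABC' not in line:
            # Non-pattern line: restart the count of lines to keep
            ctr = 0
        elif ctr < limit:
            ctr += 1
            ofh.write(line)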
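The same counter idea can be wrapped in a generator that works on any iterable of lines, which makes it easy to try out without touching the disk. This is only a sketch: the name `take_after_non_pattern` and its parameters are made up for illustration and are not part of the answer above.

```python
def take_after_non_pattern(lines, pattern="ABC", limit=2):
    """Yield up to `limit` pattern lines that follow each line lacking `pattern`."""
    remaining = 0  # nothing is emitted before the first non-pattern line
    for line in lines:
        if pattern not in line:
            # Non-pattern line: open a new window of `limit` lines
            remaining = limit
        elif remaining > 0:
            remaining -= 1
            yield line


sample = [
    "146587678080980",
    "1789dsdss809809 ABC1",
    "1898fdfdf908908 ABC2",
    "1789798709fdb80 ABC3",
    "798789789767567 ABC4",
    "798787576567577",
    "178990809809809 ABC7",
    "189890sf908908f ABC8",
    "178979ggggf9080 ABC9",
    "18098rrttty0980 ABC10",
    "1mkklnklnlknlkn ABC17",
]

print("\n".join(take_after_non_pattern(sample)))
```

On the sample from the question this prints the four expected lines (ABC1, ABC2, ABC7, ABC8). To use it with real files, pass the open input file as `lines` and write each yielded line to the output file.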
Here is an approach whose logic is more explicit:
limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    it = iter(ifh)
    while True:
        try:
            if 'ABC' not in next(it):
                # Copy the next `limit` lines, whatever they contain
                for _ in range(limit):
                    ofh.write(next(it))
        except StopIteration:
            # End of file reached
            break
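If the explicit `next()` calls and the `StopIteration` handling feel noisy, the same "pull the following lines from the shared iterator" idea can probably be expressed with `itertools.islice`. A minimal sketch, assuming the hypothetical function name `take_next_lines`; like the loop above, it copies the next `limit` lines whatever they contain, so a non-pattern line inside that window is emitted and does not start a new window.

```python
from itertools import islice


def take_next_lines(lines, pattern="ABC", limit=2):
    """After each line lacking `pattern`, yield the next `limit` lines."""
    it = iter(lines)
    for line in it:
        if pattern not in line:
            # islice consumes from the same iterator the for loop uses,
            # and simply stops early if the input runs out
            yield from islice(it, limit)


lines = ["h1", "a ABC1", "b ABC2", "c ABC3", "h2", "d ABC4"]
print(list(take_next_lines(lines)))  # -> ['a ABC1', 'b ABC2', 'd ABC4']
```

Because `islice` ends quietly when the iterator is exhausted, no `try`/`except` is needed for a file that ends in the middle of a window.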
You need to keep track of 2 states:
limit = 2
with open('input.txt', "r") as oldfile, open('output.txt', "w") as newfile:
    is_capturing = False
    for line in oldfile:
        if not line.strip():
            # Ignore empty lines, do not consider them as a non-pattern
            continue
        elif not 'ABC' in line and not is_capturing:
            # State 1
            # Found the start of the non-pattern line ('ABC' not in line)
            # Enable state to capture next lines
            is_capturing = True
            line_count = 0
        elif is_capturing and line_count < limit:
            # State 2
            # Capture a certain limit of lines after the non-pattern line
            newfile.write(line)
            line_count += 1
        else:
            # Reset the state
            is_capturing = False
The output file should contain:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8
If you also need to save the "non-pattern" line itself, write it out in State 1:
        elif not 'ABC' in line and not is_capturing:
            # State 1
            # Found the start of the non-pattern line ('ABC' not in line)
            # Enable state to capture next lines
            newfile.write(line)
            is_capturing = True
            line_count = 0
If you want to keep a blank line between the output lines:
            newfile.write(line + '\n')
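# Note: this stops after the first `limit` lines containing 'ABC',
# so only the first group of lines is written
limit = 2
with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    line_count = 0
    for line in oldfile:
        if 'ABC' in line:
            newfile.write(line)
            line_count += 1
            if line_count == limit:
                break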
Given the following input file:
146587678080980
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
1789798709fdb80 ABC3
798789789767567 ABC4
798787576567577
178990809809809 ABC7
189890sf908908f ABC8
178979ggggf9080 ABC9
18098rrttty0980 ABC10
1mkklnklnlknlkn ABC17
First, open the file and drop the empty lines, saving the lines that have content to a list:
with open('input.txt', 'r') as f:
    in_lines = [line.strip('\n') for line in f.readlines() if len(line.strip('\n')) > 0]
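Then iterate over all the lines to find the index of each "non-pattern" line, and extend an initially empty output list with up to LIMIT lines that follow that index.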
out_lines = list()
LIMIT = 2
for idx, line in enumerate(in_lines):
    if 'ABC' not in line:
        out_lines.extend(in_lines[(idx + 1):(idx + 1 + LIMIT)])
To get an output file in the same format as the input:
with open('output.txt', 'w') as f:
    # write() takes the joined string directly; '\n\n' puts a blank line between records
    f.write('\n\n'.join(out_lines))
The resulting output.txt should look like this:
1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8
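The three steps above can be tried end to end on an in-memory string. This is only a sketch under stated assumptions: the function name `extract_after_non_pattern` is hypothetical, and the sample assumes the records are separated by blank lines, which is what the empty-line filtering and the `'\n\n'` join in this answer suggest.

```python
import io


def extract_after_non_pattern(text, pattern="ABC", limit=2):
    """Return the `limit` lines after each non-pattern line, blank-line separated."""
    # Step 1: keep only the non-empty lines, without their trailing newline
    in_lines = [line.strip("\n") for line in io.StringIO(text) if line.strip("\n")]

    # Step 2: for each non-pattern line, take the slice of lines that follows it
    out_lines = []
    for idx, line in enumerate(in_lines):
        if pattern not in line:
            out_lines.extend(in_lines[idx + 1 : idx + 1 + limit])

    # Step 3: rebuild the text with a blank line between records (assumed format)
    return "\n\n".join(out_lines)


sample = (
    "146587678080980\n\n"
    "1789dsdss809809 ABC1\n\n"
    "1898fdfdf908908 ABC2\n\n"
    "1789798709fdb80 ABC3\n\n"
    "798787576567577\n\n"
    "178990809809809 ABC7\n"
)
print(extract_after_non_pattern(sample))
```

Note that the slice takes whatever follows the non-pattern line, so two adjacent non-pattern lines would produce overlapping slices. If that can happen in your data, one of the line-by-line approaches may be a better fit.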