Print x lines after a line NOT containing a string

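I am trying to condense a large file and need to eliminate the lines that do not contain a specific pattern. However, I also need to save only a limited number of the lines that follow each "non-pattern" line to a new file, and then keep reading the file line by line to find the next "non-pattern" line.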

For example, to retrieve the first 2 records after each "non-pattern" line, given an input file like this:

146587678080980

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

1789798709fdb80 ABC3

798789789767567 ABC4

798787576567577

178990809809809 ABC7

189890sf908908f ABC8

178979ggggf9080 ABC9

18098rrttty0980 ABC10

1mkklnklnlknlkn ABC17

The output file should be:

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

178990809809809 ABC7

189890sf908908f ABC8

So far, I have tried the following code:

limit = 2

with open('input.txt') as oldfile, open('output.txt') as newfile: 
    for line in oldfile:
        if not ('ABC'):
            line_count = 0
            if line_count <= limit:
               newfile.write(line)
            line_count += 1

Here is an approach similar to your example:

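limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    ctr = 0  # matching lines written since the last non-pattern line
    for line in ifh:
        if 'ABC' not in line:
            # Non-pattern line: restart the count
            ctr = 0
        elif ctr < limit:
            # Write at most `limit` matching lines per non-pattern line
            ctr += 1
            ofh.write(line)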
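
Note that a blank line does not contain 'ABC' either, so if the file really has blank lines between records (as the sample is displayed), the counter is reset on every one of them. A minimal sketch of the same counter idea that skips blank lines and writes nothing before the first non-pattern line; the function name `lines_after_non_pattern` and its parameters are made up for illustration, and treating blank lines as mere separators is an assumption:

```python
def lines_after_non_pattern(lines, pattern="ABC", limit=2):
    """Yield up to `limit` pattern lines after each line that lacks `pattern`."""
    remaining = 0  # nothing is emitted until a non-pattern line is seen
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are separators, not records
        if pattern not in line:
            remaining = limit  # a non-pattern line opens a new window
        elif remaining > 0:
            remaining -= 1
            yield line


records = [
    "146587678080980", "1789dsdss809809 ABC1", "1898fdfdf908908 ABC2",
    "1789798709fdb80 ABC3", "798789789767567 ABC4", "798787576567577",
    "178990809809809 ABC7", "189890sf908908f ABC8", "178979ggggf9080 ABC9",
    "18098rrttty0980 ABC10", "1mkklnklnlknlkn ABC17",
]
# Rebuild the sample with one blank line between records
sample = "\n\n".join(records).splitlines(keepends=True)

result = list(lines_after_non_pattern(sample))
print("".join(result), end="")
```

Because the function accepts any iterable of lines, an open file handle can be passed to it directly and the result handed to `writelines()` on the output file.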

Here is an approach that is more explicit in its logic:

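limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    it = iter(ifh)
    while True:
        try:
            if 'ABC' not in next(it):
                # Copy the next `limit` lines, whatever they contain
                for _ in range(limit):
                    ofh.write(next(it))
        except StopIteration:
            # End of file reached, possibly in the middle of a window
            break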
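
The same iterator-driven idea can also be written without the explicit try/except by letting `itertools.islice` pull the window from the shared iterator. This is only a sketch: the helper name `take_after` is hypothetical, and like the loop above it assumes there are no blank lines between records and copies the next `limit` lines without checking their content.

```python
from itertools import islice


def take_after(lines, pattern="ABC", limit=2):
    """Collect the `limit` lines that follow each line lacking `pattern`."""
    it = iter(lines)
    out = []
    for line in it:
        if pattern not in line:
            # islice consumes from the same iterator, so the for loop
            # resumes right after the copied window; at end of input it
            # simply yields fewer items instead of raising
            out.extend(islice(it, limit))
    return out


records = [
    "146587678080980", "1789dsdss809809 ABC1", "1898fdfdf908908 ABC2",
    "1789798709fdb80 ABC3", "798789789767567 ABC4", "798787576567577",
    "178990809809809 ABC7", "189890sf908908f ABC8", "178979ggggf9080 ABC9",
    "18098rrttty0980 ABC10", "1mkklnklnlknlkn ABC17",
]
picked = take_after(records)
print("\n".join(picked))
```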

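You need to keep track of 2 states:

  • one for finding a non-pattern line
  • one for capturing the lines after the non-pattern line (up to a given limit)

limit = 2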

with open('input.txt', "r") as oldfile, open('output.txt', "w") as newfile:
  is_capturing = False
  for line in oldfile:
    if not line.strip():
      # Ignore empty lines, do not consider them as a non-pattern
      continue
    elif 'ABC' not in line and not is_capturing:
      # State 1
      # Found the start of the non-pattern line ('ABC' not in line)
      # Enable state to capture next lines
      is_capturing = True
      line_count = 0
    elif is_capturing and line_count < limit:
      # State 2
      # Capture a certain limit of lines after the non-pattern line
      newfile.write(line)
      line_count += 1
    else:
      # Reset the state
      is_capturing = False

The output file should contain:

1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8

If you also need to save the "non-pattern" line itself, write it out in state 1:

    elif 'ABC' not in line and not is_capturing:
      # State 1
      # Found the start of the non-pattern line ('ABC' not in line)
      # Enable state to capture next lines
      newfile.write(line)
      is_capturing = True
      line_count = 0

If you want to keep the blank line between each pair of lines:

newfile.write(line + '\n')

limit = 2

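with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    line_count = 0
    for line in oldfile:
        if 'ABC' in line:
            newfile.write(line)
            line_count += 1
            # Stops after the first `limit` matching lines in the whole file;
            # any later non-pattern lines are never reached
            if line_count == limit:
                break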

Given an input file like the following:

146587678080980

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

1789798709fdb80 ABC3

798789789767567 ABC4

798787576567577

178990809809809 ABC7

189890sf908908f ABC8

178979ggggf9080 ABC9

18098rrttty0980 ABC10

1mkklnklnlknlkn ABC17

First open the file and drop the empty lines, saving the lines that have content into a list:

with open('input.txt', 'r') as f:
    in_lines = [line.strip('\n') for line in f.readlines() if len(line.strip('\n')) > 0]

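Then iterate over all the lines to find the index of each "non-pattern" line, and extend an initially empty output list with the lines that follow that index, up to the limit.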

out_lines = list()

LIMIT = 2
for idx, line in enumerate(in_lines):
    if 'ABC' not in line:
        out_lines.extend(in_lines[(idx + 1):(idx + 1 + LIMIT)])
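
To see which slices the loop above picks, here is a small trace on the sample records. It is only an illustration: the helper `windows_after` is a made-up name, and the list is typed in by hand rather than read from input.txt.

```python
def windows_after(in_lines, pattern="ABC", limit=2):
    """Return (index, following_lines) for every line that lacks `pattern`."""
    found = []
    for idx, line in enumerate(in_lines):
        if pattern not in line:
            # Slicing past the end of a list is safe: it just returns fewer items
            found.append((idx, in_lines[idx + 1:idx + 1 + limit]))
    return found


in_lines = [
    "146587678080980", "1789dsdss809809 ABC1", "1898fdfdf908908 ABC2",
    "1789798709fdb80 ABC3", "798789789767567 ABC4", "798787576567577",
    "178990809809809 ABC7", "189890sf908908f ABC8", "178979ggggf9080 ABC9",
    "18098rrttty0980 ABC10", "1mkklnklnlknlkn ABC17",
]
for idx, window in windows_after(in_lines):
    print(idx, window)

# A non-pattern line at the very end simply gets an empty window
print(windows_after(["x ABC1", "no pattern here"]))
```

On the sample, the non-pattern lines sit at indices 0 and 5, so the two windows are the ABC1/ABC2 and ABC7/ABC8 records. Note that a slice does not check its contents: if two non-pattern lines are close together, the second one ends up inside the first one's window.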

To get an output file in the same format as the input:

with open('output.txt', 'w') as f:
    f.write('\n\n'.join(out_lines))

The resulting output.txt should look like this:

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

178990809809809 ABC7

189890sf908908f ABC8
