Print x lines after a line that does not contain a string

Question

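I am trying to condense a large file and need to eliminate the lines that do not contain a particular pattern. However, I also need to save a limited number of the lines that follow each "non-pattern" line to a new file, and then keep reading the file line by line to find the next "non-pattern" line.

For example, to recover the first 2 records after each "non-pattern" line, the input file looks like this: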

146587678080980

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

1789798709fdb80 ABC3

798789789767567 ABC4

798787576567577

178990809809809 ABC7

189890sf908908f ABC8

178979ggggf9080 ABC9

18098rrttty0980 ABC10

1mkklnklnlknlkn ABC17

The output file should be:

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

178990809809809 ABC7

189890sf908908f ABC8

So far I have tried the following code:

limit = 2

with open('input.txt') as oldfile, open('output.txt') as newfile: 
    for line in oldfile:
        if not ('ABC'):
            line_count = 0
            if line_count <= limit:
               newfile.write(line)
            line_count += 1

Answer 1

Here is an approach similar to your example:

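limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    ctr = 0  # lines written since the last non-pattern line
    for line in ifh:
        if 'ABC' not in line:
            # Any line without 'ABC' (a blank line included) resets the counter
            ctr = 0
        else:
            if ctr < limit:
                ctr += 1
                ofh.write(line)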
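
The counter idea above can also be sketched as a generator that works on any iterable of lines, which makes it easy to try without touching files. This is only a sketch under one assumption: blank lines are treated as separators and ignored, because the sample input appears to have an empty line between records. The function name `lines_after_non_pattern` and its parameters are hypothetical.

```python
def lines_after_non_pattern(lines, pattern="ABC", limit=2):
    """Yield at most `limit` pattern lines after each non-pattern line."""
    remaining = 0  # lines that may still be emitted in the current window
    for line in lines:
        if not line.strip():
            continue  # assumption: blank separator lines are ignored
        if pattern not in line:
            remaining = limit  # a non-pattern line opens a new window
        elif remaining > 0:
            remaining -= 1
            yield line


sample = [
    "146587678080980\n", "\n",
    "1789dsdss809809 ABC1\n", "\n",
    "1898fdfdf908908 ABC2\n", "\n",
    "1789798709fdb80 ABC3\n", "\n",
    "798787576567577\n", "\n",
    "178990809809809 ABC7\n", "\n",
    "189890sf908908f ABC8\n", "\n",
    "178979ggggf9080 ABC9\n",
]
kept = list(lines_after_non_pattern(sample))
```

With real files, the generator could be passed straight to `ofh.writelines(lines_after_non_pattern(ifh))`.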

Here is an approach whose logic is more explicit:

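limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    it = iter(ifh)
    while True:
        try:
            if 'ABC' not in next(it):
                # Copy the next `limit` lines, whatever they contain
                for _ in range(limit):
                    ofh.write(next(it))
        except StopIteration:
            # End of file reached, possibly in the middle of a window
            break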
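
The same "pull the next lines from the iterator" idea might be written with `itertools.islice`, which stops quietly when the input runs out, so the `try`/`except` is not needed. This is a sketch, not the answer's own code: the name `take_after_markers` is hypothetical, and filtering out blank lines first is an assumption based on the sample input.

```python
from itertools import islice


def take_after_markers(lines, pattern="ABC", limit=2):
    """Return up to `limit` non-blank lines following each non-pattern line."""
    # Assumption: blank lines are separators and are filtered out up front.
    it = (line for line in lines if line.strip())
    taken = []
    for line in it:
        if pattern not in line:
            # islice pulls from the same iterator and simply stops early
            # if the input ends, so no StopIteration handling is needed.
            taken.extend(islice(it, limit))
    return taken


records = [
    "146587678080980", "",
    "1789dsdss809809 ABC1", "",
    "1898fdfdf908908 ABC2", "",
    "1789798709fdb80 ABC3", "",
    "798787576567577", "",
    "178990809809809 ABC7",
]
result = take_after_markers(records)
```

Like the loop above, this takes the following lines unconditionally: a non-pattern line that falls inside the window is kept and does not open a new window.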

Answer 2

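You need to keep track of 2 states:

One for finding the non-pattern line
One for capturing the lines after the non-pattern line (up to a certain limit)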

limit = 2

with open('input.txt', "r") as oldfile, open('output.txt', "w") as newfile:
  is_capturing = False
  for line in oldfile:
    if not line.strip():
      # Ignore empty lines, do not consider them as a non-pattern
      continue
    elif not 'ABC' in line and not is_capturing:
      # State 1
      # Found the start of the non-pattern line ('ABC' not in line)
      # Enable state to capture next lines
      is_capturing = True
      line_count = 0
    elif is_capturing and line_count < limit:
      # State 2
      # Capture a certain limit of lines after the non-pattern line
      newfile.write(line)
      line_count += 1
    else:
      # Reset the state
      is_capturing = False

The output file should contain:

1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8

If you also need to save the "non-pattern" line itself, add a write to State 1:

    elif not 'ABC' in line and not is_capturing:
      # State 1
      # Found the start of the non-pattern line ('ABC' not in line)
      # Enable state to capture next lines
      newfile.write(line)
      is_capturing = True
      line_count = 0

If you want to keep an empty line between the output lines:

newfile.write(line + '\n')
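
The two states described in this answer can also be made explicit with an `Enum`. The sketch below is an illustration, not the answer's code: the names `State` and `capture` are hypothetical, and it assumes that a non-pattern line should always start a fresh capture, even when it arrives right after the limit has been reached.

```python
from enum import Enum, auto


class State(Enum):
    SEARCHING = auto()  # looking for a non-pattern line
    CAPTURING = auto()  # collecting lines after a non-pattern line


def capture(lines, pattern="ABC", limit=2):
    """Collect up to `limit` pattern lines after each non-pattern line."""
    state, count, out = State.SEARCHING, 0, []
    for line in lines:
        if not line.strip():
            continue  # blank lines are ignored, as in the answer above
        if pattern not in line:
            # Assumption: a non-pattern line always (re)starts a capture.
            state, count = State.CAPTURING, 0
        elif state is State.CAPTURING:
            if count < limit:
                out.append(line)
                count += 1
            if count >= limit:
                state = State.SEARCHING  # limit reached, search again
    return out


back_to_back = ["111", "aaa ABC1", "bbb ABC2", "222", "ccc ABC3"]
captured = capture(back_to_back)
```

Here the second non-pattern line (`222`) follows immediately after a full window, and the line after it is still captured.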

Answer 3

limit = 2

with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    line_count = 0
    for line in oldfile:
        if 'ABC' in line:
            newfile.write(line)
            line_count += 1
            if line_count == limit:
                break

Answer 4

Given an input file like this:

146587678080980

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

1789798709fdb80 ABC3

798789789767567 ABC4

798787576567577

178990809809809 ABC7

189890sf908908f ABC8

178979ggggf9080 ABC9

18098rrttty0980 ABC10

1mkklnklnlknlkn ABC17

First open the file and drop the empty lines, saving the lines that have content into a list:

with open('input.txt', 'r') as f:
    # Strip the trailing newline and keep only lines that still have content
    in_lines = [line.strip('\n') for line in f if line.strip('\n')]

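Then iterate over all the lines to find the index of each "non-pattern" line, and extend the initially empty list of output lines with the lines that follow that index, up to the limit.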

out_lines = list()

LIMIT = 2
for idx, line in enumerate(in_lines):
    if 'ABC' not in line:
        out_lines.extend(in_lines[(idx + 1):(idx + 1 + LIMIT)])

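To get an output file in the same format as the input:

with open('output.txt', 'w') as f:
    # write() takes the joined string; writelines() expects an iterable of lines
    f.write('\n\n'.join(out_lines))

The resulting output.txt should look like this: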

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

178990809809809 ABC7

189890sf908908f ABC8
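
One case the sample does not cover is two "non-pattern" lines close together: the plain slices then overlap, so a line can be copied twice and a non-pattern line can land in the output. A possible variation, sketched under the assumption that each window should stop at the next non-pattern line, uses `itertools.takewhile`. The function name `windows_after` is hypothetical.

```python
from itertools import takewhile


def windows_after(in_lines, pattern="ABC", limit=2):
    """Slice up to `limit` lines after each non-pattern line, without overlap."""
    out_lines = []
    for idx, line in enumerate(in_lines):
        if pattern not in line:
            window = in_lines[idx + 1: idx + 1 + limit]
            # Stop at the next non-pattern line so windows never overlap.
            out_lines.extend(takewhile(lambda item: pattern in item, window))
    return out_lines


adjacent = ["111", "222", "aaa ABC1", "bbb ABC2", "ccc ABC3"]
deduplicated = windows_after(adjacent)
```

On `adjacent`, the plain slicing loop would produce `222` and a duplicate `aaa ABC1`, while this variant keeps only `aaa ABC1` and `bbb ABC2`.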

4 answers

Answer 1
0 2019-08-03 23:21:15

Answer 2
0 2019-08-04 01:34:10

Answer 3
-1 2019-08-03 23:22:46

Answer 4
-1 2019-08-03 23:38:33
