Print x lines after a line NOT containing a string

I am trying to condense a large file and need to drop the lines that do not contain a certain pattern. However, I also need to save to a new file a limited number of lines after each "non-pattern" line, and then keep reading the file until the next "non-pattern" line is found.

For example, to recover the first 2 records after each "non-pattern" line, the input file looks like this:

146587678080980

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

1789798709fdb80 ABC3

798789789767567 ABC4

798787576567577

178990809809809 ABC7

189890sf908908f ABC8

178979ggggf9080 ABC9

18098rrttty0980 ABC10

1mkklnklnlknlkn ABC17

The output file should be:

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

178990809809809 ABC7

189890sf908908f ABC8

I have tried this code so far, without success:

limit = 2

with open('input.txt') as oldfile, open('output.txt') as newfile: 
    for line in oldfile:
        if not ('ABC'):
            line_count = 0
            if line_count <= limit:
               newfile.write(line)
            line_count += 1

Here's a way that is similar to your example:

limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    ctr = 0
    for line in ifh:
        if 'ABC' not in line:
            ctr = 0
        elif ctr < limit:
            ctr += 1
            ofh.write(line)
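
Note that this loop resets the counter on every line without 'ABC', and a blank line is such a line. If the input file really has blank lines between records, as it is displayed in the question, a variant that skips them may be needed. Here is a minimal sketch under that assumption. The function name `keep_after_non_pattern` and its parameters are made up for illustration, and it takes any iterable of lines so it can be tried without files. Unlike the loop above, it keeps nothing before the first non-pattern line.

```python
def keep_after_non_pattern(lines, pattern="ABC", limit=2):
    """Return up to `limit` pattern lines following each non-pattern line.

    Hypothetical helper: blank lines are ignored, which is an assumption
    about the input format.
    """
    kept = []
    count = limit  # keep nothing until the first non-pattern line is seen
    for line in lines:
        if not line.strip():
            continue  # blank line: neither pattern nor non-pattern
        if pattern not in line:
            count = 0  # non-pattern line: start a new group
        elif count < limit:
            kept.append(line)
            count += 1
    return kept


sample = [
    "146587678080980\n", "\n",
    "1789dsdss809809 ABC1\n", "\n",
    "1898fdfdf908908 ABC2\n", "\n",
    "1789798709fdb80 ABC3\n", "\n",
    "798787576567577\n", "\n",
    "178990809809809 ABC7\n", "\n",
    "189890sf908908f ABC8\n", "\n",
    "178979ggggf9080 ABC9\n",
]
result = keep_after_non_pattern(sample)
print("".join(result), end="")
```

An open file object can be passed in place of the list, since iterating a file yields its lines.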

And here's an approach that is logically more explicit:

limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    it = iter(ifh)
    while True:
        try:
            if 'ABC' not in next(it):
                for _ in range(limit):
                    ofh.write(next(it))
        except StopIteration:
            break
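
The same iterator idea can probably be written without the explicit try/except by using `itertools.islice`, which simply stops when the iterator runs out. This is a sketch rather than a drop-in replacement: the function name `take_after_non_pattern` is hypothetical, and like the loop above it takes the next `limit` lines whatever they contain and assumes there are no blank lines between records.

```python
from itertools import islice


def take_after_non_pattern(lines, pattern="ABC", limit=2):
    """Collect the `limit` lines that follow each line lacking `pattern`.

    Hypothetical helper for illustration; the lines taken are not
    themselves checked for the pattern.
    """
    it = iter(lines)
    taken = []
    for line in it:
        if pattern not in line:
            # islice consumes at most `limit` items from the shared iterator
            # and ends quietly if the input is exhausted first.
            taken.extend(islice(it, limit))
    return taken


records = [
    "146587678080980",
    "1789dsdss809809 ABC1",
    "1898fdfdf908908 ABC2",
    "1789798709fdb80 ABC3",
    "798787576567577",
    "178990809809809 ABC7",
]
selected = take_after_non_pattern(records)
print(selected)
```

Because the `for` loop and `islice` share one iterator, the lines consumed by `islice` are not examined again by the loop.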

You need to track 2 states:

  • one for finding the non-pattern line
  • one for capturing the lines (up to a certain limit) after the non-pattern line

limit = 2

with open('input.txt', "r") as oldfile, open('output.txt', "w") as newfile:
  is_capturing = False
  for line in oldfile:
    if not line.strip():
      # Ignore empty lines, do not consider them as a non-pattern
      continue
    elif 'ABC' not in line and not is_capturing:
      # State 1
      # Found the start of the non-pattern line ('ABC' not in line)
      # Enable state to capture next lines
      is_capturing = True
      line_count = 0
    elif is_capturing and line_count < limit:
      # State 2
      # Capture a certain limit of lines after the non-pattern line
      newfile.write(line)
      line_count += 1
    else:
      # Reset the state
      is_capturing = False

The output file should contain:

1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8

If you also need to save the "non-pattern" line, add it to State 1:

    elif 'ABC' not in line and not is_capturing:
      # State 1
      # Found the start of the non-pattern line ('ABC' not in line)
      # Enable state to capture next lines
      newfile.write(line)
      is_capturing = True
      line_count = 0

If you want to preserve the empty lines between the written lines:

newfile.write(line + '\n')

limit = 2

with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    line_count = 0
    for line in oldfile:
        if 'ABC' in line:
            newfile.write(line)
            line_count += 1
            if line_count == limit:
                # Stop after the first `limit` matching lines;
                # later groups in the file are not processed.
                break

Given this input file:

146587678080980

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

1789798709fdb80 ABC3

798789789767567 ABC4

798787576567577

178990809809809 ABC7

189890sf908908f ABC8

178979ggggf9080 ABC9

18098rrttty0980 ABC10

1mkklnklnlknlkn ABC17

First open the file and strip the empty lines, saving the lines with content to a list:

with open('input.txt', 'r') as f:
    in_lines = [line.strip('\n') for line in f if line.strip('\n')]

Then run through all the lines to find the indices of the "non-pattern" lines, and extend an initially empty output list with up to LIMIT lines following each such index.

out_lines = list()

LIMIT = 2
for idx, line in enumerate(in_lines):
    if 'ABC' not in line:
        out_lines.extend(in_lines[(idx + 1):(idx + 1 + LIMIT)])

To get the output file in the same format as the input:

with open('output.txt', 'w') as f:
    f.write('\n\n'.join(out_lines))

The resulting output.txt should be this:

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

178990809809809 ABC7

189890sf908908f ABC8
