
Print x lines after a line NOT containing a string

I am trying to condense a large file by eliminating the lines that do not contain a certain pattern. However, I also need to save to a new file a fixed number of lines after each "non-pattern" line, and then keep reading the file until the next "non-pattern" line is found.

For example, suppose I want to recover the first 2 records after each "non-pattern" line, and the input file looks like this:

146587678080980

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

1789798709fdb80 ABC3

798789789767567 ABC4

798787576567577

178990809809809 ABC7

189890sf908908f ABC8

178979ggggf9080 ABC9

18098rrttty0980 ABC10

1mkklnklnlknlkn ABC17

The output file should be:

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

178990809809809 ABC7

189890sf908908f ABC8

So far I have tried this code, without success:

limit = 2

with open('input.txt') as oldfile, open('output.txt') as newfile: 
    for line in oldfile:
        if not ('ABC'):
            line_count = 0
            if line_count <= limit:
               newfile.write(line)
            line_count += 1
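Three things stop the attempt above from working: `not ('ABC')` tests a constant string instead of `line`; `line_count` is reset to 0 inside the same branch that increments it, so it never accumulates; and `output.txt` is opened in the default read mode, so it cannot be written to (and `open` raises `FileNotFoundError` if the file does not exist yet). The snippet below isolates the first point; `line` is just a sample value taken from the input.

```python
# A sample line from the input: it does NOT contain 'ABC'
line = "146587678080980\n"

# The original test ignores `line`: a non-empty string literal is always truthy,
# so `not ('ABC')` is False for every line and the branch never runs.
original_test = not ('ABC')

# The intended test checks whether the substring occurs in the current line.
intended_test = 'ABC' not in line

print(original_test)  # False
print(intended_test)  # True
```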

Here's a way that is similar to your example:

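limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    ctr = limit  # start "full" so nothing is copied before the first non-pattern line
    for line in ifh:
        if not line.strip():
            continue  # skip blank lines so they don't reset the counter
        if 'ABC' not in line:
            ctr = 0  # non-pattern line: start copying again
        elif ctr < limit:
            ctr += 1
            ofh.write(line)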
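To try the counter idea without creating files, here is a sketch that runs the same logic over an in-memory copy of the sample input. `io.StringIO` stands in for the real files, and the function name `copy_after_markers` and its `marker` parameter are hypothetical, not part of the answer above.

```python
import io

SAMPLE = """146587678080980

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

1789798709fdb80 ABC3

798789789767567 ABC4

798787576567577

178990809809809 ABC7

189890sf908908f ABC8

178979ggggf9080 ABC9

18098rrttty0980 ABC10

1mkklnklnlknlkn ABC17
"""


def copy_after_markers(src, dst, marker='ABC', limit=2):
    """Copy up to `limit` lines containing `marker` after each line without it."""
    ctr = limit  # start "full" so nothing is copied before the first non-pattern line
    for line in src:
        if not line.strip():
            continue  # blank lines neither reset nor consume the counter
        if marker not in line:
            ctr = 0  # a non-pattern line restarts the count
        elif ctr < limit:
            ctr += 1
            dst.write(line)


out = io.StringIO()
copy_after_markers(io.StringIO(SAMPLE), out)
print(out.getvalue(), end='')
```

Because the function only needs objects that iterate over lines and have a `write` method, the same call works unchanged with real file handles from `open`.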

And here's an approach that is logically more explicit:

limit = 2
with open('input.txt') as ifh, open('output.txt', 'w') as ofh:
    # Skip blank lines so they are not counted as captured records
    it = (line for line in ifh if line.strip())
    while True:
        try:
            if 'ABC' not in next(it):
                for _ in range(limit):
                    ofh.write(next(it))
        except StopIteration:
            break
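The same "read the next `limit` lines from the shared iterator" idea can also be written with `itertools.islice`, which simply stops at end of file, so no `StopIteration` handling is needed. This is a sketch, not the answerer's code: `copy_with_islice` is a hypothetical name, `io.StringIO` replaces the real files for a self-contained demo, and blank lines are filtered out first on the assumption that the input contains them as shown in the question. Like the loop above, it copies whatever the next `limit` non-blank lines are, even if one of them is another non-pattern line.

```python
import io
from itertools import islice


def copy_with_islice(src, dst, marker='ABC', limit=2):
    # Drop blank lines up front; `lines` is a single shared iterator
    lines = (line for line in src if line.strip())
    for line in lines:
        if marker not in line:
            # islice pulls the next `limit` items from the same iterator,
            # so the outer loop resumes right after them
            dst.writelines(islice(lines, limit))


src = io.StringIO("146587678080980\n\n1789dsdss809809 ABC1\n\n1898fdfdf908908 ABC2\n\n"
                  "1789798709fdb80 ABC3\n\n798787576567577\n\n178990809809809 ABC7\n\n"
                  "189890sf908908f ABC8\n\n178979ggggf9080 ABC9\n")
dst = io.StringIO()
copy_with_islice(src, dst)
print(dst.getvalue(), end='')
```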

You need to track 2 states:

  • one for finding the non-pattern line
  • one for capturing the lines (up to a certain limit) after the non-pattern line

limit = 2

with open('input.txt', "r") as oldfile, open('output.txt', "w") as newfile:
  is_capturing = False
  line_count = 0
  for line in oldfile:
    if not line.strip():
      # Ignore empty lines, do not consider them as a non-pattern
      continue
    elif 'ABC' not in line:
      # State 1
      # Found a non-pattern line ('ABC' not in line); this is checked in
      # every state, so a new non-pattern line always restarts the capture
      # Enable state to capture next lines
      is_capturing = True
      line_count = 0
    elif is_capturing and line_count < limit:
      # State 2
      # Capture up to `limit` lines after the non-pattern line
      newfile.write(line)
      line_count += 1
    else:
      # Limit reached (or no non-pattern line seen yet): reset the state
      is_capturing = False

The output file should contain:

1789dsdss809809 ABC1
1898fdfdf908908 ABC2
178990809809809 ABC7
189890sf908908f ABC8

If you also need to save the "non-pattern" line, add it to State 1:

    elif 'ABC' not in line:
      # State 1
      # Found a non-pattern line ('ABC' not in line)
      # Enable state to capture next lines
      newfile.write(line)
      is_capturing = True
      line_count = 0

If you want to keep an empty line after each written line, as in the input, write this in State 2 instead:

newfile.write(line + '\n')
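The two states described in the answer above can also be spelled out with an `Enum`, which makes the transitions easy to check on an awkward input, such as a non-pattern line arriving right after the limit is reached. This is a hypothetical restructuring (`State` and `capture` are illustrative names, and it returns a list instead of writing a file), not the answer's code; as in the state machine above, a non-pattern line restarts the capture from any state.

```python
from enum import Enum, auto


class State(Enum):
    SEARCHING = auto()  # looking for a non-pattern line
    CAPTURING = auto()  # copying up to `limit` lines after it


def capture(lines, marker='ABC', limit=2):
    """Return the lines kept by the two-state scan (hypothetical helper)."""
    state, count, kept = State.SEARCHING, 0, []
    for line in lines:
        if not line.strip():
            continue  # empty lines are ignored in both states
        if marker not in line:
            # State 1: a non-pattern line (re)starts the capture
            state, count = State.CAPTURING, 0
        elif state is State.CAPTURING and count < limit:
            # State 2: keep the line and count it
            kept.append(line)
            count += 1
        else:
            state = State.SEARCHING  # limit reached or not capturing yet


    return kept


# 'y' comes right after the limit is reached and still restarts the capture
print(capture(['x', 'a ABC1', 'b ABC2', 'y', 'c ABC3', 'd ABC4', 'e ABC5']))
# ['a ABC1', 'b ABC2', 'c ABC3', 'd ABC4']
```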
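
limit = 2

with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    line_count = limit  # nothing is copied before the first non-pattern line
    for line in oldfile:
        if 'ABC' in line:
            if line_count < limit:
                newfile.write(line)
                line_count += 1
        elif line.strip():
            # A non-blank line without 'ABC' restarts the count
            line_count = 0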

Given the same input file as in the question (with a blank line between records):

First, open the file and drop the empty lines, saving the non-empty lines to a list:

with open('input.txt') as f:
    # Keep only non-empty lines, without their trailing newline
    in_lines = [line.rstrip('\n') for line in f if line.rstrip('\n')]

Then loop over all the lines; at the index of each "non-pattern" line, extend the (initially empty) output list with up to LIMIT lines that follow it.

out_lines = list()

LIMIT = 2
for idx, line in enumerate(in_lines):
    if 'ABC' not in line:
        out_lines.extend(in_lines[(idx + 1):(idx + 1 + LIMIT)])
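The slice takes the next `LIMIT` lines whatever they contain, so if two non-pattern lines are closer together than `LIMIT`, the second one is copied as well. If that matters, one option (a variation, not the original answer) is to cut each window at the first line without 'ABC' using `itertools.takewhile`; the `in_lines` list below is a small hypothetical example built to show that case.

```python
from itertools import takewhile

LIMIT = 2
# Hypothetical input: the second non-pattern line arrives before LIMIT lines have passed
in_lines = ['146587678080980', '1789dsdss809809 ABC1', '798787576567577',
            '178990809809809 ABC7', '189890sf908908f ABC8']

out_lines = []
for idx, line in enumerate(in_lines):
    if 'ABC' not in line:
        window = in_lines[idx + 1:idx + 1 + LIMIT]
        # Keep window lines only up to (not including) the next non-pattern line
        out_lines.extend(takewhile(lambda s: 'ABC' in s, window))

print(out_lines)
# ['1789dsdss809809 ABC1', '178990809809809 ABC7', '189890sf908908f ABC8']
```

With the plain slice, `'798787576567577'` would also have ended up in `out_lines`.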

To get the output file with the same format as the input:

with open('output.txt', 'w') as f:
    # Separate records with a blank line, as in the input; write() takes one string
    f.write('\n\n'.join(out_lines) + '\n')

The result output.txt should be this:

1789dsdss809809 ABC1

1898fdfdf908908 ABC2

178990809809809 ABC7

189890sf908908f ABC8

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 