Retrieving the following line using Python next() and strip()

Question

I'm having trouble using next() and strip() to retrieve the line following the one I'm reading. The test data looks something like this:

@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:115/2
JDIJSKNDIJ
+abcde:115/2
bla13
@abcde:113/2
djijwkoken
+abcde:113/2
bla15

My goal is to delete all sets of 4 lines starting with '@' that contain 'N' in the second line. The expected test output should look like this:

@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:113/2
djijwkoken
+abcde:113/2
bla15

Here is my code (delete_N.py). I'm running it from the Mac OS Terminal on a remote Ubuntu server, with Python 2.7:

import sys

filename1 = sys.argv[1] #file to process

data = open(filename1, 'r')

def del_N(input1):
    for line in input1:
        if line[:1] == '@' and 'N' not in next(input1).strip():
            print line.strip()
            for i in range(3):
                print next(input1).strip()

del_N(data)

But I get the following error:

Traceback (most recent call last):
  File "delete_N.py", line 14, in <module>
    del_N(data)
  File "delete_N.py", line 12, in del_N
    print next(input1).strip()
StopIteration

What am I doing wrong?

Answer 1

Your program reads more lines from the file than it should. Check Lego's answer, where he explains the mistake very clearly.

You can do it like this. This program assumes that the number of lines in the file is a multiple of 4.

with open("Input.txt", "r") as input_file:
    for line1 in input_file:
        line2, line3, line4 = [next(input_file) for _ in xrange(3)]
        if "N" not in line2:
            print line1 + line2 + line3 + line4.rstrip()

Output

@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:113/2
djijwkoken
+abcde:113/2
bla15
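
If the file might not hold an exact multiple of 4 lines, the list comprehension above would itself hit StopIteration on the truncated final record. A variant along these lines avoids that. It is only a sketch: filter_records and its size parameter are names made up for illustration, and silently dropping an incomplete trailing record is an assumption about what you want.

```python
from itertools import islice

def filter_records(lines, size=4):
    """Yield each complete `size`-line record whose second line has no 'N'."""
    it = iter(lines)
    while True:
        record = list(islice(it, size))
        if len(record) < size:
            # End of input or a truncated final record: stop quietly.
            return
        if 'N' not in record[1]:
            yield record

# Made-up input: two full records plus one dangling header line.
sample = [
    "@abcde:111/2\n", "ABCDEFGHIj\n", "+abcde:111/2\n", "bla11\n",
    "@abcde:115/2\n", "JDIJSKNDIJ\n", "+abcde:115/2\n", "bla13\n",
    "@abcde:117/2\n",
]
kept = list(filter_records(sample))
```

Here kept holds only the first record. With a real file you could pass the open file object as lines and write out "".join(record) for each record that is yielded.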

Answer 2

Python raises a StopIteration exception when you reach the end of an iterator. If you're calling next() on an iterator manually, rather than using a for ... in ... loop (which will terminate when StopIteration is raised), you must catch StopIteration and handle it, because it means that... well, the iterator has stopped.
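
As a small illustration of the two usual ways to handle this: either catch StopIteration explicitly, or pass a default as the second argument to next(). The one-element list is made-up sample data, not the asker's file.

```python
it = iter(["only line\n"])  # made-up one-line input

first = next(it)            # succeeds: there is one item

# Option 1: catch the exception explicitly.
try:
    second = next(it)
except StopIteration:
    second = None           # nothing left to read

# Option 2: give next() a default to return instead of raising.
third = next(it, None)
```

Both second and third end up as None, because the iterator was exhausted after the first call.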

Anyway, here is an (IMO) cleaner solution:

data = ... # your data goes here, from a file or whatever
lines = data.split('\n')
n = 4
groups = zip(*[lines[i::n] for i in range(n)])
# or, groups = zip(lines[0::4], lines[1::4], lines[2::4], lines[3::4])
result = []

for group in groups:
    if group[0].startswith('@') and 'N' in group[1]:
        continue # i.e. don't append
    else:
        result.append(group)

joined_result = '\n'.join(['\n'.join(group) for group in result])
print(joined_result)

Result:

@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:113/2
djijwkoken
+abcde:113/2
bla15
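
The grouping line in the code above is dense, so here is a sketch of what it does on a tiny made-up list (the names a1..b4 are placeholders, not real data). Each slice lines[i::n] picks every n-th item starting at position i, and zip then stitches the i-th items back together into one tuple per record.

```python
lines = ['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'b4']
n = 4

# One list per position within a record:
# first lines, second lines, third lines, fourth lines.
columns = [lines[i::n] for i in range(n)]

# zip(*columns) pairs them up again, giving one tuple per 4-line record.
# list() is only needed on Python 3, where zip returns an iterator.
groups = list(zip(*columns))
```

Because zip stops at the shortest input, any leftover lines that do not fill a complete group of n are dropped without an error.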

Answer 3

The problem is that while the for loop iterates through the file, next() advances the same file iterator. Each pass through the loop can therefore consume several lines instead of one: one in the if condition, plus three more in the inner loop whenever the condition is true.

For example, look at this file:

                opening the file
@abcde:111/2    for line in input1: # First iteration.
ABCDEFGHIj          if line[:1] == '@' and 'N' not in next(input1).strip():
+abcde:111/2            print next(input1).strip()
bla11           for line in input1: # Second iteration.
@abcde:115/2       etc...

Notice that each iteration can skip ahead several lines, so when the loop reaches the last few lines of the file, next() runs past the end and raises StopIteration.
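
To see the overread concretely, the question's loop can be replayed on the test data while collecting the output in a list instead of printing it. This is only a sketch: trace_original and the '<StopIteration>' marker are made up for illustration, and the logic inside the loop is copied from the question unchanged.

```python
def trace_original(lines):
    """Replay the question's loop, collecting output instead of printing it."""
    it = iter(lines)
    emitted = []
    try:
        for line in it:
            # next(it) here consumes the second line of the record...
            if line[:1] == '@' and 'N' not in next(it).strip():
                emitted.append(line.strip())
                # ...so these three calls read lines 3 and 4 of the record
                # and then the header of the NEXT record.
                for _ in range(3):
                    emitted.append(next(it).strip())
    except StopIteration:
        emitted.append('<StopIteration>')
    return emitted

test_data = """@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:115/2
JDIJSKNDIJ
+abcde:115/2
bla13
@abcde:113/2
djijwkoken
+abcde:113/2
bla15""".splitlines()

emitted = trace_original(test_data)
```

The result is @abcde:111/2, +abcde:111/2, bla11, @abcde:115/2, @abcde:113/2, +abcde:113/2, bla15, and then the marker. The second line of each kept record never appears because the condition already consumed it, and the last record runs out of lines one call early. Storing the result of next() in a variable and reading only two further lines would keep the loop aligned on 4-line records.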

3 answers

solution1
3 ACCEPTED 2014-01-15 03:11:09

solution2
2 2014-01-15 03:06:09

solution3
2
