I'm having trouble using next() and strip() to retrieve the line following the one I'm reading. The test data looks something like this:
@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:115/2
JDIJSKNDIJ
+abcde:115/2
bla13
@abcde:113/2
djijwkoken
+abcde:113/2
bla15
My goal is to delete all sets of 4 lines starting with '@' that contain 'N' in the second line. The expected test output should look like this:
@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:113/2
djijwkoken
+abcde:113/2
bla15
Here is my code (delete_N.py). I'm running it with Python 2.7 on a remote Ubuntu server, from the Mac OS Terminal:
import sys
filename1 = sys.argv[1] #file to process
data = open(filename1, 'r')
def del_N(input1):
    for line in input1:
        if line[:1] == '@' and 'N' not in next(input1).strip():
            print line.strip()
            for i in range(3):
                print next(input1).strip()
del_N(data)
But I get the following error:
Traceback (most recent call last):
  File "delete_N.py", line 14, in <module>
    del_N(data)
  File "delete_N.py", line 12, in del_N
    print next(input1).strip()
StopIteration
What am I doing wrong?
In your program, you are reading more lines from the file than you intend. Check Lego's answer, where he explains the mistake very clearly.
You can do it like this. This program assumes that the number of lines in the file is a multiple of 4.
with open("Input.txt", "r") as input_file:
    for line1 in input_file:
        line2, line3, line4 = [next(input_file) for _ in xrange(3)]
        if "N" not in line2:
            print line1 + line2 + line3 + line4.rstrip()
Output
@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:113/2
djijwkoken
+abcde:113/2
bla15
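If the line count might not be a multiple of 4, a variant of the same loop can use itertools.islice, which simply returns fewer items instead of raising StopIteration when the file runs out. This is a sketch that runs on Python 2.7 and 3, assuming an incomplete trailing record should be reported as an error; the function name filter_records and the choice of ValueError are illustrative, not part of the answer above.

```python
from itertools import islice


def filter_records(lines):
    """Yield the lines of every 4-line record whose second line has no 'N'.

    Hypothetical helper: `lines` is any iterable of lines, e.g. an open file.
    """
    it = iter(lines)
    for first in it:  # the for loop reads line 1 of each record
        # islice() stops early at the end of the input instead of
        # raising StopIteration, so we can check for a short record.
        rest = list(islice(it, 3))
        if len(rest) < 3:
            # Assumption: a truncated final record is an input error.
            raise ValueError("incomplete record starting at %r" % first)
        if "N" not in rest[0]:
            yield first
            for line in rest:
                yield line
```

To use it, loop over filter_records(input_file) inside the same with block and print each line with rstrip().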
Python raises a StopIteration exception when you reach the end of an iterator. If you're calling next() on an iterator manually, rather than using a for ... in ... loop (which will terminate when StopIteration is raised), you must catch StopIteration and handle it, because it means that... well, the iterator has stopped.
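A small illustration of that behaviour, runnable on Python 2.7 and 3: calling next() past the end raises StopIteration, which you can catch explicitly, or you can pass a default as the second argument to next() so it returns that value instead of raising. The variable names are only for this example.

```python
lines = iter(["@abcde:111/2", "ABCDEFGHIj"])

first = next(lines)   # "@abcde:111/2"
second = next(lines)  # "ABCDEFGHIj"

# A third call has nothing left to return, so it raises StopIteration.
try:
    third = next(lines)
except StopIteration:
    third = None  # handle the exhausted iterator explicitly

# The two-argument form returns the default instead of raising.
fourth = next(lines, None)
```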
Anyway, here is a (IMO) cleaner solution:
data = ... # your data goes here, from a file or whatever
lines = data.split('\n')
n = 4
groups = zip(*[lines[i::n] for i in range(n)])
# or, groups = zip(lines[0::4], lines[1::4], lines[2::4], lines[3::4])
result = []
for group in groups:
    if group[0].startswith('@') and 'N' in group[1]:
        continue # i.e. don't append
    else:
        result.append(group)
joined_result = '\n'.join(['\n'.join(group) for group in result])
print(joined_result)
Result:
@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:113/2
djijwkoken
+abcde:113/2
bla15
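A related idiom, swapped in here rather than taken from the answer above, groups the lines by passing the same iterator to zip() four times, so each tuple takes four consecutive lines; this is the "grouper" pattern from the itertools recipes in the Python documentation. Like the slicing version, zip() silently drops an incomplete final group. The function name keep_records is hypothetical, and the sketch runs on Python 2.7 and 3.

```python
def keep_records(lines, n=4):
    """Return the n-line records as tuples, skipping any record whose
    header starts with '@' and whose second line contains 'N'
    (hypothetical helper for illustration)."""
    it = iter(lines)
    # [it] * n is a list holding the SAME iterator n times, so zip() takes
    # n consecutive lines for each tuple; an incomplete last group is dropped.
    groups = zip(*[it] * n)
    return [g for g in groups
            if not (g[0].startswith("@") and "N" in g[1])]


data = """@abcde:111/2
ABCDEFGHIj
+abcde:111/2
bla11
@abcde:115/2
JDIJSKNDIJ
+abcde:115/2
bla13
@abcde:113/2
djijwkoken
+abcde:113/2
bla15"""

print("\n".join("\n".join(g) for g in keep_records(data.split("\n"))))
```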
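The problem is that the for loop and your next() calls advance the same file iterator. Every line that next() reads is a line the for loop will never see. On an iteration where the condition holds, your code reads five lines: the '@' line (via the for loop), the second line (via next() in the if, and that line is never printed), and three more lines via next() in the range(3) loop, the last of which is the '@' line of the following record.
For example, here is which statement reads each line of your test data:
@abcde:111/2    for line in input1:          # first iteration
ABCDEFGHIj      next(input1) in the if       # never printed
+abcde:111/2    next(input1) in range(3)
bla11           next(input1) in range(3)
@abcde:115/2    next(input1) in range(3)     # belongs to the next record
JDIJSKNDIJ      for line in input1:          # second iteration, not an '@' line
etc...
So a kept record uses up five lines instead of four, and the loop drifts out of step with the records. When the for loop reaches the @abcde:113/2 header, only three lines remain after it, so the fourth next() call has nothing left to read and raises the StopIteration error.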
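Applied to the code in the question, a minimal fix is to read the second line once, keep it in a variable, and then read only the two lines that remain, so that each record consumes exactly four lines. This is a sketch that runs on Python 2.7 and 3: it returns the kept lines instead of printing them so it is easy to test, it expects an iterator such as an open file (not a list), and like the answers above it assumes every record is complete.

```python
def del_N(input1):
    """Return the stripped lines of each 4-line record whose second line
    has no 'N'. `input1` must be an iterator, e.g. an open file."""
    kept = []
    for line in input1:          # reads line 1 of a record
        if line[:1] != '@':
            continue             # not a record header; skip it
        second = next(input1)    # reads line 2, exactly once
        third = next(input1)     # reads line 3
        fourth = next(input1)    # reads line 4
        if 'N' not in second.strip():
            kept.extend(text.strip() for text in (line, second, third, fourth))
    return kept
```

With the original script, you would call it as del_N(data) and print each returned line.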