
Pythonic way of splitting loop over list in two parts with one iterator

I am processing a text file with an irregular structure: a header followed by data in different sections. I want to walk through a list and jump to the next section once a certain character is encountered. I made a simple example below. What is an elegant way of dealing with this problem?

lines = ['a','b','c','$', 1, 2, 3]

for line in lines:
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

# Here, I start again, but I would like to continue with the actual
# state of the iterator, in order to only read the remaining elements.
for line in lines:
    print("Reading numbers")

You can use one iterator for both loops by creating it outside the for loop with the builtin function iter. The first loop then consumes only part of the iterator, and the second loop picks up where the first one stopped.

lines = ['a','b','c','$', 1, 2, 3]

iter_lines = iter(lines)  # This creates an iterator over lines

for line in iter_lines:
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

for line in iter_lines:
    print("Reading numbers")

The above prints this result.

Reading letters
Reading letters
Reading letters
FOUND END OF HEADER
Reading numbers
Reading numbers
Reading numbers
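
If you only need the two parts as lists, the same shared-iterator idea can be combined with itertools.takewhile. This is a sketch of a different technique than the two for loops above, and the names header and body are just illustrative. Note that takewhile consumes the separator itself, so it does not show up in either part.

```python
from itertools import takewhile

lines = ['a', 'b', 'c', '$', 1, 2, 3]
iter_lines = iter(lines)

# takewhile pulls items from the shared iterator until the predicate fails;
# the '$' that fails the test is consumed and discarded.
header = list(takewhile(lambda line: line != '$', iter_lines))

# Whatever is left in the iterator is the data section.
body = list(iter_lines)

print(header)  # ['a', 'b', 'c']
print(body)    # [1, 2, 3]
```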

You could use enumerate to keep track of where you are in the iteration:

lines = ['a','b','c','$', 1, 2, 3]

for i, line in enumerate(lines):
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

print(lines[i+1:])  # prints [1, 2, 3]

But, unless you actually need to process the header portion, the idea of @EdChum to simply use index is probably better.

A simpler way and maybe more pythonic:

lines = ['a','b','c','$', 1, 2, 3]
print(lines[lines.index('$')+1:])
# [1, 2, 3]

If you want to read each element after $ into a separate variable, try this:

lines = ['a','b','c','$', 1, 2, 3]
a, b, c = lines[lines.index('$')+1:]
print(a, b, c)
# 1 2 3

Or, if you do not know how many elements follow $, you could do something like this:

lines = ['a','b','c','$', 1, 2, 3, 4, 5, 6]
a, *b = lines[lines.index('$')+1:]
print(a, *b)
# 1 2 3 4 5 6
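
One thing to watch with index: it raises ValueError when the separator is not in the list. A minimal sketch of a helper that handles that case could look like this. The function name split_at and the choice to treat everything as header when there is no separator are assumptions made for illustration, not part of the answers above.

```python
def split_at(lines, separator='$'):
    """Return (header, body) split around the first separator."""
    try:
        pos = lines.index(separator)
    except ValueError:
        # No separator found: treat everything as header, nothing as body.
        return lines[:], []
    return lines[:pos], lines[pos + 1:]

print(split_at(['a', 'b', 'c', '$', 1, 2, 3]))  # (['a', 'b', 'c'], [1, 2, 3])
print(split_at(['a', 'b', 'c']))                # (['a', 'b', 'c'], [])
```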

If you have more than one kind of separator, the most generic solution would be to build a mini state machine to parse your data:

def state0(line):
  pass # processing function for state0

def state1(line):
  pass # processing function for state1

# and so on...

states = (state0, state1, ...)     # tuple grouping all processing functions
separators = {'$':1, '#':2, ...}   # linking separators and states
state = 0                          # initial state

for line in text:
  if line in separators:
    print('Found separator', line)
    state = separators[line]       # change state
  else:
    states[state](line)            # process line with associated function

This solution correctly processes an arbitrary number of separators, in arbitrary order and with an arbitrary number of repetitions. The only constraint is that a given separator is always followed by the same kind of data, which can be processed by its associated function.
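
To make the idea concrete, here is a small runnable sketch of such a state machine. The three processing functions, the sample text list and the choice of '$' and '#' as separators are made up for illustration. Each function simply collects its lines into a list.

```python
header, numbers, comments = [], [], []

def read_header(line):
    header.append(line)      # state 0: everything before the first separator

def read_numbers(line):
    numbers.append(line)     # state 1: data following '$'

def read_comments(line):
    comments.append(line)    # state 2: data following '#'

states = (read_header, read_numbers, read_comments)
separators = {'$': 1, '#': 2}   # separator -> index of the state it starts
state = 0                       # initial state

# Hypothetical input: '$' appears twice, with a '#' section in between.
text = ['a', 'b', '$', 1, 2, '#', 'note', '$', 3]

for line in text:
    if line in separators:
        state = separators[line]   # switch state, do not process the separator
    else:
        states[state](line)        # process line with the current state's function

print(header)    # ['a', 'b']
print(numbers)   # [1, 2, 3]
print(comments)  # ['note']
```

Because the second '$' switches back to state 1, the 3 ends up in the same list as 1 and 2, which is the repetition behaviour described above.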
