Pythonic way of splitting loop over list in two parts with one iterator

I am processing a text file with an irregular structure: it consists of a header followed by data in different sections. What I want to do is walk through a list and jump to the next section once a certain character is encountered. I made a simple example below. What is an elegant way of dealing with this problem?

lines = ['a','b','c','$', 1, 2, 3]

for line in lines:
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

# Here, I start again, but I would like to continue with the actual
# state of the iterator, in order to only read the remaining elements.
for line in lines:
    print("Reading numbers")

You can actually use one iterator for both loops by creating the line iterator outside the for loop with the builtin function iter. This way it is partially consumed in the first loop, and the second loop picks up where the first one stopped.

lines = ['a','b','c','$', 1, 2, 3]

iter_lines = iter(lines)  # This creates an iterator on lines

for line in iter_lines:
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

for line in iter_lines:
    print("Reading numbers")

The above prints this result:

Reading letters
Reading letters
Reading letters
FOUND END OF HEADER
Reading numbers
Reading numbers
Reading numbers
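Since the question is about a text file, here is a minimal sketch of the same idea applied to a file-like object. A file object is already its own iterator, so calling iter on it is harmless. The helper name read_sections and the io.StringIO stand-in for a real file are assumptions made for illustration, not part of the original answer.

```python
import io

def read_sections(stream, separator="$"):
    """Split an iterable of lines into (header, body) at the first separator.

    Hypothetical helper: both loops share one iterator, so the second
    loop continues exactly where the first one stopped.
    """
    it = iter(stream)
    header = []
    for line in it:
        line = line.rstrip("\n")
        if line == separator:
            break  # the separator itself is consumed and discarded
        header.append(line)
    # Whatever is left in the iterator belongs to the body.
    body = [line.rstrip("\n") for line in it]
    return header, body

# io.StringIO stands in for an open text file here.
fake_file = io.StringIO("a\nb\nc\n$\n1\n2\n3\n")
header, body = read_sections(fake_file)
print(header)  # ['a', 'b', 'c']
print(body)    # ['1', '2', '3']
```

If the separator never appears, everything ends up in header and body is empty.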

You could use enumerate to keep track of where you are in the iteration:

lines = ['a','b','c','$', 1, 2, 3]

for i, line in enumerate(lines):
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

print(lines[i+1:])  # prints [1, 2, 3]

But, unless you actually need to process the header portion, @EdChum's idea of simply using index is probably better.
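One thing to watch with the enumerate loop: if the separator is missing, i simply ends up at the last index, and if the list is empty, i is never assigned at all. A possible guard uses the else clause of the for loop, which runs only when the loop finishes without break. The function name split_with_enumerate and the choice to return an empty body in that case are assumptions for this sketch.

```python
def split_with_enumerate(lines, separator="$"):
    """Return (header, body), split at the first separator.

    Hypothetical helper; returning an empty body when no separator
    is found is an assumption, adjust as needed.
    """
    for i, line in enumerate(lines):
        if line == separator:
            break
    else:
        # No break happened: the separator is absent (or lines is empty).
        return list(lines), []
    return lines[:i], lines[i + 1:]

print(split_with_enumerate(['a', 'b', 'c', '$', 1, 2, 3]))
# (['a', 'b', 'c'], [1, 2, 3])
```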

A simpler and maybe more Pythonic way:

lines = ['a','b','c','$', 1, 2, 3]
print(lines[lines.index('$')+1:])
# [1, 2, 3]

If you want to read each element after $ into a different variable, try this:

lines = ['a','b','c','$', 1, 2, 3]
a, b, c = lines[lines.index('$')+1:]
print(a, b, c)
# 1 2 3

Or, if you don't know how many elements follow $, you could do something like this:

lines = ['a','b','c','$', 1, 2, 3, 4, 5, 6]
a, *b = lines[lines.index('$')+1:]
print(a, *b)
# 1 2 3 4 5 6
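Note that list.index raises ValueError when the value is not in the list, so the slices above fail on input without a $. A minimal sketch of one way to handle that follows; the function name body_after and the decision to return an empty list are assumptions, and raising a clearer error may suit your data better.

```python
def body_after(lines, separator="$"):
    """Return the elements that follow the first separator.

    Hypothetical helper: returns [] when the separator is absent
    (an assumption made for this sketch).
    """
    try:
        start = lines.index(separator) + 1
    except ValueError:
        # list.index raises ValueError if the separator is not present.
        return []
    return lines[start:]

print(body_after(['a', 'b', 'c', '$', 1, 2, 3]))  # [1, 2, 3]
print(body_after(['a', 'b', 'c']))                # []
```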

If you have more than one kind of separator, the most generic solution would be to build a mini state machine to parse your data:

def state0(line):
  pass # processing function for state0

def state1(line):
  pass # processing function for state1

# and so on...

states = (state0, state1, ...)     # tuple grouping all processing functions
separators = {'$':1, '#':2, ...}   # linking separators and states
state = 0                          # initial state

for line in text:
  if line in separators:
    print('Found separator', line)
    state = separators[line]       # change state
  else:
    states[state](line)            # process line with associated function

This solution can correctly process an arbitrary number of separators, in arbitrary order, with an arbitrary number of repetitions. The only constraint is that a given separator is always followed by the same kind of data, which can be processed by its associated function.
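To make the state machine idea concrete, here is a small runnable sketch. The three handlers, the meaning given to $ and #, and the sample input are all made up for illustration; only the dispatch loop follows the structure described above.

```python
def parse(lines):
    """Dispatch each line to the handler of the current state.

    Hypothetical example: '$' switches to number parsing and '#'
    switches to comment collection.
    """
    collected = {"header": [], "numbers": [], "comments": []}

    def read_header(line):
        collected["header"].append(line)

    def read_number(line):
        collected["numbers"].append(int(line))

    def read_comment(line):
        collected["comments"].append(line)

    states = (read_header, read_number, read_comment)  # one handler per state
    separators = {"$": 1, "#": 2}                      # separator -> state index
    state = 0                                          # start in the header state

    for line in lines:
        if line in separators:
            state = separators[line]   # separator found: switch state
        else:
            states[state](line)        # ordinary line: process it
    return collected

result = parse(["a", "b", "$", "1", "2", "#", "note", "$", "3"])
print(result)
# {'header': ['a', 'b'], 'numbers': [1, 2, 3], 'comments': ['note']}
```

Because $ appears twice in the sample input, the parser switches back to number parsing after the comment section, which shows the repetition property mentioned above.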
