Accessing x+1 element with 'for x in list' in Python
I'm trying to parse a newline-delimited text file into blocks of lines, which are appended to a .txt file. I'd like to be able to grab x lines AFTER my ending string, because these lines vary in content, so setting the 'end string' to try to match them would miss lines.
Example of file:
"Start"
"..."
"..."
"..."
"..."
"---" ##End here
"xxx" ##Unique data here
"xxx" ##And here
And here's the code:
first = "Start"
first_end = "---"
with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
    copy = False
    for line in infile:
        if line.strip().startswith(first):
            copy = True
            outfile.write(line)
        elif line.strip().startswith(first_end):
            copy = False
            outfile.write(line)
            ##Want to also write next 2 lines here
        elif copy:
            outfile.write(line)
Is there any way to do this using `for line in infile`, or do I need to use a different type of loop?
You can use `next`, or `readline` (in Python 3 and up), to retrieve the next line in the file:
elif line.strip().startswith(first_end):
    copy = False
    outfile.write(line)
    outfile.write(next(infile))
    outfile.write(next(infile))
or
#note: not compatible with Python 2.7 and below
elif line.strip().startswith(first_end):
    copy = False
    outfile.write(line)
    outfile.write(infile.readline())
    outfile.write(infile.readline())
This also advances the file pointer by two additional lines, so the next iteration of `for line in infile:` will skip past the two lines you read with `readline` (the same is true when you use `next`).
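A minimal sketch of the same idea using `itertools.islice`, which takes the next n items from the very iterator the `for` loop is already consuming. The markers come from the question; reading from `io.StringIO` instead of a real file is an assumption made only so the sketch runs on its own. Unlike `next(infile)`, which raises `StopIteration` if the file ends right after the end marker, `islice` just yields fewer lines.

```python
import io
from itertools import islice

first = "Start"
first_end = "---"
extra_lines = 2  # how many lines to copy after the end marker

# io.StringIO stands in for open('testlog.log') so the sketch is self-contained
infile = io.StringIO("Start\nfoo\nbar\n---\nxxx\nyyy\nignored\n")
outfile = io.StringIO()

copy = False
for line in infile:
    if line.strip().startswith(first):
        copy = True
        outfile.write(line)
    elif line.strip().startswith(first_end):
        copy = False
        outfile.write(line)
        # Pull the next lines from the same iterator; the for loop
        # resumes after them, so they are not processed twice.
        outfile.writelines(islice(infile, extra_lines))
    elif copy:
        outfile.write(line)

print(outfile.getvalue(), end="")
```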
Bonus terminology nitpick: a file object is not a list, and methods for accessing the (x+1)th element of a list might not work for accessing the next line of a file, and vice versa. If you did want to access the next item of a proper list object, you could use `enumerate` so you can perform arithmetic on the list's index. For example:
seq = ["foo", "bar", "baz", "qux", "troz", "zort"]
#find all instances of "baz" and also the first two elements after "baz"
for idx, item in enumerate(seq):
    if item == "baz":
        print(item)
        print(seq[idx+1])
        print(seq[idx+2])
Note that, unlike `readline`, indexing does not advance the iterator, so `for idx, item in enumerate(seq):` will still iterate over "qux" and "troz".
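If "baz" can appear near the end of the list, `seq[idx+1]` or `seq[idx+2]` will raise `IndexError`. Slicing is one way around that, because a slice that runs past the end just returns a shorter list. A minimal sketch using the example list above, extended with a second "baz" near the end (an illustrative assumption):

```python
seq = ["foo", "bar", "baz", "qux", "troz", "zort", "baz", "end"]

found = []
for idx, item in enumerate(seq):
    if item == "baz":
        # Slicing never raises IndexError; near the end it returns fewer items
        group = [item] + seq[idx + 1:idx + 3]
        found.append(group)
        print(group)
```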
An approach that works on any iterable is to use an additional variable to keep track of state across iterations. The advantage is that you don't need to know how to manually advance an iterable; the disadvantage is that reasoning about the logic within the loop is harder, because the loop now carries an additional piece of state that earlier iterations change.
first = "Start"
first_end = "---"
with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
    copy = False
    num_items_to_write = 0
    for line in infile:
        if num_items_to_write > 0:
            outfile.write(line)
            num_items_to_write -= 1
        elif line.strip().startswith(first):
            copy = True
            outfile.write(line)
        elif line.strip().startswith(first_end):
            copy = False
            outfile.write(line)
            num_items_to_write = 2
        elif copy:
            outfile.write(line)
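Because this version only keeps a counter and never calls `next` or `readline`, it works unchanged on any iterable, such as a plain list. A minimal sketch wrapping the same logic in a hypothetical helper, `extract_blocks`, that returns the copied lines instead of writing them to a file:

```python
def extract_blocks(lines, start, end, extra=2):
    """Return the lines from each start marker through the end marker,
    plus `extra` lines after it. Works on any iterable of strings."""
    result = []
    copy = False
    remaining = 0  # extra lines still to copy after an end marker
    for line in lines:
        if remaining > 0:
            result.append(line)
            remaining -= 1
        elif line.strip().startswith(start):
            copy = True
            result.append(line)
        elif line.strip().startswith(end):
            copy = False
            result.append(line)
            remaining = extra
        elif copy:
            result.append(line)
    return result


data = ["Start", "foo", "---", "xxx", "yyy", "noise", "Start", "bar", "---", "zzz"]
print(extract_blocks(data, "Start", "---"))
```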
In the specific case of pulling repeated groups of data out of a delimited file, it might be appropriate to skip line-by-line iteration entirely and run a regular expression over the whole file contents instead (at the cost of reading the whole file into memory). For data like yours, that might look like:
import re

with open("testlog.log") as file:
    data = file.read()

pattern = re.compile(r"""
    ^Start$         #"Start" by itself on a line
    (?:\n.*$)*?     #zero or more lines, matched non-greedily
                    #use (?:) for all groups so `findall` doesn't capture them later
    \n---$          #"---" by itself on a line
    (?:\n.*$){2}    #exactly two lines
""", re.MULTILINE | re.VERBOSE)

#equivalent one-line regex:
#pattern = re.compile("^Start$(?:\n.*$)*?\n---$(?:\n.*$){2}", re.MULTILINE)

for group in pattern.findall(data):
    print("Found group:")
    print(group)
    print("End of group.\n\n")
When run on a log that looks like:
Start
foo
bar
baz
qux
---
troz
zort
alice
bob
carol
dave
Start
Fred
Barney
---
Wilma
Betty
Pebbles
...
This will produce the output:
Found group:
Start
foo
bar
baz
qux
---
troz
zort
End of group.
Found group:
Start
Fred
Barney
---
Wilma
Betty
End of group.
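If the markers or the number of trailing lines vary, the same pattern can be assembled from parameters. A minimal sketch, assuming each marker sits alone on its line; `build_pattern` is a hypothetical helper name, and `re.escape` makes characters such as `-` or `.` in the markers match literally:

```python
import re


def build_pattern(start, end, extra):
    # Same shape as the pattern above: the start line, lazily any lines,
    # the end line, then exactly `extra` more lines.
    return re.compile(
        r"^{}$(?:\n.*$)*?\n{}$(?:\n.*$){{{}}}".format(
            re.escape(start), re.escape(end), extra
        ),
        re.MULTILINE,
    )


log = "Start\nfoo\n---\ntroz\nzort\nalice\nStart\nFred\n---\nWilma\nBetty\n"
groups = build_pattern("Start", "---", 1).findall(log)
for group in groups:
    print(group)
```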
The easiest approach would be to write a generator function that parses the infile:
def read_file(file_handle, start_line, end_line, extra_lines=2):
    start = False
    while True:
        try:
            line = next(file_handle)
        except StopIteration:
            return
        if not start and line.strip().startswith(start_line):
            start = True
            yield line
        elif not start:
            continue
        elif line.strip().startswith(end_line):
            yield line
            try:
                for _ in range(extra_lines):
                    yield next(file_handle)
            except StopIteration:
                return
            # Block finished: skip lines until the next start line
            start = False
        else:
            yield line
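The first `try-except` is what ends the generator when the file runs out, so it has to stay: since Python 3.7 (PEP 479), a `StopIteration` that escapes a generator body is turned into a `RuntimeError`. The second one, around the extra lines, would not be needed if you know each file is well-formed (every end line is followed by at least `extra_lines` lines).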
You can use this generator like this:
if __name__ == "__main__":
    first = "Start"
    first_end = "---"
    with open("testlog.log") as infile, open("parsed.txt", "a") as outfile:
        output = read_file(
            file_handle=infile,
            start_line=first,
            end_line=first_end,
            extra_lines=1,
        )
        outfile.writelines(output)
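A shorter variant of the same generator (a swapped-in formulation, not the code above) loops with `for` and takes the extra lines with `itertools.islice`, which stops quietly at end of input, so no `try-except` is needed. `read_blocks` is a hypothetical name, and `io.StringIO` stands in for the log file so the sketch runs on its own:

```python
import io
from itertools import islice


def read_blocks(file_handle, start_line, end_line, extra_lines=2):
    it = iter(file_handle)  # one shared iterator for the loop and islice
    inside = False  # True between a start line and its end line
    for line in it:
        stripped = line.strip()
        if not inside:
            if stripped.startswith(start_line):
                inside = True
                yield line
        elif stripped.startswith(end_line):
            yield line
            # Consume the next lines from the same iterator; fewer are
            # returned (no exception) if the input ends early.
            yield from islice(it, extra_lines)
            inside = False
        else:
            yield line


infile = io.StringIO("junk\nStart\na\n---\nx\ny\nz\nStart\nb\n---\nw\n")
lines = list(read_blocks(infile, "Start", "---"))
print("".join(lines), end="")
```

Calling `iter()` once up front matters for inputs such as lists: a `for` loop and `islice` given the list itself would each start their own iterator from the beginning.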
A variation of @Kevin's answer with a 3-state variable and less code duplication.
first = "Start"
first_end = "---"
# Lines to read after end flag
extra_count = 2
with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
    # Do not copy by default
    copy = 0
    for line in infile:
        # Strip once only
        clean_line = line.strip()
        # Enter "infinite copy" state
        if clean_line.startswith(first):
            copy = -1
        # Copy the end line itself plus extra_count more lines
        elif clean_line.startswith(first_end):
            copy = extra_count + 1
        # If in a "must-copy" state
        if copy != 0:
            # One less line to copy if end flag passed
            if copy > 0:
                copy -= 1
            # Copy current line
            outfile.write(line)