
When iterating through lines in a txt file, how can I capture multiple subsequent lines after a regex triggers?

I have a txt file:

This is the first line of block 1. It is always identifiable
Random
Stuff

This is the first line of block 2. It is always identifiable
Is
Always

This is the first line of block 3. It is always identifiable
In
Here!

I want to iterate through each line, look for a regex match, and, when it triggers, capture a fixed number of the lines that follow:

import re

for line in lines:
    match = re.compile(r'(.*)block 2.(.*)').search(line)
    if match:
        pass  # capture current line and the following 2 lines

After parsing the txt file, I want to return:

This is the first line of block 2
Is
Always

In my particular example, the first line of each block is always identifiable, and every block has the same number of lines. The contents of the second and later lines change from block to block, so they cannot be matched reliably with a regex.

You can call the next() function to get the next element from the iterator.

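import re

def get_block2(lines):
    lines = iter(lines)  # so next() also works if a list was passed in
    # \b matches whether "block 2" is followed by "." or by the end of the line
    pattern = re.compile(r'block 2\b')
    for line in lines:
        if pattern.search(line):
            # pull the two following lines from the same iterator
            line2 = next(lines)
            line3 = next(lines)
            return line, line2, line3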
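As a rough, self-contained illustration of the next() approach, here is a sketch that runs against the sample text from the question. The names SAMPLE, BLOCK2 and get_block are made up for this example, io.StringIO stands in for an open file, and the default value passed to next() is an assumption about how you want to handle a block that is cut off at the end of the input.

```python
import io
import re

# Sample text copied from the question.
SAMPLE = """This is the first line of block 1. It is always identifiable
Random
Stuff

This is the first line of block 2. It is always identifiable
Is
Always

This is the first line of block 3. It is always identifiable
In
Here!
"""

# Hypothetical pattern: \b stops "block 2" from also matching "block 20".
BLOCK2 = re.compile(r"block 2\b")


def get_block(lines, pattern):
    """Return the matching line plus the two lines after it, or None."""
    lines = iter(lines)  # harmless if `lines` is already an iterator
    for line in lines:
        if pattern.search(line):
            # next(it, "") returns "" instead of raising StopIteration
            # when the input ends before the block is complete.
            return [line, next(lines, ""), next(lines, "")]
    return None


# io.StringIO behaves like an open file: iterating yields one line at a time.
block = get_block(io.StringIO(SAMPLE), BLOCK2)
print([s.rstrip("\n") for s in block])
```

With a real file you would pass the file object itself (for example inside a with open(...) block), since file objects are already iterators over their lines.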

This assumes lines is an iterator, so you can just grab the following lines from it.

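import re

# \b matches whether "block 2" is followed by "." or by the end of the line
block2 = re.compile(r'block 2\b')

res = None  # stays None if no line matches
for l in lines:
    if block2.search(l):
        res = [l, next(lines), next(lines)]
        break

print(res)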

If lines isn't an iterator (for example, it is a list), just add lines = iter(lines) before the loop.
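For the list case, one possible variation is sketched below. It swaps the repeated next() calls for itertools.islice, which takes a fixed number of items from the same iterator and simply stops early if the input runs out. The function name capture_after and its extra parameter are hypothetical, chosen only for this example.

```python
import re
from itertools import islice


def capture_after(lines, pattern, extra=2):
    """Return the first line matching `pattern` plus up to `extra` following lines."""
    it = iter(lines)  # works for lists, files, and existing iterators alike
    for line in it:
        if re.search(pattern, line):
            # islice consumes the next `extra` items from the same iterator
            return [line, *islice(it, extra)]
    return []  # no line matched


# Sample data from the question, already split into a list of lines.
lines = [
    "This is the first line of block 1. It is always identifiable",
    "Random",
    "Stuff",
    "",
    "This is the first line of block 2. It is always identifiable",
    "Is",
    "Always",
    "",
    "This is the first line of block 3. It is always identifiable",
    "In",
    "Here!",
]

print(capture_after(lines, r"block 2\b"))
```

Because the block length is passed in as extra, the same helper would cover blocks with a different fixed row count.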
