Replace blocks of consecutive lines starting with the same pattern

Question

I'd like to match (and replace with a custom replacement function) each block of consecutive lines that all start with foo. This nearly works:

import re

s = """bar6387
bar63287
foo1234
foohelloworld
fooloremipsum
baz
bar
foo236
foo5382
bar
foo879"""

def f(m):
    print(m)

s = re.sub('(foo.*\n)+', f, s)
print(s)
# <re.Match object; span=(17, 53), match='foo1234\nfoohelloworld\nfooloremipsum\n'>
# <re.Match object; span=(61, 76), match='foo236\nfoo5382\n'>

but it fails to recognize the last block, obviously because it is the last line and there is no \n at the end.

Is there a cleaner way to match a block of one or more consecutive lines starting with the same pattern foo?

Answer 1

Here is an re.findall approach:

import re

s = """bar6387
bar63287
foo1234
foohelloworld
fooloremipsum
baz
bar
foo236
foo5382
bar
foo879"""

lines = re.findall(r'^foo.*(?:\nfoo.*(?=\n|$))*', s, flags=re.M)
print(lines)
# ['foo1234\nfoohelloworld\nfooloremipsum',
#  'foo236\nfoo5382',
#  'foo879']

The above regex runs in multiline mode, and says to match:

^                     from the start of a line
foo                   "foo"
.*                    consume the rest of the line
(?:\nfoo.*(?=\n|$))*  match newline and another "foo" line, 0 or more times
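
The same breakdown can also live inside the pattern itself. Here is a minimal sketch that writes the regex with re.VERBOSE so each part carries its own comment; the name block_re is mine, not part of the answer above, and the sample string is the one from the question:

```python
import re

s = ("bar6387\nbar63287\nfoo1234\nfoohelloworld\nfooloremipsum\n"
     "baz\nbar\nfoo236\nfoo5382\nbar\nfoo879")

# Same pattern as above; re.X (VERBOSE) ignores the layout whitespace
# and lets each piece be commented in place.
block_re = re.compile(
    r"""
    ^foo.*            # a line that starts with "foo"
    (?:               # then, zero or more times:
        \nfoo.*       #   a newline followed by another "foo" line
        (?=\n|$)      #   that runs up to a line break or the end of the text
    )*
    """,
    flags=re.M | re.X,
)

blocks = block_re.findall(s)
print(blocks)
```

Since the pattern has no capturing groups, findall returns the full text of each block.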

Edit:

If you need to replace/remove these blocks, then use the same pattern with re.sub and a lambda callback:

output = re.sub(r'^foo.*(?:\nfoo.*(?=\n|$))*', lambda m: "BLAH", s, flags=re.M)
print(output)

This prints:

bar6387
bar63287
BLAH
baz
bar
BLAH
bar
BLAH
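
Since the question asks for a custom replacement function rather than a fixed string, here is a small sketch of a callback that actually inspects each matched block. The function summarize and its output format are made up for illustration; the only requirement on the re.sub side is that the callback receives the match object and returns the replacement string:

```python
import re

s = ("bar6387\nbar63287\nfoo1234\nfoohelloworld\nfooloremipsum\n"
     "baz\nbar\nfoo236\nfoo5382\nbar\nfoo879")

def summarize(m):
    # Hypothetical callback: m.group() is the whole block, so the number
    # of lines in it is the number of newlines plus one.
    n_lines = m.group().count("\n") + 1
    return f"<{n_lines} foo line(s)>"

output = re.sub(r'^foo.*(?:\nfoo.*(?=\n|$))*', summarize, s, flags=re.M)
print(output)
```

Each block, including the final foo879 line that has no trailing newline, is replaced by a single summary line.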

Answer 2

Do you really need a regex? Here is an itertools.groupby-based approach:

from itertools import groupby
import re

# dummy example function
f = lambda x: '>>'+x.upper()+'<<'

out= '\n'.join(f(G) if (G:='\n'.join(g)) and k else G
               for k,g in groupby(s.split('\n'), lambda l: l.startswith('foo')))

print(out)

NB: you don't need a regex, but you can use one in the groupby key function to define the matching lines if needed:

# using a regex to match the blocks:
out= '\n'.join(f(G) if (G:='\n'.join(g)) and k else G
               for k,g in  groupby(s.split('\n'),
                                   lambda l: bool(re.match('foo', l))
                                   ))

Output:

bar6387
bar63287
>>FOO1234
FOOHELLOWORLD
FOOLOREMIPSUM<<
baz
bar
>>FOO236
FOO5382<<
bar
>>FOO879<<

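If the one-liner is hard to read, the same groupby idea can be unrolled into a small helper. This is only a sketch: replace_blocks, is_match and transform are names I picked for illustration, and the behaviour is meant to mirror the comprehension above on the question's sample input:

```python
from itertools import groupby

def replace_blocks(text, is_match, transform):
    """Apply transform to each run of consecutive lines for which is_match is true."""
    out = []
    # groupby yields one (key, lines) pair per run of consecutive lines
    # that share the same is_match result.
    for matched, group in groupby(text.split("\n"), key=is_match):
        block = "\n".join(group)
        out.append(transform(block) if matched else block)
    return "\n".join(out)

s = ("bar6387\nbar63287\nfoo1234\nfoohelloworld\nfooloremipsum\n"
     "baz\nbar\nfoo236\nfoo5382\nbar\nfoo879")

result = replace_blocks(
    s,
    is_match=lambda line: line.startswith("foo"),
    transform=lambda block: ">>" + block.upper() + "<<",
)
print(result)
```

Keeping the line test and the block transformation as separate arguments makes it easy to swap in a regex-based test without touching the grouping logic.
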
Answer 3

You can use either of the following equivalent calls:

re.sub(r'(?m)^foo.*(?:\nfoo.*)*', f, s)
re.sub(r'^foo.*(?:\nfoo.*)*', f, s, flags=re.M)

where

^ - matches the start of the string (here, the start of any line, due to (?m) or the re.M option)
foo - matches the literal text foo
.* - matches zero or more characters other than line break characters, as many as possible
(?:\nfoo.*)* - matches zero or more sequences of a newline, foo, and then the rest of the line.

See the Python demo:

import re

s = "bar6387\nbar63287\nfoo1234\nfoohelloworld\nfooloremipsum\nbaz\nbar\nfoo236\nfoo5382\nbar\nfoo879"
def f(m):
    print(m.group().replace('\n', r'\n'))

re.sub(r'(?m)^foo.*(?:\nfoo.*)*', f, s)

Output:

foo1234\nfoohelloworld\nfooloremipsum
foo236\nfoo5382
foo879
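
For what it's worth, this pattern differs from the one in Answer 1 only by dropping the (?=\n|$) lookahead. Here is a quick sketch to check that both find the same blocks, on the question's sample and on a variant ending with a newline (the variable names are mine):

```python
import re

s = ("bar6387\nbar63287\nfoo1234\nfoohelloworld\nfooloremipsum\n"
     "baz\nbar\nfoo236\nfoo5382\nbar\nfoo879")

with_lookahead = r'^foo.*(?:\nfoo.*(?=\n|$))*'   # pattern from Answer 1
without_lookahead = r'^foo.*(?:\nfoo.*)*'        # pattern from this answer

# Compare on the original sample and on the same text with a trailing newline.
results = []
for text in (s, s + "\n"):
    a = re.findall(with_lookahead, text, flags=re.M)
    b = re.findall(without_lookahead, text, flags=re.M)
    results.append((a, b))
    print(a == b, b)
```

The greedy .* already stops at the end of the line, which is presumably why the lookahead makes no difference here.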

Answer 1
3 2022-01-31 09:41:59

Answer 2
1 2022-01-31 09:47:37

Answer 3
0 2022-01-31 09:48:06
