Replacing a specific pattern with a regular expression in Python

Question

I am trying to clean up a specific paragraph using a regular expression in Python.

This is the input.txt file:

some random texts (100+ lines)
bbb
...
ttt
some random texts
ccc
...
fff    
paragraph_a A_story(

...
some random texts adfsasdsd

...
)

paragraph_b different_story(
...
some random texts
...
)

The expected output is:

some random texts (100+ lines)
bbb
...
ttt
some random texts
ccc
...
fff    

paragraph_b different_story(
...
some random texts
...
)

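What I want to do is delete the entire paragraph_a block (including the parentheses). However, the block should be identified by the name of the paragraph that follows it (paragraph_b in this case), because the content of the paragraph to delete (paragraph_a in this case) is random.

I have managed to write a regex that selects only the paragraph located directly above paragraph_b:

https://regex101.com/r/pwGVbe/1 <- you can refer to it here.

However, using this regex in my code, I could not delete what I wanted.

This is what I have done so far: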

import re

output = open ('output.txt', 'w')
input = open('input.txt', 'r')

for line in input:
#    print(line)
    t = re.sub('^(\w+ \w+\((?:(.|\n)*)\))\s*^paragraph_b','', line)
    output.write(t)

Is there a solution or a hint I could get? Any answer or advice would be appreciated.

Thanks.

Answer 1

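You can match the preceding paragraph by asserting that paragraph_b follows it, without letting the match span more than one paragraph.

Note that input is the name of a built-in function (it is not a reserved keyword), so assigning to it shadows the built-in. Instead of input = open('input.txt', 'r') you can write input_file = open('file', 'r').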
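As a quick check of how Python treats the name input, the standard keyword and builtins modules can be queried. This is only an illustration and is not part of the original answer.

```python
import builtins
import keyword

# "input" is not a keyword, so using it as a variable name is legal ...
print(keyword.iskeyword("input"))   # False

# ... but it is a built-in function, so assigning to it hides that function.
print(hasattr(builtins, "input"))   # True
```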

 ^\w+ \w+\((?:\n(?!^\w+ \w+\().*)*\)(?=\s*^paragraph_b)

Regex demo
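To make the pattern easier to read, here is a sketch of the same expression written in verbose mode with one comment per part. The name PATTERN and the sample string are made up for illustration; the literal space is written as [ ] because verbose mode ignores bare whitespace.

```python
import re

PATTERN = re.compile(r"""
    ^\w+[ ]\w+\(            # header line: two words, then an opening parenthesis
    (?:                     # any number of body lines:
        \n                  #   a line break ...
        (?!^\w+[ ]\w+\()    #   ... not followed by another header line
        .*                  #   the rest of that line
    )*
    \)                      # the closing parenthesis of the paragraph
    (?=\s*^paragraph_b)     # only when paragraph_b starts right after it
""", re.M | re.X)

# Hypothetical sample, shortened from the input in the question.
sample = "fff\nparagraph_a A_story(\n\n...\nsome text\n)\n\nparagraph_b different_story(\n...\n)\n"
result = PATTERN.sub("", sample)
```

The negative lookahead inside the loop is what keeps the match from running across an earlier paragraph header.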

If the match should also not start with paragraph_b itself:

^(?!paragraph_b)\w+ \w+\((?:\n(?!^\w+ \w+\().*)*\)(?=\s*^paragraph_b)

Regex demo

For example, reading the whole file with input_file.read():

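import re

# Raw string, so the backslashes reach the regex engine unchanged.
pattern = r'^(?!paragraph_b)\w+ \w+\((?:\n(?!^\w+ \w+\().*)*\)(?=\s*^paragraph_b)'

# Read the whole file at once so the pattern can match across lines.
with open('file', 'r') as input_file:
    text = input_file.read()

# re.M makes ^ match at the start of every line.
t = re.sub(pattern, '', text, flags=re.M)

with open('file_out', 'w') as output_file:
    output_file.write(t)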

Contents of the output file:

some random texts (100+ lines)
bbb
...
ttt
some random texts
ccc
...
fff    


paragraph_b different_story(
...
some random texts
...
)
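If the name of the following paragraph is not fixed, the same idea could be wrapped in a small helper. This is a minimal sketch under the assumption that every paragraph header has the form "word word(" as in the question; drop_block_before and its parameters are hypothetical names, not something from the original answer.

```python
import re

def drop_block_before(text: str, next_name: str) -> str:
    """Remove the 'name title( ... )' block directly above the block named next_name."""
    header = r"^\w+ \w+\("
    name = re.escape(next_name)  # treat the name literally inside the pattern
    pattern = (
        rf"^(?!{name})\w+ \w+\("     # header of the block to delete
        rf"(?:\n(?!{header}).*)*"    # body lines, stopping at any other header
        rf"\)(?=\s*^{name})"         # closing parenthesis, then next_name
    )
    return re.sub(pattern, "", text, flags=re.M)

# Sample text modeled on the question's input.txt.
sample = (
    "fff    \n"
    "paragraph_a A_story(\n\n...\nsome random texts adfsasdsd\n\n...\n)\n\n"
    "paragraph_b different_story(\n...\nsome random texts\n...\n)\n"
)
result = drop_block_before(sample, "paragraph_b")
```

As in the output shown above, the blank lines around the removed block are left in place; only the block itself is deleted.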

Answer 2

Your code does not work because you are parsing the text line by line:

for line in input:

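This way your regex never gets a chance to match against the whole file content, and a pattern that spans several lines cannot match inside a single line. You are better off reading everything at once, storing it in a single string variable, and then applying your regex substitution to that string.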
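The difference can be sketched with an in-memory string instead of a file. The sample text and variable names below are made up for illustration, and the pattern is the first one from Answer 1.

```python
import re

PATTERN = r'^\w+ \w+\((?:\n(?!^\w+ \w+\().*)*\)(?=\s*^paragraph_b)'

text = "paragraph_a A_story(\n...\n)\n\nparagraph_b different_story(\n...\n)\n"

# Line by line: each call sees one line only, so the multi-line pattern never matches.
per_line = "".join(
    re.sub(PATTERN, "", line, flags=re.M)
    for line in text.splitlines(keepends=True)
)

# Whole text: the pattern can span the line breaks inside the paragraph.
whole = re.sub(PATTERN, "", text, flags=re.M)

print(per_line == text)        # True, nothing was removed
print("paragraph_a" in whole)  # False, the block is gone
```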

Solution 1
1 accepted 2022-08-21 13:58:41

Solution 2
0 2022-08-21 14:05:42
