Regular expression for parsing a text document

Question

I am trying to parse a text document that contains !if and !endif markers. I want the text without the !if and !endif lines and without the text between them.

For example:

text
!if
text1
!endif
text2

I would like my output to be text + text2 + ...

I tried something like re.findall(r'((^(!if.*!endif))+', text), but it doesn't seem to work for me.

Answer 1

Your regex would be:

^!if$.*?^!endif$\s+

This breaks down as:

^      - Match the beginning of a line (because of the re.M flag)
!if    - Match !if
$      - Match the end of a line (because of the re.M flag)
.*?    - Match any number of characters (non-greedy) (includes line breaks, because of the re.S flag)
^      - Match the beginning of a line (because of the re.M flag)
!endif - Match !endif
$      - Match the end of a line (because of the re.M flag)
\s+    - Match one or more whitespace characters
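The same breakdown can be written into the pattern itself with Python's re.X (verbose) flag, which ignores whitespace in the pattern and allows comments. This is only a sketch; the name PATTERN and the second sample string are made up for illustration.

```python
import re

# Same regex as above, annotated with re.X.
# PATTERN is an illustrative name, not something from the question.
PATTERN = re.compile(
    r"""
    ^!if$      # a line consisting only of !if
    .*?        # everything up to the nearest closing marker (non-greedy)
    ^!endif$   # a line consisting only of !endif
    \s+        # the whitespace (e.g. the newline) that follows !endif
    """,
    re.S | re.M | re.X,
)

one_block = PATTERN.sub("", "text\n!if\ntext1\n!endif\ntext2")
two_blocks = PATTERN.sub("", "a\n!if\nx\n!endif\nb\n!if\ny\n!endif\nc")
print(repr(one_block))   # 'text\ntext2'
print(repr(two_blocks))  # 'a\nb\nc'
```

Because the pattern ends in \s+, a closing !endif that is the very last thing in the input, with no newline after it, would not be matched.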

So, you should be able to use it like this, which replaces all occurrences of the above regex with an empty string (nothing):

import re

s = "text\n!if\ntext1\n!endif\ntext2"
# re.M makes ^ and $ match at line boundaries; re.S lets . match newlines.
s = re.sub(r"^!if$.*?^!endif$\s+", "", s, flags=re.S | re.M)
print(s)

This will output:

text
text2

Note that this explicitly requires !if and !endif to each be on a line of their own. If this isn't a requirement, you can remove the $ and ^ anchors from the middle of the regex:

^!if.*?!endif$\s+
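To see the difference between the two patterns, here is a small comparison sketch. The one-line sample string is hypothetical (the question only shows markers on their own lines), and the names STRICT and RELAXED are just labels for this example.

```python
import re

FLAGS = re.S | re.M
STRICT = r"^!if$.*?^!endif$\s+"   # markers must sit alone on their lines
RELAXED = r"^!if.*?!endif$\s+"    # middle anchors removed

# Hypothetical input where the whole block is on one line.
sample = "text\n!if text1 !endif\ntext2"

strict_result = re.sub(STRICT, "", sample, flags=FLAGS)
relaxed_result = re.sub(RELAXED, "", sample, flags=FLAGS)

print(repr(strict_result))   # unchanged: 'text\n!if text1 !endif\ntext2'
print(repr(relaxed_result))  # 'text\ntext2'
```

The relaxed pattern still expects !if at the start of a line and !endif at the end of one; only the requirement that nothing else shares those lines is dropped.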

Answer 2

I can help with sed:

sed '/^!if$/,/^!endif$/d'

Here is, in effect, the algorithm sed follows for this command:

0. Set the variable match=False.
1. Read the next line.
2. Check whether the line equals '!if'. If so, set match=True.
3. If match==True, check whether the current line equals '!endif' and, if so, set match=False. In either case, delete the current line and jump to 5.
4. Print the current line.
5. If not EOF, jump to 1.
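For readers who would rather stay in Python, the range-delete behaviour described above can be sketched as a line-by-line loop. This is an illustration of the idea, not sed's actual implementation; the function name drop_if_blocks and its parameters are invented for the example, and the markers are assumed to be the !if and !endif lines from the question.

```python
def drop_if_blocks(lines, start="!if", end="!endif"):
    """Return the lines that fall outside start..end blocks (markers removed too)."""
    kept = []
    inside = False  # plays the role of the 'match' variable
    for line in lines:
        if not inside and line == start:
            inside = True
        if inside:
            # Lines inside a block are dropped; the end marker closes the block.
            if line == end:
                inside = False
            continue
        kept.append(line)
    return kept


sample = "text\n!if\ntext1\n!endif\ntext2".splitlines()
result = drop_if_blocks(sample)
print("\n".join(result))
# text
# text2
```

As with the sed range, an !if that is never closed causes everything up to the end of the input to be dropped.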

Answer 1
4 2012-07-27 23:22:49

Answer 2
0 2012-07-28 23:25:58
