Regular expressions parsing a text document
I am trying to parse a text document that contains !if and !endif markers. I want the text without the !if and !endif lines and without the text between them.
For example:
text
!if
text1
!endif
text2
I would like my output to be text+text2+..
I tried something like re.findall(r'((^(!if.*!endif))+', text), but it doesn't seem to work for me.
Your regex would be:
^!if$.*?^!endif$\s+
This says:
^ - Match the beginning of a line (because of the re.M flag)
!if - Match the literal text !if
$ - Match the end of a line (because of the re.M flag)
.*? - Match any number of characters (non-greedy) (includes line breaks, because of the re.S flag)
^ - Match the beginning of a line (because of the re.M flag)
!endif - Match !endif
$ - Match the end of a line (because of the re.M flag)
\s+ - Match one or more whitespace characters
So, you should be able to use it like this, which replaces all occurrences of the above regex with an empty string:
import re
s = "text\n!if\ntext1\n!endif\ntext2"
s = re.sub(r"^!if$.*?^!endif$\s+", "", s, flags=re.S | re.M)
print(s)
This will output:
text
text2
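To see why the non-greedy .*? matters, here is a small sketch that runs the pattern above against input with two separate blocks and compares it with a greedy .* variant. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical input with two separate !if ... !endif blocks.
sample = "a\n!if\nb\n!endif\nc\n!if\nd\n!endif\ne"
flags = re.S | re.M

# Non-greedy: each block is matched and removed on its own.
lazy = re.sub(r"^!if$.*?^!endif$\s+", "", sample, flags=flags)

# Greedy: one match runs from the first !if to the last !endif.
greedy = re.sub(r"^!if$.*^!endif$\s+", "", sample, flags=flags)

print(repr(lazy))    # 'a\nc\ne'
print(repr(greedy))  # 'a\ne'
```

With the greedy version, the line c between the two blocks is lost, which is usually not what you want.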
Note that this explicitly requires that !if and !endif each be on a line of its own. If this isn't a requirement, you can remove the $ and ^ anchors from the middle of the regex:
^!if.*?!endif$\s+
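As a quick check of the relaxed pattern, the sketch below applies it to a block written on a single line and to the original multi-line example. Both sample strings are assumptions chosen for illustration. Note that the trailing \s+ still needs at least one whitespace character after !endif, so a block at the very end of the input with no final newline would not match.

```python
import re

# Relaxed pattern: !if only has to start a line and !endif only has to end one.
pattern = re.compile(r"^!if.*?!endif$\s+", flags=re.S | re.M)

inline_sample = "text\n!if text1 !endif\ntext2"       # block on a single line
multiline_sample = "text\n!if\ntext1\n!endif\ntext2"  # block spread over lines

inline_result = pattern.sub("", inline_sample)
multiline_result = pattern.sub("", multiline_sample)

print(inline_result)
print(multiline_result)
```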
You can also do this with sed:
sed '/^!if$/,/^!endif$/ d'
Here is the algorithm that sed uses:
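As a rough illustration, the behaviour of an address range combined with the d command can be sketched in Python. This is an approximation of the observable behaviour, not sed's actual implementation, and delete_ranges is a hypothetical helper name.

```python
def delete_ranges(lines, start="!if", end="!endif"):
    """Drop every block from a `start` line through the next `end` line."""
    kept = []
    in_range = False
    for line in lines:
        if in_range:
            # Inside a range: the line is deleted; the end marker closes the range.
            if line == end:
                in_range = False
            continue
        if line == start:
            # The start marker opens a range and is itself deleted.
            in_range = True
            continue
        kept.append(line)
    return kept


result = delete_ranges("text\n!if\ntext1\n!endif\ntext2".split("\n"))
print("\n".join(result))
```

As with sed, a range whose end marker never appears runs to the end of the input.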