Regular expressions parsing a text document

I am trying to parse a text document that contains !if and !endif markers. I want the text without the !if and !endif lines and without the text between them.

For example:

text
!if
text1
!endif
text2

I would like my output to be text + text2 + ...

I tried something like re.findall(r'((^(!if.*!endif))+', text), but it doesn't seem to work for me.

Your regex would be:

^!if$.*?^!endif$\s+

This says:

^      - Match the beginning of a line (because of the re.M flag)
!if    - Match !if
$      - Match the end of a line (because of the re.M flag)
.*?    - Match any number of characters (non-greedy) (includes line breaks, because of the re.S flag)
^      - Match the beginning of a line (because of the re.M flag)
!endif - Match !endif
$      - Match the end of a line (because of the re.M flag)
\s+    - Match one or more whitespace characters

So, you should be able to use it like this, which replaces all occurrences of the above regex with an empty string (nothing):

import re
s = "text\n!if\ntext1\n!endif\ntext2"
# Raw string so that \s reaches the regex engine unchanged
s = re.sub(r"^!if$.*?^!endif$\s+", "", s, flags=re.S | re.M)
print(s)

This will output:

text
text2
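
One caveat worth checking: the trailing \s+ needs at least one whitespace character after !endif, so a block that ends the input with no final newline is left in place. \s+ also swallows any blank lines or indentation that follow the block. A minimal sketch of one possible variant, assuming the markers always sit alone on their own lines; the names BLOCK and strip_blocks are made up for illustration, and \n? is a substitution for \s+, not part of the answer above:

```python
import re

# Assumption: !if and !endif each occupy a whole line.
# \n? replaces \s+ so that a block at the very end of the input,
# with no trailing newline, is still removed.
BLOCK = re.compile(r"^!if$.*?^!endif$\n?", flags=re.S | re.M)

def strip_blocks(text):
    """Remove every !if ... !endif block, markers included."""
    return BLOCK.sub("", text)

# The second block ends the string without a newline.
sample = "text\n!if\ntext1\n!endif\ntext2\n!if\ntext3\n!endif"
print(strip_blocks(sample))
```

Whether \s+ or \n? is the better choice depends on whether blank lines after a removed block should be kept.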

Note that this explicitly requires !if and !endif to be on separate lines. If this isn't a requirement, you can remove the $ and ^ anchors from the middle of the regex:

^!if.*?!endif$\s+
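
As a quick check of the relaxed pattern, the sketch below runs it against a block written on one line and against the original multi-line input. The same_line string is a hypothetical input, not taken from the question. Note that the pattern still expects !if at the start of a line and !endif at the end of one.

```python
import re

# Relaxed pattern: !if must start a line and !endif must end one,
# but the two markers may now share a line.
pattern = re.compile(r"^!if.*?!endif$\s+", flags=re.S | re.M)

same_line = "text\n!if text1 !endif\ntext2"      # hypothetical input
multi_line = "text\n!if\ntext1\n!endif\ntext2"   # input from the question

result_same = pattern.sub("", same_line)
result_multi = pattern.sub("", multi_line)

print(result_same)
print(result_multi)
```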

I can help with sed:

sed '/^!if$/,/^!endif$/d'

Here is the algorithm that sed uses:

  1. Set the variable match = False.
  2. Read the next line.
  3. If the line equals '!if', set match = True.
  4. If match == True, delete the current line (it is not printed). If the deleted line equals '!endif', also set match = False. Then jump to step 6.
  5. Print the current line.
  6. If not at EOF, jump to step 2.
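
The same line-by-line idea can be sketched in Python without any regex. This is an illustration of the range-delete logic rather than a description of how sed is implemented, and it assumes the markers are the literal lines !if and !endif from the question; the function name drop_if_blocks is made up for this example.

```python
def drop_if_blocks(lines):
    """Return the lines that fall outside !if ... !endif ranges."""
    inside = False  # plays the role of the 'match' variable
    kept = []
    for line in lines:
        stripped = line.rstrip("\n")
        if not inside and stripped == "!if":
            inside = True
        if inside:
            # Lines in the range are dropped, including both markers.
            if stripped == "!endif":
                inside = False
            continue
        kept.append(line)
    return kept

sample = ["text\n", "!if\n", "text1\n", "!endif\n", "text2\n"]
print("".join(drop_if_blocks(sample)), end="")
```

Because it processes one line at a time, this approach also works on a file object directly, for example drop_if_blocks(open(path)) with a path of your choosing.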
