Multiline regex replace

I want to transform a text like:

$$
foo
bar
$$

to

<% tex
foo
bar
%>

and $\alpha$ to <% tex \alpha %>.

For the single-line replace, I did this:

re.sub(r"\$(.*)\$", r"<% tex \1 %>", text)

...and it works fine.

Now, I added the multiline flag to catch the multiline one:

re.sub(r"(?i)\$\$(.*)\$\$", r"<% tex \1 %>", text)

...but it returns:

<% tex  %>
foo
bar
<% tex  %>

Why? I'm sure it's something trivial, but I can't imagine what.
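A sketch of what is probably going on, assuming the single-line substitution above was also run on the same text (the question does not say so explicitly). (?i) is the inline form of re.IGNORECASE, not a multiline or dot-all flag, so '.' still stops at newlines and the $$ ... $$ pattern finds nothing. The single-line pattern, on the other hand, matches each bare "$$" line as '$' + empty capture + '$', which reproduces the <% tex  %> lines exactly. The inline flag for "dot matches newline" is (?s).

```python
import re

text = "$$\nfoo\nbar\n$$"

# (?i) only makes matching case-insensitive; '.' still cannot cross '\n',
# so this pattern never finds an opening and closing "$$" it can connect.
ignorecase_result = re.search(r"(?i)\$\$(.*)\$\$", text)
print(ignorecase_result)  # None

# The single-line pattern matches each bare "$$" line as '$' + '' + '$',
# yielding the "<% tex  %>" lines shown in the question.
single_line_result = re.sub(r"\$(.*)\$", r"<% tex \1 %>", text)
print(repr(single_line_result))  # '<% tex  %>\nfoo\nbar\n<% tex  %>'

# (?s) is the inline form of re.DOTALL: '.' now also matches '\n'.
dotall_result = re.sub(r"(?s)\$\$(.*?)\$\$", r"<% tex \1 %>", text)
print(repr(dotall_result))  # '<% tex \nfoo\nbar\n %>'
```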

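I'd suggest gobbling up everything that is not a dollar sign in your capture. A negated character class such as [^\$] matches newlines too, so the capture can span lines without any flag. (The re.M multiline flag only changes where ^ and $ anchor, so it isn't needed here. Also note that the fourth positional argument of re.sub is count, not flags: passing re.M there would silently cap the number of replacements at 8. Use flags=... when you do need a flag.)

>>> import re
>>> t = """$$
... foo
... bar
... $$"""
>>> re.sub(r"\$\$([^\$]+)\$\$", r"<% tex \1 %>", t)
'<% tex \nfoo\nbar\n %>'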
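The question asks for both conversions, display math and inline math. A minimal sketch doing both with negated character classes, assuming dollar signs never appear inside the math itself; the helper name tex_to_tags is hypothetical. Display math is replaced first, so that by the time the inline pattern runs, only single $ delimiters remain.

```python
import re

def tex_to_tags(text: str) -> str:
    """Hypothetical helper: turn $$...$$ and $...$ into <% tex ... %> tags."""
    # Display math: [^$]+ also matches '\n', so the block may span lines.
    text = re.sub(r"\$\$([^$]+)\$\$", r"<% tex \1 %>", text)
    # Inline math: stay on one line and stop at the next '$'.
    text = re.sub(r"\$([^$\n]+)\$", r"<% tex \1 %>", text)
    return text

sample = "$$\nfoo\nbar\n$$\nand $\\alpha$ here"
print(repr(tex_to_tags(sample)))
```

The backreference \1 inserts the captured text literally, so a backslash in \alpha passes through unchanged.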

With Python 2.7.12 I have verified that this works:

>>> import re
>>> t = """$$
... foo
... bar
... $$"""
>>> re.sub(r"\$\$(.*?)\$\$", r"<% tex \1 %>", t, flags=re.DOTALL)
'<% tex \nfoo\nbar\n %>'
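The ? in .*? matters once a text contains more than one display block: a greedy .* with DOTALL runs from the first $$ all the way to the last one. A small comparison (the sample text is made up for illustration):

```python
import re

text = "$$\na\n$$\ntext\n$$\nb\n$$"

# Greedy: '.*' grabs as much as possible, so one match spans both blocks.
greedy = re.sub(r"\$\$(.*)\$\$", r"<% tex \1 %>", text, flags=re.DOTALL)

# Lazy: '.*?' stops at the first closing "$$", giving one match per block.
lazy = re.sub(r"\$\$(.*?)\$\$", r"<% tex \1 %>", text, flags=re.DOTALL)

print(repr(greedy))  # '<% tex \na\n$$\ntext\n$$\nb\n %>'
print(repr(lazy))    # '<% tex \na\n %>\ntext\n<% tex \nb\n %>'
```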

As for the DOTALL flag, according to the official documentation:

re.S
re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
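A quick way to see the difference between re.DOTALL (what the $$ ... $$ pattern needs) and re.M / re.MULTILINE (the flag the question and the first answer mention): DOTALL changes what '.' matches, while MULTILINE only changes where ^ and $ anchor.

```python
import re

s = "foo\nbar"

# Without DOTALL, '.' refuses to match the newline, so there is no full match.
plain_dot = re.fullmatch(r"foo.bar", s)
print(plain_dot)  # None

# With DOTALL (alias re.S), '.' also matches '\n'.
dotall_dot = re.fullmatch(r"foo.bar", s, flags=re.DOTALL)
print(dotall_dot is not None)  # True

# MULTILINE (alias re.M) does not touch '.'; it lets ^ and $ match
# at the start and end of every line instead of only the whole string.
no_multiline = re.findall(r"^\w+$", s)
with_multiline = re.findall(r"^\w+$", s, flags=re.MULTILINE)
print(no_multiline)    # []
print(with_multiline)  # ['foo', 'bar']

print(re.S == re.DOTALL)  # True
```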

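Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM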