Python regex - replace all characters except those within curly braces

Question

I'm a bit stuck with a regular expression. I have a string in the format

{% 'ello %} wor'ld {% te'st %}

and I want to escape only the apostrophes that aren't between {% ... %} tags, so the expected output is

{% 'ello %} wor&quot;ld {% te'st %}

I know I can replace all of them just by using the string replace function, but I'm at a loss as to how to use regexes to match only those outside the braces.
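For reference, here is a minimal sketch of the plain str.replace approach the question mentions, run on the example string. The variable names are illustrative only. It escapes every apostrophe, including the two inside the tags, and that is exactly the behavior the question wants to avoid:

```python
subject = "{% 'ello %} wor'ld {% te'st %}"

# str.replace has no notion of context: every apostrophe is replaced,
# including the two that sit inside {% ... %} tags.
naive = subject.replace("'", "&quot;")
print(naive)  # {% &quot;ello %} wor&quot;ld {% te&quot;st %}
```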

Answer 1

This can probably be done with a regex, but it would be a complicated one. It's easier to write and read if you do it directly, walking through the string and tracking whether you are currently inside a tag:

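def escape(s):
    in_tag = False  # True while we are between "{%" and "%}"
    out = []
    for i, ch in enumerate(s):
        # Escape apostrophes only when we are outside a {% ... %} tag
        if not in_tag and ch == "'":
            out.append("&quot;")
        else:
            out.append(ch)

        # Update the inside/outside state before the next character
        if in_tag and s[i:i+2] == "%}":
            in_tag = False
        if not in_tag and s[i:i+2] == "{%":
            in_tag = True

    return "".join(out)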
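The same inside/outside idea can also be written by jumping from tag to tag with str.find, so the code never examines characters one at a time. The sketch below makes a few assumptions. Tags are never nested. An unclosed {% protects everything up to the end of the string. The helper name escape_outside_tags and its keyword parameters are hypothetical and do not come from the original answer:

```python
def escape_outside_tags(s, open_tag="{%", close_tag="%}", repl="&quot;"):
    """Escape apostrophes that are not inside open_tag ... close_tag.

    Hypothetical helper (sketch): assumes tags are not nested, and treats an
    unclosed open_tag as protecting the rest of the string.
    """
    out = []
    pos = 0
    while pos < len(s):
        start = s.find(open_tag, pos)
        if start == -1:
            # No more tags: everything left is outside, so escape it
            out.append(s[pos:].replace("'", repl))
            break
        # Text between the previous tag and this one is outside: escape it
        out.append(s[pos:start].replace("'", repl))
        end = s.find(close_tag, start + len(open_tag))
        if end == -1:
            # Unclosed tag: copy the remainder unchanged
            out.append(s[start:])
            break
        end += len(close_tag)
        # The tag itself is copied verbatim
        out.append(s[start:end])
        pos = end
    return "".join(out)


print(escape_outside_tags("{% 'ello %} wor'ld {% te'st %}"))
# {% 'ello %} wor&quot;ld {% te'st %}
```

Because each call to str.replace works on a whole stretch of outside text, this version is easy to extend. For example, a different repl value or several chained replace calls would work without changing the scanning logic.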

Answer 2

Just for fun, here is how to do it with a regex:

>>> import re
>>> instr = "{% 'ello %} wor'ld {% te'st %}"
>>> re.sub(r'\'(?=(.(?!%}))*({%|$))', r'&quot;', instr)
"{% 'ello %} wor&quot;ld {% te'st %}"

It uses a positive lookahead to find either {% or the end of the string, and a negative lookahead inside that positive lookahead to make sure no %} is passed on the way there. In other words, an apostrophe is replaced only if the next tag delimiter after it is an opening {% (or there is none), which means the apostrophe is outside a tag.

Answer 3

If you want to use regular expressions, you could also do it like this:

>>> import re
>>> s = """'{% 'ello %} wor'ld {% te'st %}'"""
>>> segments = re.split(r'(\{%.*?%\})', s)
>>> for i in range(0, len(segments), 2):
...     segments[i] = segments[i].replace("'", '&quot;')
...
>>> ''.join(segments)
"&quot;{% 'ello %} wor&quot;ld {% te'st %}&quot;"

Because the pattern is wrapped in a capturing group, re.split keeps each {% ... %} tag in the result list, so the text outside the tags always sits at the even indices. That is why the loop steps by 2. Compared with Ehsan's lookahead solution, this has the benefit that you can run any kind of replacement or analysis on the segments without running another regular expression. So if you decide to replace another character, you can easily do that in the loop.
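As an illustration of handling more characters in the same loop, here is a sketch that reuses the split from Answer 3 and applies a str.maketrans table to the outside segments. The extra < and > entries and the names TAG_SPLIT, ESCAPES and escape_segments are hypothetical choices made only to show the idea:

```python
import re

# Same split as Answer 3: the capturing group keeps each {% ... %} tag in the
# result, so outside text is at even indices and tags are at odd indices.
TAG_SPLIT = re.compile(r"(\{%.*?%\})")

# Hypothetical replacement table: the apostrophe from the question, plus
# < and > to show that more characters fit into the same loop.
ESCAPES = str.maketrans({"'": "&quot;", "<": "&lt;", ">": "&gt;"})


def escape_segments(s):
    segments = TAG_SPLIT.split(s)
    for i in range(0, len(segments), 2):
        # str.translate maps each character independently, so the order of
        # the entries in ESCAPES does not matter
        segments[i] = segments[i].translate(ESCAPES)
    return "".join(segments)


print(escape_segments("{% 'a' < b %} it's <b> {% x %}"))
# {% 'a' < b %} it&quot;s &lt;b&gt; {% x %}
```

Using str.translate instead of several chained replace calls avoids ordering problems. With chained calls, for example, escaping & after the other entities would double-escape them.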

Answer 4

bcloughlan, I'm resurrecting this question because it had a simple solution that wasn't mentioned. (I found your question while doing some research for a general question about how to exclude patterns in regex.)

Here's a simple regex:

{%.*?%}|(\')

The left side of the alternation matches complete {% ... %} tags. We will ignore these matches. The right side matches and captures apostrophes into Group 1. We know they are the right apostrophes because they were not matched by the expression on the left. The engine scans from left to right and consumes each tag whole, so an apostrophe inside a tag never reaches the right side.

This program shows how to use the regex. The replacement function returns &quot; when Group 1 matched and returns the tag unchanged otherwise:

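import re

subject = "{% 'ello %} wor'ld {% te'st %}"
# Left side: whole {% ... %} tags (skipped). Right side: an apostrophe, captured in Group 1.
regex = re.compile(r'{%.*?%}|(\')')

def myreplacement(m):
    if m.group(1):
        # Group 1 matched: an apostrophe outside any tag
        return "&quot;"
    else:
        # A complete tag matched: put it back unchanged
        return m.group(0)

replaced = regex.sub(myreplacement, subject)
print(replaced)  # {% 'ello %} wor&quot;ld {% te'st %}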
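The left side of Answer 4's alternation can list every context that should be skipped. The sketch below adds a {{ ... }} tag purely as a hypothetical extra context. It also passes re.DOTALL so that .*? can cross newlines, because without that flag a tag spanning several lines would not be skipped. A lambda replaces the named replacement function. The names SKIP_OR_CAPTURE and escape_outside are illustrative:

```python
import re

# Left side: contexts to skip (matched, then put back unchanged).
# Right side: the apostrophe we actually want, captured in group 1.
# The {{ ... }} alternative is a hypothetical extra context for illustration;
# re.DOTALL lets .*? cross newlines so multi-line tags are skipped too.
SKIP_OR_CAPTURE = re.compile(r"\{%.*?%\}|\{\{.*?\}\}|(')", re.DOTALL)


def escape_outside(s):
    # group(1) is None for a skipped tag and "'" for a captured apostrophe
    return SKIP_OR_CAPTURE.sub(
        lambda m: "&quot;" if m.group(1) is not None else m.group(0), s
    )


print(escape_outside("{% 'ello %} wor'ld {{ it's }}\n{% te'st\n %}"))
```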

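Reference

Python regex - replace all characters except those within curly braces

Question description

4 answers

Answer 1
Score 5, accepted, 2011-11-06 22:19:03

Answer 2
Score 3, 2011-11-06 22:32:08

Answer 3
Score 2, 2011-11-06 22:36:54

Answer 4
Score 0, 2014-05-20 22:31:58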
